bpg / terraform-provider-proxmox

Terraform Provider for Proxmox
https://registry.terraform.io/providers/bpg/proxmox
Mozilla Public License 2.0
782 stars · 127 forks

Static IP not populated in the tfstate after all provisioner executed #310

Closed: joace closed this issue 1 year ago

joace commented 1 year ago

**Describe the bug**
Using provider 1.8.0 on Proxmox, I cloned some new VMs from an existing MicroOS template. Due to an unknown issue in MicroOS I can't use cloud-init to configure the new VMs, so I used `file` and `remote-exec` provisioners to copy a NetworkManager profile and activate the connection (specifically, to assign a static IP on the 2nd NIC, eth1; the 1st NIC, eth0, still works with DHCP). After all configuration completed successfully, the assigned IP was not populated in the generated tfstate file even though the target host already had the NIC connected; I had to run `terraform apply -refresh-only` to get it updated.

Actually, I also have a `local_file` resource that generates an Ansible host inventory file based on the provisioned targets. Since the `ipv4_addresses` list has no valid IP for eth1 (it's empty), that resource fails.

Is this the expected behavior with this provider? If so, is there a better way to refresh state inside the configuration, just before the `local_file` resource?
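One way to sidestep the stale `ipv4_addresses` problem entirely (a sketch on my part, not something from this thread): since the static IPs are computed deterministically with `cidrhost()`, the inventory can be derived from the same expression instead of from the provider-reported attribute. This assumes a hypothetical `cluster_deployed_total_node` local for the total node count:

```hcl
# Sketch only: derive each node's intended static IP from the same
# cidrhost() expression used in the provisioner, so local_file does not
# depend on the (possibly stale) ipv4_addresses attribute.
# "cluster_deployed_total_node" is a hypothetical local for the node count.
locals {
  node_private_ips = [
    for i in range(local.cluster_deployed_total_node) :
    cidrhost(
      local.cluster_subnet,
      (i + 1) <= local.cluster_deployed_server_node
        ? (i + 1)
        : (i + 1 - local.cluster_deployed_server_node + var.cluster_max_server_node)
    )
  ]
}

resource "local_file" "ansible_inventory" {
  filename = "inventory.ini"
  content  = join("\n", local.node_private_ips)
}
```

Because these locals are known at plan time, the `local_file` resource would no longer wait on a refreshed state.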

**To Reproduce**
Steps to reproduce the behavior:

  1. Create a `proxmox_virtual_environment_vm` resource with the provisioners below (note: the NetworkManager keyfile property is `autoconnect`, corrected here from `autoconnection`):

    provisioner "file" {
      content     = <<-EOT
        [connection]
        id=eth1
        interface-name=eth1
        type=ethernet
        autoconnect=true

        [ethernet]

        [ipv4]
        method=manual
        address1=${cidrhost(local.cluster_subnet, ((count.index+1) <= local.cluster_deployed_server_node ? (count.index+1) : (count.index+1-local.cluster_deployed_server_node+var.cluster_max_server_node)))}/${local.private_network_masknum}
        #address1=192.168.1.65/27

        [ipv6]
        method=disabled
      EOT
      destination = "/etc/NetworkManager/system-connections/eth1.nmconnection"
    }

    provisioner "remote-exec" {
      inline = [
        "chmod 0600 /etc/NetworkManager/system-connections/eth1.nmconnection",
        "nmcli connection reload",
        "nmcli connection up eth1",
        "<Other cmd removed>"
      ]
    }
  2. The new VM had the correct IPs assigned to both NICs.

  3. The output of the VM attribute `ipv4_addresses` is missing the IP on the 2nd NIC:

    "outputs": {
      "node_ipv4_addresses": {
        "value": [
          [
            ["127.0.0.1"],
            ["10.14.76.58"],
            []
          ],
          [
            ["127.0.0.1"],
            ["10.14.76.59"],
            []
          ]
        ],
        "type": [
          "tuple",
          [
            ["list", ["list", "string"]],
            ["list", ["list", "string"]]
          ]
        ]
      }
    }
  4. If I manually run `terraform apply -refresh-only`, the output is updated with all IPs included:

    
    Outputs:

    node_ipv4_addresses = [
      tolist([
        tolist(["127.0.0.1"]),
        tolist(["10.14.76.58"]),
        tolist(["192.168.1.65"]),
      ]),
      tolist([
        tolist(["127.0.0.1"]),
        tolist(["10.14.76.59"]),
        tolist(["192.168.1.70"]),
      ]),
    ]


**Expected behavior**
Will the provider sync up the tfstate with the final actual state on the remote host?
BTW, I used the same provider to build an Ubuntu VM where cloud-init worked correctly; the initialization block below assigned the IPs as expected:

initialization {
  datastore_id = var.disk_storage

  dns {
    domain = var.dns_domain
  }

  ip_config {
    ipv4 {
      address = "dhcp"
    }
  }

  ip_config {
    ipv4 {
      address = "${cidrhost(local.cluster_subnet, ((count.index+1) <= local.cluster_deployed_server_node ? (count.index+1) : (count.index+1-local.cluster_deployed_server_node+var.cluster_max_server_node)))}/${local.private_network_masknum}"
    }
  }

  user_account {
    keys     = [file(var.ssh_public_key_file)]
    username = var.normal_user_name
    password = var.normal_user_password
  }
}



**Additional context**
- Terraform and provider version:

Terraform v1.4.5 on linux_amd64

bpg commented 1 year ago

Hey @joace! 👋🏼 The provider does not have any insight into what the remote-exec provisioner is doing; the provisioner runs at the end of the creation phase, after the resource (a VM) has been created as specified by the template. The resource state is captured at that moment as well, before external provisioners execute.

I think it would be possible to do a similar config via cloud-init, without remote-exec. For example, one of my test boxes is configured like this:

data "local_file" "ssh_public_key" {
  filename = "./id_rsa.pub"
}

resource "proxmox_virtual_environment_file" "cloud_config" {
  content_type = "snippets"
  datastore_id = "local"
  node_name    = var.virtual_environment_node_name

  source_raw {
    data = <<EOF
#cloud-config
users:
  - default
  - name: ubuntu
    groups:
      - sudo
    shell: /bin/bash
    ssh_authorized_keys:
      - ${trimspace(data.local_file.ssh_public_key.content)}
    sudo: ALL=(ALL) NOPASSWD:ALL
runcmd:
    - apt update
    - apt install -y qemu-guest-agent net-tools mc curl apt-transport-https vim git wget gnupg2 software-properties-common lsb-release ca-certificates uidmap
    - timedatectl set-timezone America/Toronto
    - systemctl enable qemu-guest-agent
    - systemctl start qemu-guest-agent
    - swapoff -a
    - ........ keep going ....
    EOF

    file_name = "cloud-config.yaml"
  }
}

resource "proxmox_virtual_environment_vm" "node" {
  vm_id     = 9300
  name      = "template-ubuntu-23.04-k8s-1.27.1"
  node_name = var.virtual_environment_node_name
  description = "Ubuntu 23.04 with k8s 1.27.1"

  agent {
    enabled = true
  }

  cpu {
    cores = 2
    type = "host"
  }

  memory {
    dedicated = 8192
  }

  disk {
    datastore_id = "fast"
    interface    = "scsi0"
    file_id      = proxmox_virtual_environment_file.ubuntu_cloud_image.id
    size         = 32
  }

  initialization {
    ip_config {
      ipv4 {
        address = "dhcp"
      }
    }

    // note: cloud-init config is required to enable user agent
    user_data_file_id = proxmox_virtual_environment_file.cloud_config.id
  }

  network_device {
    bridge = "vmbr2"
  }

}

resource "proxmox_virtual_environment_file" "ubuntu_cloud_image" {
  content_type = "iso"
  datastore_id = "local"
  node_name    = var.virtual_environment_node_name

  source_file {
    path = "http://cloud-images.ubuntu.com/lunar/current/lunar-server-cloudimg-amd64.img"
  }
}

You can also have cloud-init config in an external file, as in this example.

I have no extra NICs in those, but you can check https://cloudinit.readthedocs.io/en/latest/reference/network-config-format-v2.html for how to configure some.
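For a second NIC, an illustrative network-config v2 snippet could be uploaded the same way as the cloud-config above. This is a sketch only; the IPs and resource name are placeholders, not values from my setup:

```hcl
# Illustrative sketch: a snippets file carrying cloud-init network-config v2
# for a second NIC. Addresses and names here are placeholders.
resource "proxmox_virtual_environment_file" "network_config" {
  content_type = "snippets"
  datastore_id = "local"
  node_name    = var.virtual_environment_node_name

  source_raw {
    data = <<-EOF
      network:
        version: 2
        ethernets:
          eth0:
            dhcp4: true
          eth1:
            dhcp4: false
            addresses:
              - 192.168.1.65/27
    EOF

    file_name = "network-config.yaml"
  }
}
```

Such a file can then be referenced from the VM's `initialization` block via `network_data_file_id`.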

joace commented 1 year ago

@bpg thank you for the clarification! After reading more from a Google search, I guess this is a symptom of what they call "configuration drift", caused by changes made outside the provider. As I mentioned, there may be some other issue when using MicroOS + PVE: the cloud-init initialization block didn't work by just specifying the ip_config section, so I'm trying the way you suggested, passing some snippets as cloud-init config, as below:

initialization {
    datastore_id = var.disk_storage
    network_data_file_id = proxmox_virtual_environment_file.ci_network_data[count.index].id
    user_data_file_id = proxmox_virtual_environment_file.ci_user_data.id
}

But a new problem jumped out: even with the previously working provider/variables and only the single resource below in main.tf, the file resource couldn't connect to my node and failed with the error below. It seems it tried to connect to the node's corosync network, not the endpoint defined in the provider:

Error: failed to dial 10.0.10.2:22: dial tcp 10.0.10.2:22: connect: connection timed out

//username / password read from ENV - PROXMOX_VE_USERNAME / PROXMOX_VE_PASSWORD
provider "proxmox" {
  endpoint = "https://10.14.73.216:8006"
  insecure = true
}

main.tf

resource "proxmox_virtual_environment_file" "ci_network_data" {
  count = local.cluster_deployed_total_node

  content_type = "snippets"
  datastore_id = "cephfs-ssd"
  node_name    = "adtpven2"

  source_raw {
    data = <<-EOT
      network:
        version: 2
        renderer: NetworkManager
        ethernets:
          eth0:
            dhcp4: yes
            dhcp6: no
            dhcp-identifier: mac
          eth1:
            dhcp4: no
            dhcp6: no
            addresses:
              - ${cidrhost(local.cluster_subnet, ((count.index+1) <= local.cluster_deployed_server_node ? (count.index+1) : (count.index+1-local.cluster_deployed_server_node+var.cluster_max_server_node)))}/${local.private_network_masknum}
    EOT
    file_name = "${var.cluster_prefix}-c${var.cluster_subnet_index}-${(count.index+1) <= local.cluster_deployed_server_node ? "sn" : "an"}${(count.index+1) <= local.cluster_deployed_server_node ? (count.index+1) : (count.index+1)-local.cluster_deployed_server_node}-ci-network-data.yaml"
  }
}

Did I miss something in the configuration that's needed to use the file resource?

bpg commented 1 year ago

> ... seems it tried to connect to the node's corosync network, not the endpoint defined in the provider

I think you have some specifics in the PVE node network configuration similar to what was discussed in #302. The provider doesn't have much flexibility in selecting which interface to use for the SSH connection to the node, and it assumes the first one (see more details in this comment).
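As a possible workaround (my assumption, not something confirmed in this thread, and I'm not certain it existed in the provider version used here): later provider releases expose an `ssh` block with per-node address overrides, which might let you pin the SSH address to the management network instead of the first interface. A sketch reusing the endpoint and node name quoted in this thread:

```hcl
# Hypothetical sketch, assuming a provider version that supports the
# ssh/node override block; addresses are the ones quoted in this thread.
provider "proxmox" {
  endpoint = "https://10.14.73.216:8006"
  insecure = true

  ssh {
    # Override the address used for SSH to node "adtpven2" so the
    # provider does not pick the corosync interface.
    node {
      name    = "adtpven2"
      address = "10.14.73.216"
    }
  }
}
```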

joace commented 1 year ago

You mean the `proxmox_virtual_environment_file` resource still depends on an SSH connection to the host to achieve its functionality? I thought that with a given PVE endpoint in the provider definition, everything was implemented with API calls. I didn't see this issue when using only the `proxmox_virtual_environment_vm` resource, which is why it seemed a little odd to me. Anyway, my issue described in the OP may be caused by cloud-init not working with MicroOS + PVE. I worked around it by generating an intermediate local value to record the desired private IP; after the remote-exec provisioner activates it in the `terraform apply` task, another `terraform apply -refresh-only` Ansible task syncs up the states. Not a perfect workflow, but it solved my problem.

Thanks again for the explanation and this great provider!