bpg / terraform-provider-proxmox

Terraform Provider for Proxmox
https://registry.terraform.io/providers/bpg/proxmox
Mozilla Public License 2.0

Adding second NIC with ip_config hangs cloud_init / terraform ( tofu ) #1592

Open VGerris opened 2 days ago

VGerris commented 2 days ago

Describe the bug When two network devices are configured and the second one also gets an ip_config (for example with DHCP), terraform (tofu) apply never finishes.

To Reproduce Steps to reproduce the behavior:

  1. Create a terraform file with snippet like:
    initialization {
      ip_config {
        ipv4 {
          address = "dhcp"
        }
      }

      user_data_file_id = proxmox_virtual_environment_file.cloud_config.id
    }

    network_device {
      bridge = "vmbr1"
    }

    network_device {
      bridge = "vmbr1"
      vlan_id = "56"
    }
This configuration works, also on the first run.
  2. Run tofu apply
  3. The VM gets created with 2 NICs
  4. Run tofu destroy
  5. Add a second ip_config block for the 2nd interface:

    initialization {
      ip_config {
        ipv4 {
          address = "dhcp"
        }
      }
      # next part is added and applies to second NIC
      ip_config {
        ipv4 {
          address = "dhcp"
        }
      }
    }

  6. Run tofu apply
  7. See error - it hangs

Please also provide a minimal Terraform configuration that reproduces the issue.


    initialization {
      ip_config {
        ipv4 {
          address = "dhcp"
        }
      }
      # next part is added and applies to second NIC
      ip_config {
        ipv4 {
          address = "dhcp"
        }
      }
    }

    network_device {
      bridge = "vmbr1"
    }

    network_device {
      bridge = "vmbr1"
      vlan_id = "56"
    }

and the output of terraform|tofu apply.

VM creating ....

Expected behavior I would expect tofu to continue, even though the IP may not be fetched.

Additional context

This may be related to cloud-init: https://forum.proxmox.com/threads/assign-multiple-ip-to-vm-using-cloud-init.116259/

And the other provider may have something similar and a solution: https://github.com/Telmate/terraform-provider-proxmox/issues/1015

Ideally the IP gets assigned, but when this is not possible because of how cloud-init works, simply continuing and reporting the issue seems like a good solution.

VGerris commented 2 days ago

Some additional findings.

It seems the network is configured by netplan, and cloud-init puts the configuration in:

/etc/netplan/50-cloud-init.yaml

The snippet:

    ip_config {
      ipv4 {
        address = "dhcp"
      }
    }

results in something like:

    network:
      version: 2
      ethernets:
        eth0:
          match:
            macaddress: "bc:24:11:c8:de:82"
          dhcp4: true

When the second snippet is added to the main.tf file as described above, the cloud-init file gets the correct configuration for the second network and everything works fine after boot.

So the problem only occurs at first creation, not when the second ip_config is added later. That leads me to believe something in the code is not prepared to handle multiple network configs. With debugging enabled, one of the last networking-related calls seems to be:

https://github.com/bpg/terraform-provider-proxmox/blob/main/proxmoxtf/provider/provider.go#L251

I suspect the issue may start there. I am not familiar with Go, so I will see how far I get.

Has anyone else seen this or, better yet, can someone with Go knowledge check whether the issue starts there?

I could start by looking at what nodeAddress is. Can someone point me to instructions on how to deploy the provider with updated code? Thank you.

VGerris commented 1 day ago

I have been investigating further. Since the creation with terraform never finished and the settings I put in cloud-init did not give me access to the VM, I modified an image to have a root account so I could log in at creation time. Inspecting the machine showed that the netplan config looked good, but somehow the network was not set up properly to reach the internet, even though both NICs got a DHCP address (from different servers).

The terraform process is actually waiting for qemu-tools to reply. When I fix the network using dhcpcd, then install the package and start it, terraform continues and everything looks as expected.
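To keep apply from hanging for long while the guest cannot install or start its agent, the wait can be shortened. The sketch below assumes the `agent` block of `proxmox_virtual_environment_vm` with its `timeout` argument as described in the provider documentation (default 15m, as far as I know); the node name `pve` and the value `2m` are hypothetical placeholders, not taken from this issue.

```hcl
resource "proxmox_virtual_environment_vm" "ubuntu_vm" {
  node_name = "pve" # hypothetical node name

  agent {
    enabled = true
    # Assumed value: stop waiting for guest agent data after 2 minutes
    # instead of the default.
    timeout = "2m"
  }

  # ... initialization, network_device and disk blocks as in the issue
}
```

This only limits how long the provider waits for the agent; it does not fix the routing problem inside the VM.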

This seems to indicate that cloud-init is somehow unable to set up routing properly when using 2 NICs, but also that if this can be fixed in the cloud-init script, the problem may be solvable. The best solution would be to find out why cloud-init fails to complete properly and perhaps even fix it there but, as linked above, some people say that it is not supposed to provide access for automation. I tend to disagree, because my automation may run from another network while the VM still needs the internet (which is my current setup and why I encountered this behavior).

So far I have tried netplan apply and adding ipv6 = false, without consistent success. It would be great if anyone could help find the network cause of this; a possible workaround would then be to include the proper commands in the cloud-init script.

Another workaround I used before is to get the name of the 2nd interface from terraform and then use Ansible to run dhcpcd on that interface, but that doesn't 'stick' either. In that case I get the NIC like this:

output "vm_nic_2_name" {
  value = proxmox_virtual_environment_vm.ubuntu_vm.network_interface_names[2]
}

and then, in the script that runs Ansible:

    sed "s/nic1_replace/$(tofu -chdir=$BASEDIR/terraform-proxmox output vm_nic_2_name | sed -nr 's|.*"(.*)".*|\1|p')/g" inventory_template1.yml > inventory.yml

which sets an Ansible var that is used like:

    - name: Run dhcpcd on second NIC
      ansible.builtin.command: dhcpcd {{ nic_1 }}
      register: nic

I am going to look a bit further into the best way to have the network configured properly and will post my findings. In the meantime, help and tips are appreciated :)

VGerris commented 1 day ago

Based on the netplan documentation and some reading, I found an acceptable workaround.

In the cloud-config snippet, write a file with the netplan config:

    write_files:
      - path: /etc/netplan/99-network-config.yaml
        permissions: "0600"
        owner: root
        content: |
          network:
            version: 2
            ethernets:
              ens19:
                dhcp4: true
                match:
                  name: "ens19"
                mtu: 1500
                set-name: "eth1"

Then at the top of runcmd add:

    runcmd:
        - netplan apply
        - .....

Creation takes a bit longer, and for some reason so does the apt update command, but this configures both interfaces the same way as the double ip_config snippet, with working internet and thus working qemu-tools.

Perhaps it would be good to add this to the docs. That's the best I can do for now without spending a lot more time, which is scarce at the moment :).

This relies on the name of the interface. I am not aware of a way to get the MAC address or name beforehand so that it can be used dynamically, but it's good enough for me.

Any improvements are welcome, and I can make a PR for the docs if that's appreciated. Thank you all for maintaining this terraform provider, it is pretty awesome!

VGerris commented 23 hours ago

It turns out something more is needed, because a route is added by default. There is an option to skip that:

                dhcp4-overrides:
                  use-routes: false
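
Putting the pieces from this thread together, the cloud-config snippet could be defined roughly as below. This is a sketch, not a tested configuration: the resource name `cloud_config` matches the reference used in the issue, while the datastore `local`, the node `pve` and the file name are hypothetical placeholders, and the interface name `ens19` is the one from the workaround above and may differ on other images.

```hcl
resource "proxmox_virtual_environment_file" "cloud_config" {
  content_type = "snippets"
  datastore_id = "local" # hypothetical datastore with snippets enabled
  node_name    = "pve"   # hypothetical node name

  source_raw {
    file_name = "cloud-config.yaml" # hypothetical file name

    # Indented heredoc: Terraform strips the common leading whitespace.
    data = <<-EOF
    #cloud-config
    write_files:
      - path: /etc/netplan/99-network-config.yaml
        permissions: "0600"
        owner: root
        content: |
          network:
            version: 2
            ethernets:
              ens19:
                dhcp4: true
                dhcp4-overrides:
                  use-routes: false
                match:
                  name: "ens19"
                mtu: 1500
                set-name: "eth1"
    runcmd:
      - netplan apply
    EOF
  }
}
```

Any further runcmd entries from the original cloud-config would follow after `netplan apply`.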

Now when I use a snippet like:

    ip_config {
      ipv4 {
        address = "192.168.56.20/24"
        gateway = "192.168.56.1"
      }
    }

I also get a route set as default and, as a consequence, the same problem as with the 2 DHCP snippets. In this case the workaround is a bit simpler, namely to remove that route before anything else: runcmd:

If the use-routes: false option could be made part of the resource (https://registry.terraform.io/providers/bpg/proxmox/latest/docs/resources/virtual_environment_vm#ip_config), it may well be a solution for this behavior: simply set that option in the ip_config snippet.

As the documentation says, the probably better option is to omit the gateway: then no route is added and everything works as expected.
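
For illustration, a two-NIC configuration without a gateway on the second ip_config might look like the sketch below. It reuses the addresses and bridges from this thread; the node name `pve` is a hypothetical placeholder, and the remaining VM arguments (disk, cloud image and so on) are left out.

```hcl
resource "proxmox_virtual_environment_vm" "ubuntu_vm" {
  node_name = "pve" # hypothetical node name

  initialization {
    # First NIC: DHCP, provides the default route.
    ip_config {
      ipv4 {
        address = "dhcp"
      }
    }

    # Second NIC: static address, gateway deliberately omitted
    # so that no second default route is added.
    ip_config {
      ipv4 {
        address = "192.168.56.20/24"
      }
    }

    user_data_file_id = proxmox_virtual_environment_file.cloud_config.id
  }

  network_device {
    bridge = "vmbr1"
  }

  network_device {
    bridge  = "vmbr1"
    vlan_id = 56
  }
}
```

The order of the ip_config blocks follows the order of the network_device blocks, as in the snippets earlier in the issue.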