Open VGerris opened 1 month ago
Some additional findings.
It seems the network is configured by netplan and cloud-init puts the configuration in :
/etc/netplan/50-cloud-init.yaml
The snippet:
ip_config {
  ipv4 {
    address = "dhcp"
  }
}
results in something like:
network:
  version: 2
  ethernets:
    eth0:
      match:
        macaddress: "bc:24:11:c8:de:82"
      dhcp4: true
When the second snippet is added in the main.tf file as described above, the cloud-init file gets the correct info in it for the second network and everything works fine after boot.
So the problem only occurs at first creation, not when adding it later. That leads me to believe something in the code is not prepared to handle multiple network configs. If I turn debugging on, one of the last calls regarding networking seems to be:
https://github.com/bpg/terraform-provider-proxmox/blob/main/proxmoxtf/provider/provider.go#L251
I suspect that is where the issue starts. I am not familiar with Go, so I will see how far I get.
Has anyone else seen this, or better yet, can someone with Go knowledge check whether the issue starts there?
I could start by looking at what the nodeAddress is. Can someone point me to instructions on how to deploy the provider with updated code? Thank you.
I have been investigating further. Since the creation with terraform never finished and the settings I put in cloud-init did not give me access to the VM, I modified an image to have a root account so I can log in at creation time. Inspecting the machine showed that the netplan config looked good, but somehow the network is not set up properly to reach the internet, even though both NICs get a DHCP address (from different servers).
The terraform process is actually waiting for qemu-tools to reply. When I fix the network by using dhcpcd, then install the package and start it, terraform continues and all looks good and as expected.
This seems to indicate that cloud-init is somehow not able to set up routing properly when using 2 NICs, but also that if this can be fixed in the cloud-init script, it may be solvable. The best solution would be to find out why cloud-init has an issue completing properly and perhaps even fix it there, but as linked above, some people say that it is not supposed to provide access for automation. I tend to disagree, because my automation may be run from another network and the VM needs the internet (which is what I have now and why I encountered this behavior).
So far I have tried netplan apply and adding ipv6 = false, without consistent success. It would be great if anyone can help find the network cause of this; a possible workaround would then be to include the proper commands in the cloud-init script.
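One quick check on the VM is to count the default routes. This is only a sketch under the assumption that two competing default routes (one per DHCP-configured NIC) are involved, which is not confirmed at this point; `count_default_routes` is a hypothetical helper name, not part of any tool.

```shell
#!/bin/sh
# Hypothetical helper: count the default routes in `ip route` output read
# from stdin. More than one would be consistent with the routing problem
# described above, but that is an assumption, not a confirmed cause.
count_default_routes() {
    # grep -c prints 0 and exits non-zero when nothing matches,
    # so `|| true` keeps the helper usable under `set -e`.
    grep -c '^default ' || true
}

# Usage on the VM:
#   ip route show | count_default_routes
```

If the count is higher than one, comparing the `dev` field of each default route shows which NIC each route belongs to.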
Another workaround I used before is to get the 2nd interface from terraform and then use Ansible to run dhcpcd on the interface, but that doesn't 'stick' either. In that case I get the NIC like this:
output "vm_nic_2_name" {
  value = proxmox_virtual_environment_vm.ubuntu_vm.network_interface_names[2]
}
and then in the script that runs Ansible:
sed "s/nic1_replace/$(tofu -chdir=$BASEDIR/terraform-proxmox output vm_nic_2_name | sed -nr 's|.*"(.*)".*|\1|p')/g" inventory_template1.yml > inventory.yml
which sets an Ansible var that is used like:
- name: Run dhcpcd on second NIC
  ansible.builtin.command: dhcpcd {{ nic_1 }}
  register: nic
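The inner `sed` that strips the quotes from the output value can probably be dropped by using `tofu output -raw`, which prints a string output without quotes. A minimal sketch, reusing the `nic1_replace` placeholder and the file names from the command above; `render_inventory` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: replace the nic1_replace placeholder in an inventory
# template (argument 2) with the NIC name (argument 1) and print the result.
render_inventory() {
    nic_name=$1
    template=$2
    sed "s/nic1_replace/${nic_name}/g" "$template"
}

# Usage, assuming the same directory layout as the original command:
#   nic=$(tofu -chdir="$BASEDIR/terraform-proxmox" output -raw vm_nic_2_name)
#   render_inventory "$nic" inventory_template1.yml > inventory.yml
```

Keeping the lookup and the templating as separate steps makes it easier to see which one failed if the inventory ends up with an empty interface name.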
I am going to look a bit further into the best way to have the network configured properly and post what I find. In the meantime, help and tips are appreciated :)
Based on info on netplan and some reading, I found an acceptable workaround.
In the cloud-config snippet, write a file with the netplan config:
write_files:
  - path: /etc/netplan/99-network-config.yaml
    permissions: "0600"
    owner: root
    content: |
      network:
        version: 2
        ethernets:
          ens19:
            dhcp4: true
            match:
              name: "ens19"
            mtu: 1500
            set-name: "eth1"
Then at the top of runcmd add:
runcmd:
  - netplan apply
  - .....
Creation takes a bit longer, and for some reason so does the apt update command, but this configures both interfaces the same way as the double snippet does, with working internet and thus qemu-tools.
Perhaps it would be good to add this to the docs. That's the best I can do for now without spending a lot more time, which is scarce currently :).
This relies on the name of the interface. I am not aware of a way to get the MAC or name beforehand so it can be used dynamically, but it's good enough for me.
Any improvements are welcome and I can make a PR for the docs if that's appreciated. Thank you all for maintaining this terraform provider, it is pretty awesome!
It turns out something more is needed, because a route is added by default. There is an option to skip that:
dhcp4-overrides:
  use-routes: false
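To make the placement explicit: `dhcp4-overrides` sits directly under the interface, at the same level as `dhcp4`. The sketch below writes the complete file from a shell function instead of cloud-init `write_files`, purely for illustration. The interface name `ens19` and the other settings are copied from the workaround above; `write_second_nic_netplan` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: write the netplan config for the second NIC to the
# path given as argument 1, with DHCP-provided routes disabled.
write_second_nic_netplan() {
    target=$1
    cat > "$target" <<'EOF'
network:
  version: 2
  ethernets:
    ens19:
      dhcp4: true
      dhcp4-overrides:
        use-routes: false
      match:
        name: "ens19"
      mtu: 1500
      set-name: "eth1"
EOF
    # Same permissions as in the write_files entry above.
    chmod 600 "$target"
}

# Usage on the VM (as root), followed by applying the config:
#   write_second_nic_netplan /etc/netplan/99-network-config.yaml
#   netplan apply
```

The same YAML body can be pasted into the `content: |` block of the `write_files` entry; only the indentation relative to `content` changes.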
Now when I use a snippet like:
ip_config {
  ipv4 {
    address = "192.168.56.20/24"
    gateway = "192.168.56.1"
  }
}
I also get a route set as default, and as a consequence the same problem as with 2 DHCP snippets. In this case the workaround is a bit simpler: remove that route before anything else, under runcmd:
If the use-routes: false option could be made part of the resource (https://registry.terraform.io/providers/bpg/proxmox/latest/docs/resources/virtual_environment_vm#ip_config), it may well be a solution for this behavior: simply set that option in the ip_config snippet.
As the documentation says, it is probably better to omit the gateway; then no route is added and everything works as expected.
Describe the bug
When two networks are configured and the second has, for example, DHCP set, terraform doesn't finish.
Expected behavior
I would expect tofu to continue, even though the IP may not be fetched.
Additional context
This may be related to cloud-init: https://forum.proxmox.com/threads/assign-multiple-ip-to-vm-using-cloud-init.116259/
And the other provider may have something similar and a solution: https://github.com/Telmate/terraform-provider-proxmox/issues/1015
Ideally the IP is assigned, but when this is not possible because of how cloud-init works, simply continuing and reporting the issue seems like a good solution.