bpg / terraform-provider-proxmox

Terraform Provider for Proxmox
https://registry.terraform.io/providers/bpg/proxmox
Mozilla Public License 2.0

No ip addresses are provided with enabled agent #776

Open BaldFabi opened 9 months ago

BaldFabi commented 9 months ago

Describe the bug
I'm trying to clone a VM from a template that has the agent enabled. I found issue #100, which looks related to my problem. It seems the provider doesn't wait long enough (or something along those lines), because the IP is displayed in the Proxmox GUI on the VM summary. The IP is obviously not instantly available to Proxmox, but it shows up a couple of seconds after the VM has started.

To Reproduce
Steps to reproduce the behavior:

  1. Create a template which has the qemu agent already installed
  2. Create a config that clones that created template with an enabled agent directive
  3. The clone will fail and no IP is saved in the state file, but the IP is displayed next to the VM (as shown in the screenshot):
resource "proxmox_virtual_environment_vm" "machinexyz" {
  name      = "machinexyz"
  node_name = "server01"

  operating_system {
    type = "l26"
  }

  on_boot = true

  clone {
    vm_id = 912
  }

  agent {
    enabled = true
  }

  memory {
    dedicated = 4096
  }

  cpu {
    cores = 4
    type  = "x86-64-v2-AES"
  }

  disk {
    datastore_id = "pool1"
    size         = 20
    interface    = "scsi0"
  }

  connection {
    type     = "ssh"
    user     = "root"
    password = local.root_password
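    # this is the line that fails: ipv4_addresses is null until the agent reports addresses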
    host     = self.ipv4_addresses[0]
  }
}

Expected behavior
The provider should wait for the defined (or default) timeout before giving up on the IP addresses.

Screenshots
IP shown in the Proxmox GUI (screenshot omitted)

The error

╷
│ Error: Attempt to index null value
│
│   on machine.tf line 45, in resource "proxmox_virtual_environment_vm" "machine":
│   45:     host     = self.ipv4_addresses[0]
│     ├────────────────
│     │ self.ipv4_addresses is null
│
│ This value is null, so it does not have any indices.


BaldFabi commented 9 months ago

I just tried some things and found out that a previous warning I was also getting is the reason for this. My template had the iothread option set on the hard disk. After removing it, the ipv4_addresses attribute wasn't null anymore. It's a little weird that the warning causes this.

Edit: And at the moment I don't have a clue how ipv4_addresses is structured.

bpg commented 9 months ago

I just tried some things and found out that a previous warning I also had is the reason for this. My template had the iothread option set on the harddisk.

That could be related to #360; changing disk attributes while cloning might not always work as expected. But I'm also curious what the "previous warning" was that you saw. Do you have it captured somewhere, by any chance?

Also, you may want to skip the disk block in the clone if you just want to use the disk from the template.
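
For example, a clone that relies entirely on the template's disk could be as small as this (a sketch based on your config above):

resource "proxmox_virtual_environment_vm" "machinexyz" {
  name      = "machinexyz"
  node_name = "server01"

  clone {
    vm_id = 912
  }

  agent {
    enabled = true
  }

  # no disk block: the cloned VM keeps the template's disk unchanged
}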

Edit: And at the moment I don't have a clue how ipv4_addresses is structured.

You can check your local terraform.tfstate; for my test VM it is:

            "ipv4_addresses": [
              [
                "127.0.0.1"
              ],
              [
                "192.168.3.205"
              ]
            ],
vrcdx64 commented 9 months ago

Hello,

I'm learning Terraform and I have exactly the same problem as described by the author. My template, with qemu-agent enabled, doesn't use iothread (the default value is false).

If I check the values of ipv4_addresses, ipv6_addresses, or network_interface_names after the error, the lists are empty in Terraform, but on the Proxmox web UI the values are there. I've tried to adjust the agent timeout value, but the default is 15m, which should be more than enough.
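
For reference, this is where the timeout lives; a minimal sketch spelling out the documented 15m default:

agent {
  enabled = true
  timeout = "15m" # how long the provider waits for the agent to report addresses
}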

I'll see if I can dig into the problem. Don't hesitate to ask if I can help troubleshoot.

BaldFabi commented 9 months ago

I just reran Terraform with the iothread attribute set on the template to trigger the warning again:

╷
│ Warning: the VM startup task finished with a warning, task log:
│
│       | WARN: iothread is only valid with virtio disk or virtio-scsi-single controller, ignoring
│       | TASK WARNINGS: 1
│
│   with proxmox_virtual_environment_vm.machine,
│   on machine.tf line 1, in resource "proxmox_virtual_environment_vm" "machine":
│    1: resource "proxmox_virtual_environment_vm" "machine" {
│
╵

Also, you may want to skip the disk block in the clone if you just want to use the disk from the template.

But if I skip the disk block, the disk wouldn't be cloned, right? Otherwise I would have to recreate the template each time I want to provision a new VM, or skip that step entirely and provision and install it from an ISO.

You can check your local terraform.tfstate; for my test VM it is:

            "ipv4_addresses": [
              [
                "127.0.0.1"
              ],
              [
                "192.168.3.205"
              ]
            ],

That's a good hint. I did a rerun without the iothread attribute to prevent the warning. The ipv4_addresses in my state file now look like yours. Wouldn't it make sense to purge 127.0.0.1 and simplify the slice to be one-dimensional?

otopetrik commented 9 months ago

Wouldn't it make sense to purge 127.0.0.1 and simplify the slice to be one dimensional?

Probably not. There are use cases for VMs with multiple interfaces (routers, internal cluster networks, etc.), and even some use cases for one interface having multiple addresses (high availability using a virtual IP).

The provider waits for one "reasonable" IP address (i.e. better than link-local); this fixed the original issue, where a link-local IPv6 address was obtained faster than an IPv4 address from the DHCP server.

In cases where waiting for multiple interfaces/addresses is required, it should be possible to delay starting the qemu-guest-agent inside the VM until all addresses are obtained, by modifying the guest agent's systemd unit dependencies as sketched below.
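
A minimal sketch of that approach, assuming a systemd-based guest whose network manager correctly signals network-online.target:

# /etc/systemd/system/qemu-guest-agent.service.d/wait-for-network.conf
[Unit]
After=network-online.target
Wants=network-online.target

After a systemctl daemon-reload, the agent only starts once the guest considers the network fully configured, so the provider sees the complete address list.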

The ipv4_addresses data is taken directly from qemu-guest-agent, which reports all interfaces (including loopback) and uses the names assigned by the system inside the VM (i.e. not "net0" but "eth0", "eno1", "enp5s0", and likely even language-specific names in the case of Windows VMs).
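
To see exactly what the agent reports, you can query it from the Proxmox node; assuming the VM ID 912 from the config above, something like

qm agent 912 network-get-interfaces

dumps the raw interface/address list the provider parses.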

Using a fixed index like self.ipv4_addresses[0] does not really work reliably.

Using element(element(self.ipv4_addresses, index(self.network_interface_names, "eth0")), 0) should work, assuming that the interface inside the VM is "eth0" and not enp5s0 or similar.
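
Applied to the connection block from the original config, that would look roughly like this (a sketch; it still assumes the guest names its interface "eth0"):

connection {
  type     = "ssh"
  user     = "root"
  password = local.root_password
  # pick the address list of the guest interface named "eth0", then take its first entry
  host     = element(element(self.ipv4_addresses, index(self.network_interface_names, "eth0")), 0)
}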

It might be useful to add something like ipv4_addresses_by_device[], which would use the MAC addresses of the configured network devices to find matching IP addresses in the qemu-guest-agent output; ipv4_addresses_by_device[0] would then really mean the IPv4 address of the VM's net0 network device.

(Changing the behavior of the existing ipv4_addresses is probably not a good idea. It would break existing configurations, and it can be useful to have access to IP addresses assigned to non-hardware interfaces, e.g. VPN, PPPoE, ...)

Sorixelle commented 7 months ago

I ran into this issue today, where the state refresh was timing out and ipv4_addresses etc. were empty after an apply. The issue turned out to be that I had not granted the Proxmox user I configured the provider with the VM.Monitor privilege, which seems to be required to retrieve this information. Just dropping this one in here in case anyone runs into the same problem.
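
For anyone hitting the same wall: granting the privilege on the Proxmox side looks roughly like this (a sketch; the role and user names are placeholders for your own setup):

pveum role add TerraformMonitor --privs "VM.Monitor"
pveum acl modify /vms --users terraform@pve --roles TerraformMonitor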

I wonder, could this be handled better? The API route being called was returning a 403 response in this case, so it would be possible for the provider to catch that and show a clearer error message to the user. If that's desired, I can open a separate issue to track it.

bpg commented 7 months ago

Hi @Sorixelle! 👋🏼

Thanks for sharing your use case. That's a good suggestion; the provider can definitely handle this type of error better. Please go ahead and open a separate issue for this enhancement, much appreciated!

bpg-autobot[bot] commented 1 month ago

Marking this issue as stale due to inactivity in the past 180 days. This helps us focus on the active issues. If this issue is reproducible with the latest version of the provider, please comment. If this issue receives no comments in the next 30 days it will automatically be closed. If this issue was automatically closed and you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thank you!