bpg / terraform-provider-proxmox

Terraform Provider for Proxmox
https://registry.terraform.io/providers/bpg/proxmox
Mozilla Public License 2.0

Some Virtual Machine template properties are marked to be updated in-place always #1494

Open n0ct1s-k8sh opened 3 months ago

n0ct1s-k8sh commented 3 months ago

When you create a virtual machine template (though I don't know whether this also happens with standalone VMs), the following properties are always marked to be updated in-place, on every apply: ipv4_addresses, ipv6_addresses, network_interface_names, and disk.file_format.

This is my shell log when I apply and then plan for the first time:

vscode ➜ /workspaces/homelab-infra/pve (main) $ tta
data.proxmox_virtual_environment_datastores.pve_datastores: Reading...
data.proxmox_virtual_environment_version.pve_version: Reading...
data.proxmox_virtual_environment_version.pve_version: Read complete after 0s [id=version]
data.proxmox_virtual_environment_datastores.pve_datastores: Read complete after 0s [id=pve_datastores]

OpenTofu used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  + create

OpenTofu will perform the following actions:

  # module.images.proxmox_virtual_environment_download_file.image_debian_12_20240717 will be created
  + resource "proxmox_virtual_environment_download_file" "image_debian_12_20240717" {
      + checksum            = "9ce1ce8c0f16958dd07bce6dd44d12f4d44d12593432a3a8f7c890c262ce78b0402642fa25c22941760b5a84d631cf81e2cb9dc39815be25bf3a2b56388504c6"
      + checksum_algorithm  = "sha512"
      + content_type        = "iso"
      + datastore_id        = "local"
      + file_name           = "debian-12-generic-amd64-20240717-1811.img"
      + id                  = (known after apply)
      + node_name           = "pve"
      + overwrite           = true
      + overwrite_unmanaged = false
      + size                = (known after apply)
      + upload_timeout      = 600
      + url                 = "https://cloud.debian.org/images/cloud/bookworm/20240717-1811/debian-12-generic-amd64-20240717-1811.qcow2"
      + verify              = true
    }

  # module.templates.proxmox_virtual_environment_vm.vm_template_debian_12_20240717 will be created
  + resource "proxmox_virtual_environment_vm" "vm_template_debian_12_20240717" {
      + acpi                    = true
      + bios                    = "ovmf"
      + boot_order              = [
          + "scsi0",
        ]
      + description             = "Managed by OpenTofu"
      + id                      = (known after apply)
      + ipv4_addresses          = (known after apply)
      + ipv6_addresses          = (known after apply)
      + keyboard_layout         = "es"
      + mac_addresses           = (known after apply)
      + machine                 = "q35"
      + migrate                 = false
      + name                    = "template-debian-12-20240717"
      + network_interface_names = (known after apply)
      + node_name               = "pve"
      + on_boot                 = false
      + protection              = false
      + reboot                  = false
      + scsi_hardware           = "virtio-scsi-pci"
      + stop_on_destroy         = false
      + tablet_device           = true
      + tags                    = [
          + "gnu+linux",
          + "debian12",
        ]
      + template                = true
      + timeout_clone           = 1800
      + timeout_create          = 1800
      + timeout_migrate         = 1800
      + timeout_move_disk       = 1800
      + timeout_reboot          = 1800
      + timeout_shutdown_vm     = 1800
      + timeout_start_vm        = 1800
      + timeout_stop_vm         = 300
      + vm_id                   = (known after apply)

      + agent {
          + enabled = true
          + timeout = "15m"
          + trim    = true
          + type    = "virtio"
        }

      + cpu {
          + architecture = "x86_64"
          + cores        = 1
          + flags        = [
              + "+aes",
              + "+md-clear",
              + "+pcid",
            ]
          + hotplugged   = 0
          + limit        = 0
          + numa         = false
          + sockets      = 1
          + type         = "host"
          + units        = 1024
        }

      + disk {
          + aio               = "io_uring"
          + backup            = true
          + cache             = "none"
          + datastore_id      = "local-lvm"
          + discard           = "on"
          + file_format       = "qcow2"
          + file_id           = (known after apply)
          + interface         = "scsi0"
          + iothread          = true
          + path_in_datastore = (known after apply)
          + replicate         = true
          + size              = 20
          + ssd               = true
        }

      + efi_disk {
          + datastore_id      = "local-lvm"
          + file_format       = (known after apply)
          + pre_enrolled_keys = true
          + type              = "2m"
        }

      + memory {
          + dedicated      = 1024
          + floating       = 0
          + keep_hugepages = false
          + shared         = 0
        }

      + network_device {
          + bridge      = "vmbr0"
          + enabled     = true
          + firewall    = false
          + mac_address = (known after apply)
          + model       = "virtio"
          + mtu         = 0
          + queues      = 0
          + rate_limit  = 0
          + vlan_id     = 0
        }

      + operating_system {
          + type = "l26"
        }

      + serial_device {
          + device = "socket"
        }

      + tpm_state {
          + datastore_id = "local-lvm"
          + version      = "v2.0"
        }

      + vga {
          + memory = 16
          + type   = "std"
        }
    }

Plan: 2 to add, 0 to change, 0 to destroy.

Changes to Outputs:
  + current_version = {
      + release       = "8.2"
      + repository_id = "9355359cd7afbae4"
      + version       = "8.2.2"
    }

Do you want to perform these actions?
  OpenTofu will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

module.images.proxmox_virtual_environment_download_file.image_debian_12_20240717: Creating...
module.images.proxmox_virtual_environment_download_file.image_debian_12_20240717: Still creating... [10s elapsed]
module.images.proxmox_virtual_environment_download_file.image_debian_12_20240717: Creation complete after 11s [id=local:iso/debian-12-generic-amd64-20240717-1811.img]
module.templates.proxmox_virtual_environment_vm.vm_template_debian_12_20240717: Creating...
module.templates.proxmox_virtual_environment_vm.vm_template_debian_12_20240717: Still creating... [11s elapsed]
module.templates.proxmox_virtual_environment_vm.vm_template_debian_12_20240717: Creation complete after 15s [id=100]

Apply complete! Resources: 2 added, 0 changed, 0 destroyed.

Outputs:

current_version = {
  "release" = "8.2"
  "repository_id" = "9355359cd7afbae4"
  "version" = "8.2.2"
}
vscode ➜ /workspaces/homelab-infra/pve (main) $ ttp 
data.proxmox_virtual_environment_datastores.pve_datastores: Reading...
data.proxmox_virtual_environment_version.pve_version: Reading...
data.proxmox_virtual_environment_version.pve_version: Read complete after 0s [id=version]
data.proxmox_virtual_environment_datastores.pve_datastores: Read complete after 0s [id=pve_datastores]
module.images.proxmox_virtual_environment_download_file.image_debian_12_20240717: Refreshing state... [id=local:iso/debian-12-generic-amd64-20240717-1811.img]
module.templates.proxmox_virtual_environment_vm.vm_template_debian_12_20240717: Refreshing state... [id=100]

OpenTofu used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  ~ update in-place

OpenTofu will perform the following actions:

  # module.templates.proxmox_virtual_environment_vm.vm_template_debian_12_20240717 will be updated in-place
  ~ resource "proxmox_virtual_environment_vm" "vm_template_debian_12_20240717" {
        id                      = "100"
      ~ ipv4_addresses          = [] -> (known after apply)
      ~ ipv6_addresses          = [] -> (known after apply)
        name                    = "template-debian-12-20240717"
      ~ network_interface_names = [] -> (known after apply)
        tags                    = [
            "debian12",
            "gnu+linux",
        ]
        # (25 unchanged attributes hidden)

      ~ disk {
          ~ file_format       = "raw" -> "qcow2"
            # (12 unchanged attributes hidden)
        }

        # (9 unchanged blocks hidden)
    }

Plan: 0 to add, 1 to change, 0 to destroy.

───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

Note: You didn't use the -out option to save this plan, so OpenTofu can't guarantee to take exactly these actions if you run "tofu apply" now.

Steps to reproduce the behavior:

  1. Create a proxmox_virtual_environment_download_file resource to download a qcow2 disk image (with the file extension renamed to .img in the file_name option, as the documentation instructs).
  2. Create a proxmox_virtual_environment_vm resource to create a VM template using the previously downloaded file as the disk.
  3. Run tofu apply to create all resources for the first time.
  4. Run tofu plan and apply the new changes it reports.
  5. Run tofu plan again: the same changes are reported, over and over.

Please also provide a minimal Terraform configuration that reproduces the issue.
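A minimal configuration along these lines should reproduce it. This is a sketch reconstructed from the plan output above, trimmed to the attributes relevant to the issue; resource labels are illustrative:

```hcl
# Sketch reconstructed from the plan output above; only the attributes
# relevant to the issue are shown.
resource "proxmox_virtual_environment_download_file" "image" {
  content_type = "iso"
  datastore_id = "local"
  node_name    = "pve"
  # qcow2 image renamed to .img in file_name, as the documentation instructs
  file_name = "debian-12-generic-amd64-20240717-1811.img"
  url       = "https://cloud.debian.org/images/cloud/bookworm/20240717-1811/debian-12-generic-amd64-20240717-1811.qcow2"
}

resource "proxmox_virtual_environment_vm" "template" {
  name      = "template-debian-12-20240717"
  node_name = "pve"
  template  = true

  disk {
    datastore_id = "local-lvm"
    file_id      = proxmox_virtual_environment_download_file.image.id
    interface    = "scsi0"
    file_format  = "qcow2" # read back as "raw", producing a perpetual diff
    size         = 20
  }
}
```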

Expected behavior: there should be no pending changes in the VM resource after apply.

Screenshots: four screenshots were attached to the original issue.

bpg commented 3 months ago

Hi @n0ct1s-k8sh

From your output:

  ~ resource "proxmox_virtual_environment_vm" "vm_template_debian_12_20240717" {
        id                      = "100"
      ~ ipv4_addresses          = [] -> (known after apply)
      ~ ipv6_addresses          = [] -> (known after apply)
        name                    = "template-debian-12-20240717"
      ~ network_interface_names = [] -> (known after apply)
        tags                    = [
            "debian12",
            "gnu+linux",
        ]
        # (25 unchanged attributes hidden)

      ~ disk {
          ~ file_format       = "raw" -> "qcow2"
            # (12 unchanged attributes hidden)
        }

        # (9 unchanged blocks hidden)
    }

I see that only disk.file_format is mishandled and is causing the resource update.

ipv4_addresses, ipv6_addresses, and network_interface_names are computed attributes; their values are retrieved from the VM, hence the (known after apply) status. They are not actually updated in-place (i.e. in the VM); they are marked with ~ because they may be updated in the TF state with new values after apply.
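Until the file_format handling is fixed in the provider, the perpetual diff can be silenced from the user side with a lifecycle ignore_changes block. This is a workaround sketch, not a fix, and it also hides any genuine changes to the ignored attribute:

```hcl
resource "proxmox_virtual_environment_vm" "vm_template_debian_12_20240717" {
  # ... existing configuration ...

  lifecycle {
    # Workaround: suppress the perpetual "raw" -> "qcow2" diff until the
    # provider handles file_format correctly. Ignoring the whole disk
    # block is coarse; a narrower path such as disk[0].file_format may
    # also work, depending on your Terraform/OpenTofu version.
    ignore_changes = [
      disk,
    ]
  }
}
```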

robcxyz commented 2 months ago

I was also seeing architecture produce an in-place update diff, but after switching to an image with the QEMU agent pre-installed the diff went away.

Edit: I'm actually getting the architecture diff again after adding a few more VMs. Not sure why I thought this was resolved before, since I remember specifically testing whether I got a diff after another apply.

      ~ cpu {
          + architecture = "x86_64"
            # (9 unchanged attributes hidden)
        }

So this definitely seems like a bug.

windowsrefund commented 2 months ago

@bpg While I can appreciate your comment about the values being "potentially" updated by the VM, this does have a bit of a "false positive" bug kinda feel to it. After all, terraform plan is saying it wants to make 1 change. In a professional environment, that's not going to cut it, and once the constant explanations wear thin, the condition will just be known as "that buggy provider's problem". Not saying that's correct; just saying that's the perception when it comes to false positives like this.

After confirming that each of these is just an empty list in the state, I guess I'm wondering why they need to exist in the state to begin with? Granted, my question relates specifically to a template that has not been created with an initialization block like the one above.

GeorgeGedox commented 2 months ago

I also always see architecture flagged for update on new machines, even when the CPU architecture is explicitly set in the config:

 # (30 unchanged attributes hidden)

      ~ cpu {
          + architecture = "x86_64"
            # (9 unchanged attributes hidden)
        }

        # (7 unchanged blocks hidden)
bpg commented 2 months ago

@windowsrefund

After all, terraform plan is saying it wants to make 1 change

Yes, this is the actual problem I mentioned regarding the file_format attribute.

After confirming each of these are just empty lists in the state, I guess I'm wondering why they'd need to exist in the state to begin with?

The fact that other attributes like ipv4_addresses are marked as "changed" (~) with the value (known after apply) is expected because those attributes are computed, and their values are not known during the plan. They are in the state because we want to retrieve them after the VM resource creation. But in your case, this is a template, and the VM is not started, so there are no IPs.

There is nothing wrong with the computed attributes; we use quite a lot of them. The problem lies with the "regular" attributes that change value when the provider reads them back from PVE after resource creation (mostly) or update (rarely). These cases are bugs, for sure, and they are present across many resources implemented using the old and deprecated Terraform Plugin SDK. These issues are not always straightforward to fix, as the old SDK does not provide the necessary methods to handle default values for attributes, which is a root cause of these discrepancies.

The long-term plan is to fix all such bugs in the VM resource, as outlined in #1231.

In a professional environment, that's not going to cut it and once the constant explanations wear thin, the condition will just be known and referred to as "that buggy provider's problem".

Agreed, that's not ideal. However, this is a hobby project, and there are only so many hours left in a day after a day job and other commitments. I hope people understand.

bpg commented 2 months ago

@GeorgeGedox this should be fixed in #1524

zcallen1 commented 1 week ago

@bpg This is still an issue.

Versions:
  Terraform: 1.9.8
  Provider: 0.66.3

I was on earlier Terraform and bpg provider versions and saw the same issues.

ratiborusx commented 2 days ago

@bpg I call upon ya, oh Great One! You may well discount most of what I wrote as an edge case or past-midnight delirium; the TL;DR is at the bottom. All the updates to this message were written on the go and I never actually edited a posted version (this is the first time posting it), so the flow may look weird, sorry about that. It took me all of last night and half of today to figure this out.

#1575 also looks to be related to this issue.

I'm also experiencing this issue on the latest 0.66.3. My observations so far with the 'cpu.architecture' attribute... okay, scratch that. I was about to post a big list of things I noticed, but then while doing some tests I found this:

UPDATE: Some more info about why an empty value for arch seems to be okay with 0.64.0. Here's the error output from 0.61.1 (with an empty value):

Error: expected architecture to be one of ["aarch64" "x86_64"], got

And here is the same error on 0.64.0 (with some weird value):

Error: expected architecture to be one of ["" "aarch64" "x86_64"], got zzz

As we can see, it looks like an empty ("") value for arch became valid in 0.64.0. ALSO, some weird and probably important stuff: if a VM had an arch value set in the state file, then after upgrading to 0.64.0 and changing arch to an empty value, 'plan' reports no changes, and checking the state file confirms that. But then I run 'apply' and the provider says no changes were made. And if you then check the state, you will see that the arch value was ERASED from it (became 'null'), even though that probably should not happen. To observe it yourself:

UPDATE 2: Well, it looks like 0.64.0 ALWAYS sets arch to 'null' (I didn't test 'aarch64', though), even when it says it wants to set it to 'x86_64'. Even more, apparently this weird magic works both ways: you can restore a 'null' arch value to 'x86_64' by downgrading back to 0.61.1, all without the provider detecting or showing any changes. To observe:

It looks as if the provider is somehow not even considering what's in the state for 'cpu.architecture', maybe doing some hardcoded changes. Though I reiterate that I didn't try 'aarch64', mostly because of this:

 Error: error updating VM: received an HTTP 500 response - Reason: only root can set 'arch' config

SOME ADDITIONAL INFO: Overall it may look like setting the architecture is not working at all, and is even useless, either because of errors in the provider or because Proxmox itself cannot do this via the API (though I DID check the api-viewer, and it's there). From what I observe, even when the arch value goes into the state, it is still missing from the actual VM config on the Proxmox node. Let's observe:

UPDATE 3, THE FINAL ONE: Sorry, I didn't manage to finish this in time; I got way too sleepy at 5 AM. Continuing now... Okay, I finally understand the case. I found some VMs that had the 'arch' attribute set in their config and were deployed some time ago. After some tests, it looks like 'cpu.architecture' can only be set by the root user. Not long ago we used root to deploy everything, but after we switched to a non-root PAM account, nothing created by it has the architecture attribute inside the conf file.

To reiterate: the endless updates to cpu.architecture happen when there is no actual 'arch:' entry inside the VM conf file (check with 'qm config VMID'), even though it may be present in the state file. This happens when you create a VM under a non-root account (in my case a non-root 'pam' account, but the same can probably be expected for 'pve' ones). The same also happens when you clone a template whose config does contain 'arch' under a non-root account: the attribute is not carried over to the clone, and the next plan run will complain. It looks like the provider checks the actual API response for 'arch' and does not consider (?) the state file (i.e. 'cpu.architecture' is present in the state, but there is no 'arch' in the VM's config file).

The overwrite of 'cpu.architecture' to 'null' on 0.64.0+, if it was set on a previous version, is confirmed again, but now it kind of makes sense if we consider the actual VM config rather than the state file. Under a non-root account the provider will try to change arch endlessly (setting it to 'null' on the first try if it was something else), but if you switch to a root account, then after apply it will actually set the arch value in both the state file and the VM conf file and won't complain again.

I'm not sure how to deal with this, considering we'd prefer not to use root at all. For now, I think it's best on 0.64.0+ to simply not set 'cpu.architecture' if you're using a non-root account. In that case it will be 'null' in the state and absent from both the manifest and the VM conf file, which in turn won't trigger any updates from the provider.
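Concretely, that means leaving the attribute out of the cpu block entirely. A sketch, with the other cpu attributes taken from the plan output earlier in this thread:

```hcl
resource "proxmox_virtual_environment_vm" "example" {
  # ... other VM configuration ...

  cpu {
    # 'architecture' intentionally omitted: only root can set 'arch' via
    # the PVE API, so leaving it unset (null in state, absent from the VM
    # conf file) avoids the perpetual in-place update under a non-root
    # account.
    cores   = 1
    sockets = 1
    type    = "host"
  }
}
```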

=== Some unrelated stuff: I'm getting this error when trying to show the state for a VM that was created on 0.61.1, after upgrading to 0.66.3:

Failed to marshal state to json: unsupported attribute "enabled"
The state file is empty. No resources are represented.

I suspect it may be the recently removed 'vga.enabled' attribute, which is set to false on all my current VMs (i.e. it's probably a default, because I don't set this attribute at all in my variables). I'm not sure I'll be able to inspect the state file on prod after the upgrade; I'll try to test on some small env. Yep, this is a problem for my prod: after the upgrade I cannot view the state file contents for VMs that were created before. Is there any way to make validation pass here? Otherwise it kind of breaks backward compatibility. The only way to fix all these VM resources now seems to be to manually edit the state file (I think that would help) and remove "enabled = false" from the vga {} block.

bpg commented 2 days ago

@ratiborusx yeah, it will take some time to unload 😅, but I'll get to that, after fixing the ID problem 🤞🏼