bpg / terraform-provider-proxmox

Terraform Provider for Proxmox
https://registry.terraform.io/providers/bpg/proxmox
Mozilla Public License 2.0

Disk resize for VM not updated in terraform state #1271

Status: Open · opened by mattburchett 2 months ago

mattburchett commented 2 months ago

Describe the bug
After increasing the disk size in Terraform, the state still contains the old value, and Terraform tries to apply the increase again.

To Reproduce
Steps to reproduce the behavior:

  1. Create a VM resource
  2. Increase disk size by any amount
  3. terraform apply
  4. Once complete, run terraform plan

Expected behavior
Terraform's state should have the new value after applying.
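For context, the change that triggers the issue is simply bumping `size` in the VM's `disk` block, e.g. (a sketch with placeholder values, not the reporter's exact config):

```hcl
disk {
  datastore_id = "local-zfs"  # placeholder datastore
  interface    = "virtio0"
  size         = 100          # was 50 before the change
}
```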

Log

# Original Apply
Terraform will perform the following actions:

  # proxmox_virtual_environment_vm.vms["web"] will be updated in-place
  ~ resource "proxmox_virtual_environment_vm" "vms" {
        id                      = "100"
        name                    = "web"
      + protection              = false
        # (26 unchanged attributes hidden)

      ~ disk {
          ~ size              = 50 -> 100
            # (11 unchanged attributes hidden)
        }

        # (8 unchanged blocks hidden)
    }

Plan: 0 to add, 1 to change, 0 to destroy.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

proxmox_virtual_environment_vm.vms["web"]: Modifying... [id=100]
proxmox_virtual_environment_vm.vms["web"]: Still modifying... [id=100, 10s elapsed]
proxmox_virtual_environment_vm.vms["web"]: Still modifying... [id=100, 20s elapsed]
proxmox_virtual_environment_vm.vms["web"]: Still modifying... [id=100, 30s elapsed]
proxmox_virtual_environment_vm.vms["web"]: Still modifying... [id=100, 40s elapsed]
proxmox_virtual_environment_vm.vms["web"]: Still modifying... [id=100, 50s elapsed]
proxmox_virtual_environment_vm.vms["web"]: Still modifying... [id=100, 1m0s elapsed]
proxmox_virtual_environment_vm.vms["web"]: Still modifying... [id=100, 1m10s elapsed]
proxmox_virtual_environment_vm.vms["web"]: Still modifying... [id=100, 1m20s elapsed]
proxmox_virtual_environment_vm.vms["web"]: Still modifying... [id=100, 1m30s elapsed]
proxmox_virtual_environment_vm.vms["web"]: Modifications complete after 1m34s [id=100]

# terraform plan

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  ~ update in-place

  # proxmox_virtual_environment_vm.vms["web"] will be updated in-place
  ~ resource "proxmox_virtual_environment_vm" "vms" {
        id                      = "100"
        name                    = "web"
        # (27 unchanged attributes hidden)

      ~ disk {
          ~ size              = 50 -> 100
            # (11 unchanged attributes hidden)
        }

        # (8 unchanged blocks hidden)
    }

Plan: 0 to add, 1 to change, 0 to destroy.

Let me know if I can provide any more information that would be useful.

bpg commented 2 months ago

One question about step 1: did you clone the VM from a template / another VM, or create it from scratch?

mattburchett commented 2 months ago

It is a full clone (not a linked clone) from a template.

I'm also not sure if it's relevant, but the template is built from one of Ubuntu's cloud-init images.

KorzunKarl commented 2 months ago

@mattburchett You need to install and enable qemu-guest-agent on the target VM, because after resizing, the VM requires a reboot, and without the agent this is not possible even from the Proxmox interface. You can install it with cloud-init, and add agent = true to the target VM.
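For reference, wiring both pieces together in the bpg provider might look like this (a minimal sketch; the resource names, node name, and cloud-init snippet are illustrative placeholders, not taken from this thread):

```hcl
resource "proxmox_virtual_environment_file" "agent_config" {
  content_type = "snippets"
  datastore_id = "local"
  node_name    = "pve" # placeholder node

  source_raw {
    file_name = "agent-config.yaml"
    data      = <<-EOF
      #cloud-config
      packages:
        - qemu-guest-agent
      runcmd:
        - systemctl enable --now qemu-guest-agent
      EOF
  }
}

resource "proxmox_virtual_environment_vm" "example" {
  # ... node_name, clone, disk, etc. ...

  # Tell Proxmox (and the provider) that the guest agent is available.
  agent {
    enabled = true
  }

  initialization {
    # Cloud-init user data that installs and starts the agent inside the guest.
    user_data_file_id = proxmox_virtual_environment_file.agent_config.id
  }
}
```

Note that `agent { enabled = true }` only tells Proxmox the agent exists; the cloud-init snippet is what actually installs it in the guest.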

mattburchett commented 2 months ago

@mattburchett You need to install and enable qemu-guest-agent on the target VM, because after resizing, the VM requires a reboot, and without the agent this is not possible even from the Proxmox interface. You can install it with cloud-init, and add agent = true to the target VM.

It does have qemu-guest-agent enabled.

[screenshot]

And my Terraform config:

resource "proxmox_virtual_environment_vm" "vms" {
  for_each = local.pve-hosts

  name        = each.key
  description = "Managed by Terraform"

  node_name = each.value.target-node
  vm_id     = split(".", each.value.ip.address)[3]

  agent {
    enabled = true
  }

  clone {
    datastore_id = each.value.hardware.storage
    retries      = 10
    node_name    = "lrhq-pve"
    vm_id        = each.value.template
  }

  cpu {
    cores   = each.value.hardware.cores
    sockets = 1
    type    = "host"
  }

  disk {
    datastore_id = each.value.hardware.storage
    interface    = "virtio0"
    size         = each.value.hardware.disk_size
  }

  initialization {
    datastore_id = each.value.hardware.storage
    ip_config {
      ipv4 {
        address = "${each.value.ip.address}/${each.value.ip.cidr}"
        gateway = each.value.ip.gw
      }
    }

    user_data_file_id = proxmox_virtual_environment_file.ubuntu_cloud_config[each.key].id
  }

  lifecycle {
    ignore_changes = [
      initialization[0].user_data_file_id
    ]
  }

  memory {
    dedicated = each.value.hardware.memory
  }

  network_device {}

  on_boot = true

  operating_system {
    type = "l26"
  }

  serial_device {}

  depends_on = [cloudflare_record.proxmox-pve-dns]
}

KorzunKarl commented 2 months ago

@mattburchett Can you check the status of the agent on this VM?

[screenshot]

mattburchett commented 2 months ago

web ~ [0]# systemctl status qemu-guest-agent.service
● qemu-guest-agent.service - QEMU Guest Agent
     Loaded: loaded (/lib/systemd/system/qemu-guest-agent.service; static)
     Active: active (running) since Wed 2024-05-08 17:59:16 UTC; 4s ago
   Main PID: 148060 (qemu-ga)
      Tasks: 2 (limit: 2309)
     Memory: 688.0K
        CPU: 6ms
     CGroup: /system.slice/qemu-guest-agent.service
             └─148060 /usr/sbin/qemu-ga

May 08 17:59:16 lrhq-web.linuxrocker.cloud systemd[1]: Started QEMU Guest Agent.

I did restart it a moment ago because the logs showed some guest-ping errors, but they weren't recent.

And after the restart, a terraform plan still shows that it wants to change the disk.

  # proxmox_virtual_environment_vm.vms["web"] will be updated in-place
  ~ resource "proxmox_virtual_environment_vm" "vms" {
        id                      = "100"
        name                    = "web"
        # (27 unchanged attributes hidden)

      ~ disk {
          ~ size              = 50 -> 100
            # (11 unchanged attributes hidden)
        }

        # (8 unchanged blocks hidden)
    }

bpg commented 2 months ago

@mattburchett Hmm... I can't reproduce this issue in my simple test. 🤔 Does your template VM have more than one disk? Could you possibly take a screenshot of its hardware configuration and post it here?

mattburchett commented 2 months ago

Sure thing.

[screenshot]

I'll also toss in the information for my template creation:

resource "proxmox_virtual_environment_file" "ubuntu_jammy_template" {
  content_type = "iso"
  datastore_id = "local"
  node_name    = "lrhq-pve"

  source_file {
    path = "https://cloud-images.ubuntu.com/jammy/current/jammy-server-cloudimg-amd64.img"
  }
}

resource "proxmox_virtual_environment_vm" "ubuntu-jammy-template" {
  name        = "ubuntu-jammy-template"
  description = "Managed by Terraform"

  node_name = "lrhq-pve"
  vm_id     = 9006

  cpu {
    cores   = 1
    sockets = 1
    type    = "kvm64"
    flags   = ["+aes"]
  }

  disk {
    datastore_id = "local-zfs"
    file_id      = proxmox_virtual_environment_file.ubuntu_jammy_template.id
    interface    = "virtio0"
  }

  on_boot  = false
  started  = false
  template = true
}

mattburchett commented 2 months ago

Well, that's interesting. I actually wonder if it's a Proxmox bug.

On the VM, I can see the disk increase took place:

web ~ [0]# df -h  /
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda1        97G   19G   79G  19% /

But when I view the hardware for the VM in Proxmox, it shows a disk size of 50G.

[screenshot]

I'm going to run some Proxmox updates to bring myself to the latest version, then test with another VM and see if I can replicate it.

mattburchett commented 2 months ago

Yeah, I think this might be a Proxmox bug. It still happened on my test VM after an update to 8.2.2 (from 8.0).

I found someone with the same issue, but with an LXC container: https://bugzilla.proxmox.com/show_bug.cgi?id=305

I gave some information over there to see if it's a bug on their end.
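As a side note, while the root cause is being investigated, a perpetual diff like this can be silenced with `ignore_changes` (a sketch extending the lifecycle block already in the reporter's config; this trades drift detection on the disk size for a clean plan, so resizes would then have to be done outside Terraform or with the ignore temporarily removed):

```hcl
lifecycle {
  ignore_changes = [
    initialization[0].user_data_file_id,
    disk[0].size,
  ]
}
```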

bpg commented 2 months ago

Hmm... I've tried your exact template & VM on both ZFS and LVM storage and had no issues. PVE v8.2.2 as well. Perhaps it's something specific to your ZFS config?

mattburchett commented 2 months ago

I'm not certain honestly. I don't think I've done anything specific with ZFS on Proxmox. It's pretty much a bog-standard single-node install with ZFS in a RAIDZ2, with ZFS on root, all done through the installer.

I do have some server setup that is done via Ansible, but nothing that messes with the storage arrays. It pretty much just installs monitoring and sets up my shell.

svengreb commented 2 months ago

@mattburchett Is there a hardware RAID controller that ZFS runs on? If so, this is neither supported nor recommended by ZFS, because ZFS "likes" to see all disks directly, and some features even require this, e.g. to protect against silent bitrot.

mattburchett commented 2 months ago

Proxmox is running on a PowerEdge R430 with a PERC H730 Mini. The PERC is not configured and is just passing the devices through via JBOD.

[screenshot]

A zpool status from the host:

lrhq-pve ~ [0]# zpool status
  pool: rpool
 state: ONLINE
  scan: scrub repaired 0B in 01:11:57 with 0 errors on Sun May 12 01:35:58 2024
config:

        NAME                                                   STATE     READ WRITE CKSUM
        rpool                                                  ONLINE       0     0     0
          raidz2-0                                             ONLINE       0     0     0
            ata-Samsung_SSD_870_EVO_1TB_S6PTNS0W113188Z-part3  ONLINE       0     0     0
            ata-Samsung_SSD_870_EVO_1TB_S6PTNM0TA47793P-part3  ONLINE       0     0     0
            ata-Samsung_SSD_870_EVO_1TB_S6PTNS0W113144X-part3  ONLINE       0     0     0
            ata-Samsung_SSD_870_EVO_1TB_S6PTNM0TA47728D-part3  ONLINE       0     0     0
            ata-Samsung_SSD_870_EVO_1TB_S6PTNS0W113153M-part3  ONLINE       0     0     0
            ata-Samsung_SSD_870_EVO_1TB_S6PTNM0TA47790F-part3  ONLINE       0     0     0
            ata-Samsung_SSD_870_EVO_1TB_S6PTNM0TA48482L-part3  ONLINE       0     0     0
            ata-Samsung_SSD_870_EVO_1TB_S6PTNM0TA49482Y-part3  ONLINE       0     0     0

errors: No known data errors