bpg / terraform-provider-proxmox

Terraform Provider for Proxmox
https://registry.terraform.io/providers/bpg/proxmox

Proxmox VM Creation Fails with ‘Unable to Retrieve VM Identifier’ Error When Cloning Multiple VMs Simultaneously #1610

Open arsensimonyanpicsart opened 5 days ago

arsensimonyanpicsart commented 5 days ago

Describe the bug When creating two or more virtual machines (VMs) simultaneously using the proxmox_virtual_environment_vm resource with a clone block, an error occurs indicating a failure to retrieve the next available VM identifier.

To Reproduce Steps to reproduce the behavior:

  1. Create two proxmox_virtual_environment_vm resources with the following Terraform configuration.
  2. Run terraform apply (or tofu apply).
  3. Observe that vm2 is created successfully, but vm1 fails with an error related to VM identifier retrieval.

Minimal Terraform configuration that reproduces the issue:


terraform {
  required_providers {
    proxmox = {
      source  = "bpg/proxmox"
      version = "0.66.3"
    }
  }
}
provider "proxmox" {
  endpoint  = "https://xxx:8006/"
  api_token = "xxx"
  insecure  = true
}

resource "proxmox_virtual_environment_vm" "vm1" {
  name            = "va-vm-name1"
  node_name       = "va-dev-proxmox01"
  stop_on_destroy = true

  clone {
    vm_id     = 9001
    full      = true
    node_name = "va-dev-proxmox02"
  }
}

resource "proxmox_virtual_environment_vm" "vm2" {
  name            = "va-vm-name2"
  node_name       = "va-dev-proxmox01"
  stop_on_destroy = true

  clone {
    vm_id     = 9001
    full      = true
    node_name = "va-dev-proxmox02"
  }
}

Output of terraform apply:

proxmox_virtual_environment_vm.vm2: Creating...
proxmox_virtual_environment_vm.vm2: Still creating... [10s elapsed]
proxmox_virtual_environment_vm.vm2: Creation complete after 18s [id=115]
╷
│ Error: unable to retrieve the next available VM identifier: context deadline exceeded
│
│   with proxmox_virtual_environment_vm.vm1,
│   on main.tf line 16, in resource "proxmox_virtual_environment_vm" "vm1":
│   16: resource "proxmox_virtual_environment_vm" "vm1" {

Expected behavior Both vm1 and vm2 should be created successfully.



ratiborusx commented 4 days ago

Getting the same error while trying to create 9 VMs in a GitLab pipeline. It creates 4 and fails the others with the same error. I updated the provider version to the current one (0.66.3), but I can't say whether the new version introduced the problem, because previously my pipeline was creating 4 VMs total (not 9 like now). Tried to mitigate with 'parallelism=3', but it didn't work - after creating 3+1 in two batches it failed. I believe @bpg already tried to address the next-VMID allocation issue in a few previous commits. I'll try to downgrade the provider version and see if that helps.

PVE 8.2.2, Terraform 1.9.3, bpg/proxmox 0.66.3
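
For reference, the 'parallelism=3' mitigation mentioned above is Terraform's standard flag for limiting how many resource operations run concurrently; as noted, it only reduced the batch size here and did not avoid the error:

$ terraform apply -parallelism=3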

ratiborusx commented 4 days ago

I believe that's the PR I mentioned, with the VMID allocation rework: https://github.com/bpg/terraform-provider-proxmox/pull/1557. Maybe we could try the new 'random_vm_ids' feature; I'll check it out for sure. Still, it would be nice to get the standard behavior into somewhat working order.

ratiborusx commented 4 days ago

Downgraded to 0.65.0 (as the new VMID allocation features were added in 0.66), and all 9 VMs were created successfully.

bpg commented 4 days ago

Thanks for testing, @ratiborusx. We've had a few other reports flagging this issue, so it's good to have confirmation. I haven't had a chance to look into it yet; I'll try getting to it this weekend 🤞

ratiborusx commented 4 days ago

Went back to 0.66.3 to check the random VMID feature - it works as declared; all 9 VMs were created successfully. Added these to the 'provider' block:

provider "proxmox" {
...
  random_vm_ids      = true
  random_vm_id_start = 90000
  random_vm_id_end   = 90999
...
}

So there are two ways to deal with the issue as of now; hopefully @bpg will be able to tinker with that stuff a bit more. For now I'll stay on 0.66.3 and see how the random VMID feature behaves. As I understand it, it should help prevent VMID collisions on allocation during parallel execution (for example, a few pipelines plus manual runs from a workstation hitting the same cluster simultaneously), unlike the pre-0.66.0 approach.
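
As a sketch of the first workaround (pinning the provider to the pre-0.66 release reported as working above), the version can be held back in required_providers:

terraform {
  required_providers {
    proxmox = {
      source  = "bpg/proxmox"
      # 0.65.0 predates the 0.66.x VMID allocation rework
      version = "0.65.0"
    }
  }
}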

bpg commented 2 days ago

I’m unable to pinpoint the issue 🤔. I can create six VMs simultaneously from the same clone without any problems. However, I noticed that the OP is cloning to a different node than the source, which I can’t test at the moment. @ratiborusx, is your use case similar, cloning between nodes?

ratiborusx commented 1 day ago

@bpg Oh boy, time for some late-night testing. I'm pretty sure that's not the case - because we don't have any NAS/NFS yet, I created a bootstrapping module to prepare the cluster for usage, where every "basic" resource (cloud configs, images and templates) is duplicated on each node. Something like this:

$ terraform state list
module.proxmox_common.proxmox_virtual_environment_download_file.px_cloud_image["almalinux-9@prox-srv1"]
module.proxmox_common.proxmox_virtual_environment_download_file.px_cloud_image["almalinux-9@prox-srv2"]
module.proxmox_common.proxmox_virtual_environment_download_file.px_cloud_image["almalinux-9@prox-srv3"]
...
module.proxmox_common.proxmox_virtual_environment_vm.px_template["debian-12-main@prox-srv1"]
module.proxmox_common.proxmox_virtual_environment_vm.px_template["debian-12-main@prox-srv2"]
module.proxmox_common.proxmox_virtual_environment_vm.px_template["debian-12-main@prox-srv3"]
...
module.proxmox_common.proxmox_virtual_environment_file.px_ci_data["userdata-proxmox-generic-automation@prox-srv1"]
module.proxmox_common.proxmox_virtual_environment_file.px_ci_data["userdata-proxmox-generic-automation@prox-srv2"]
module.proxmox_common.proxmox_virtual_environment_file.px_ci_data["userdata-proxmox-generic-automation@prox-srv3"]

So when I use another module for actual VM provisioning, it uses the specified template from the same node the VM is being created on. All of that because initial tests showed that cloning from another node takes too long (if I remember correctly, it first creates the VM on the node the template is located on and then migrates it over the network to the specified node). I believe there were some problems with cloud-init configs too - if the same userdata is not present on the node you clone (or migrate) to, cloud-init runs again and uses the default cloud.cfg/cloud.cfg.d stuff (at least that's how I remember it from my last tests a year ago).

Also, I declare the 'clone.node_name' variable as optional and never specify it - in that case it should be the same as the resource's 'node_name'. BUT now I've got a question: what is the use of 'clone.node_name' (which is optional and defaults to the 'node_name' of the VM being created if empty) at all, if we also have to specify the required 'clone.vm_id' argument and VMIDs are cluster-unique? I've probably forgotten some of that stuff, but I just couldn't answer this one myself at the moment...
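
As a minimal sketch of the same-node clone pattern described above (the resource name is hypothetical; the template VMID 149 and node prox-srv2 are taken from the state output below):

resource "proxmox_virtual_environment_vm" "px_vm_example" {
  name      = "xdata-dev-example" # hypothetical
  node_name = "prox-srv2"         # same node the template lives on

  clone {
    # clone.node_name is omitted, so the source template is expected on the
    # resource's own node_name - no cross-node clone/migration is involved
    vm_id = 149
    full  = true
  }
}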

Here's some (slightly truncated) output; I believe the template and the clone(s) are located on the same node (prox-srv2):

Plan: 10 to add, 0 to change, 0 to destroy.
proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-host-07"]: Creating...
proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-host-08"]: Creating...
proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-host-04"]: Creating...
proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-host-01"]: Creating...
proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-host-05"]: Creating...
proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-host-02"]: Creating...
proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-host-03"]: Creating...
proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-host-09"]: Creating...
proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-host-06"]: Creating...
proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-mgt"]: Creating...
proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-host-07"]: Still creating... [10s elapsed]
proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-host-08"]: Still creating... [10s elapsed]
proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-host-04"]: Still creating... [10s elapsed]
proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-host-05"]: Still creating... [10s elapsed]
proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-host-01"]: Still creating... [10s elapsed]
proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-host-02"]: Still creating... [10s elapsed]
proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-host-03"]: Still creating... [10s elapsed]
proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-host-09"]: Still creating... [10s elapsed]
proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-host-06"]: Still creating... [10s elapsed]
proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-host-07"]: Still creating... [20s elapsed]
proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-host-08"]: Still creating... [20s elapsed]
proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-host-04"]: Still creating... [20s elapsed]
proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-host-05"]: Still creating... [20s elapsed]
proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-host-06"]: Still creating... [20s elapsed]
proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-host-09"]: Still creating... [20s elapsed]
proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-host-03"]: Still creating... [20s elapsed]
proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-host-07"]: Still creating... [30s elapsed]
proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-host-08"]: Still creating... [30s elapsed]
proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-host-04"]: Still creating... [30s elapsed]
proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-host-05"]: Still creating... [30s elapsed]
proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-host-06"]: Still creating... [30s elapsed]
proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-host-03"]: Still creating... [30s elapsed]
proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-host-08"]: Creation complete after 32s [id=205]
proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-host-07"]: Still creating... [40s elapsed]
proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-host-04"]: Still creating... [40s elapsed]
proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-host-03"]: Still creating... [40s elapsed]
proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-host-07"]: Creation complete after 41s [id=202]
proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-host-04"]: Creation complete after 44s [id=204]
proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-host-03"]: Creation complete after 44s [id=203]
╷
│ Error: unable to retrieve the next available VM identifier: context deadline exceeded
│
│   with proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-mgt"],
│   on main.tf line 17, in resource "proxmox_virtual_environment_vm" "px_vm":
│   17: resource "proxmox_virtual_environment_vm" "px_vm" {
│
╵
╷
│ Error: unable to retrieve the next available VM identifier: context deadline exceeded
│
│   with proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-host-02"],
│   on main.tf line 17, in resource "proxmox_virtual_environment_vm" "px_vm":
│   17: resource "proxmox_virtual_environment_vm" "px_vm" {
│
╵
╷
│ Error: unable to retrieve the next available VM identifier: context deadline exceeded
│
│   with proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-host-06"],
│   on main.tf line 17, in resource "proxmox_virtual_environment_vm" "px_vm":
│   17: resource "proxmox_virtual_environment_vm" "px_vm" {
│
╵
╷
│ Error: unable to retrieve the next available VM identifier: context deadline exceeded
│
│   with proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-host-05"],
│   on main.tf line 17, in resource "proxmox_virtual_environment_vm" "px_vm":
│   17: resource "proxmox_virtual_environment_vm" "px_vm" {
│
╵
╷
│ Error: unable to retrieve the next available VM identifier: context deadline exceeded
│
│   with proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-host-01"],
│   on main.tf line 17, in resource "proxmox_virtual_environment_vm" "px_vm":
│   17: resource "proxmox_virtual_environment_vm" "px_vm" {
│
╵
╷
│ Error: unable to retrieve the next available VM identifier: context deadline exceeded
│
│   with proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-host-09"],
│   on main.tf line 17, in resource "proxmox_virtual_environment_vm" "px_vm":
│   17: resource "proxmox_virtual_environment_vm" "px_vm" {
│
╵

$ terraform state show data.proxmox_virtual_environment_vms.template_vms
# data.proxmox_virtual_environment_vms.template_vms:
data "proxmox_virtual_environment_vms" "template_vms" {
    id   = "some-id-was-here-123abc"
    tags = [
        "templates",
    ]
    vms  = [
    ...
        {
            name      = "astra-1.7.5-adv-main"
            node_name = "prox-srv2"
            status    = "stopped"
            tags      = [
                "image-astra-1.7.5-adv",
                "templates",
                "terraform",
            ]
            template  = true
            vm_id     = 149
        },
    ]
}

$ terraform state show 'proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-host-03"]'
# proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-host-03"]:
resource "proxmox_virtual_environment_vm" "px_vm" {
...
id                      = "203"
...
node_name               = "prox-srv2"
...
vm_id                   = 203
...
clone {
        datastore_id = null
        full         = true
        node_name    = null
        retries      = 3
        vm_id        = 149
    }
...