Getting the same error while trying to create 9 VMs in a GitLab pipeline. It creates 4 and fails the others with the same error. I updated the provider to the current version (0.66.3), but I can't say whether the new version introduced the problem, because previously my pipeline was creating only 4 VMs total (not 9 like now). I tried to mitigate with 'parallelism=3', but it didn't work: after creating 3+1 VMs in two batches it failed. I believe @bpg already tried to address the next-VMID allocation issue in a few previous commits. I'll try downgrading the provider and see if it helps.
PVE 8.2.2, Terraform 1.9.3, bpg/proxmox 0.66.3
I believe that's the PR I mentioned with the VMID allocation rework: https://github.com/bpg/terraform-provider-proxmox/pull/1557. Maybe we could try the new 'random_vm_ids' feature; I'll check it out for sure. Still, it would be nice to get the standard behavior into somewhat working order.
Downgraded to 0.65.0 (since the new VMID allocation features were added in 0.66); all 9 VMs were created successfully.
Thanks for testing @ratiborusx, we had a few other reports flagging this issue, so it's good to have a confirmation. I haven't had a chance to look into it yet; I'll try to get to it this weekend 🤞
Returned to 0.66.3 to check the random VMID feature - it works as declared; all 9 VMs were created successfully. Added these to the 'provider' block:
provider "proxmox" {
...
random_vm_ids = true
random_vm_id_start = 90000
random_vm_id_end = 90999
...
}
So there are two ways to deal with the issue as of now; hopefully @bpg will be able to tinker with that stuff a bit more. For now I'll stay on 0.66.3 and see how the random VMID feature behaves. As I understand it, it should help prevent VMID collisions during parallel allocation (for example, a few pipelines plus a manual run from a workstation hitting the same cluster simultaneously), unlike the pre-0.66.0 approach.
I’m unable to pinpoint the issue 🤔. I can create six VMs simultaneously from the same clone without any problems. However, I noticed that the OP is cloning to a different node than the source, which I can’t test at the moment. @ratiborusx, is your use case similar, cloning between nodes?
@bpg Oh boy, time for some late night testing. I'm pretty sure that's not the case - because we don't have any NAS/NFS storage yet, I created a bootstrapping module that prepares the cluster for usage, where every "basic" resource (cloud configs, images and templates) is duplicated on each node. Something like this:
$ terraform state list
module.proxmox_common.proxmox_virtual_environment_download_file.px_cloud_image["almalinux-9@prox-srv1"]
module.proxmox_common.proxmox_virtual_environment_download_file.px_cloud_image["almalinux-9@prox-srv2"]
module.proxmox_common.proxmox_virtual_environment_download_file.px_cloud_image["almalinux-9@prox-srv3"]
...
module.proxmox_common.proxmox_virtual_environment_vm.px_template["debian-12-main@prox-srv1"]
module.proxmox_common.proxmox_virtual_environment_vm.px_template["debian-12-main@prox-srv2"]
module.proxmox_common.proxmox_virtual_environment_vm.px_template["debian-12-main@prox-srv3"]
...
module.proxmox_common.proxmox_virtual_environment_file.px_ci_data["userdata-proxmox-generic-automation@prox-srv1"]
module.proxmox_common.proxmox_virtual_environment_file.px_ci_data["userdata-proxmox-generic-automation@prox-srv2"]
module.proxmox_common.proxmox_virtual_environment_file.px_ci_data["userdata-proxmox-generic-automation@prox-srv3"]
So when I use another module for actual VM provisioning, it uses the specified template from the same node the VM is being created on. All of that because initial tests showed that cloning from another node takes too long (if I remember correctly, it first creates the VM on the node where the template is located and then migrates it over the network to the specified node). I believe there were some problems with cloud-init configs too: if the same userdata is not present on the node you clone (or migrate) to, cloud-init runs again and uses the default cloud.cfg/cloud.cfg.d stuff (at least that's how I remember it from tests a year ago). Also, I declare the 'clone.node_name' variable as optional and never specify it; in that case it should be the same as the resource's 'node_name'. BUT now I have a question: what is the point of 'clone.node_name' (optional, defaulting to the 'node_name' of the VM being created when empty) if we also have to specify the required 'clone.vm_id' argument and VMIDs are cluster-unique? I probably forgot some of that stuff, but I couldn't answer that one myself at the moment...
Here's some (slightly truncated) output; I believe the template and the clone(s) are located on the same node (prox-srv2):
Plan: 10 to add, 0 to change, 0 to destroy.
proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-host-07"]: Creating...
proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-host-08"]: Creating...
proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-host-04"]: Creating...
proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-host-01"]: Creating...
proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-host-05"]: Creating...
proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-host-02"]: Creating...
proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-host-03"]: Creating...
proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-host-09"]: Creating...
proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-host-06"]: Creating...
proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-mgt"]: Creating...
proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-host-07"]: Still creating... [10s elapsed]
proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-host-08"]: Still creating... [10s elapsed]
proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-host-04"]: Still creating... [10s elapsed]
proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-host-05"]: Still creating... [10s elapsed]
proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-host-01"]: Still creating... [10s elapsed]
proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-host-02"]: Still creating... [10s elapsed]
proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-host-03"]: Still creating... [10s elapsed]
proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-host-09"]: Still creating... [10s elapsed]
proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-host-06"]: Still creating... [10s elapsed]
proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-host-07"]: Still creating... [20s elapsed]
proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-host-08"]: Still creating... [20s elapsed]
proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-host-04"]: Still creating... [20s elapsed]
proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-host-05"]: Still creating... [20s elapsed]
proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-host-06"]: Still creating... [20s elapsed]
proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-host-09"]: Still creating... [20s elapsed]
proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-host-03"]: Still creating... [20s elapsed]
proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-host-07"]: Still creating... [30s elapsed]
proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-host-08"]: Still creating... [30s elapsed]
proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-host-04"]: Still creating... [30s elapsed]
proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-host-05"]: Still creating... [30s elapsed]
proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-host-06"]: Still creating... [30s elapsed]
proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-host-03"]: Still creating... [30s elapsed]
proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-host-08"]: Creation complete after 32s [id=205]
proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-host-07"]: Still creating... [40s elapsed]
proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-host-04"]: Still creating... [40s elapsed]
proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-host-03"]: Still creating... [40s elapsed]
proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-host-07"]: Creation complete after 41s [id=202]
proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-host-04"]: Creation complete after 44s [id=204]
proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-host-03"]: Creation complete after 44s [id=203]
╷
│ Error: unable to retrieve the next available VM identifier: context deadline exceeded
│
│ with proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-mgt"],
│ on main.tf line 17, in resource "proxmox_virtual_environment_vm" "px_vm":
│ 17: resource "proxmox_virtual_environment_vm" "px_vm" {
│
╵
╷
│ Error: unable to retrieve the next available VM identifier: context deadline exceeded
│
│ with proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-host-02"],
│ on main.tf line 17, in resource "proxmox_virtual_environment_vm" "px_vm":
│ 17: resource "proxmox_virtual_environment_vm" "px_vm" {
│
╵
╷
│ Error: unable to retrieve the next available VM identifier: context deadline exceeded
│
│ with proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-host-06"],
│ on main.tf line 17, in resource "proxmox_virtual_environment_vm" "px_vm":
│ 17: resource "proxmox_virtual_environment_vm" "px_vm" {
│
╵
╷
│ Error: unable to retrieve the next available VM identifier: context deadline exceeded
│
│ with proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-host-05"],
│ on main.tf line 17, in resource "proxmox_virtual_environment_vm" "px_vm":
│ 17: resource "proxmox_virtual_environment_vm" "px_vm" {
│
╵
╷
│ Error: unable to retrieve the next available VM identifier: context deadline exceeded
│
│ with proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-host-01"],
│ on main.tf line 17, in resource "proxmox_virtual_environment_vm" "px_vm":
│ 17: resource "proxmox_virtual_environment_vm" "px_vm" {
│
╵
╷
│ Error: unable to retrieve the next available VM identifier: context deadline exceeded
│
│ with proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-host-09"],
│ on main.tf line 17, in resource "proxmox_virtual_environment_vm" "px_vm":
│ 17: resource "proxmox_virtual_environment_vm" "px_vm" {
│
╵
$ terraform state show data.proxmox_virtual_environment_vms.template_vms
# data.proxmox_virtual_environment_vms.template_vms:
data "proxmox_virtual_environment_vms" "template_vms" {
    id   = "some-id-was-here-123abc"
    tags = [
        "templates",
    ]
    vms  = [
        ...
        {
            name      = "astra-1.7.5-adv-main"
            node_name = "prox-srv2"
            status    = "stopped"
            tags      = [
                "image-astra-1.7.5-adv",
                "templates",
                "terraform",
            ]
            template  = true
            vm_id     = 149
        },
    ]
}
$ terraform state show 'proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-host-03"]'
# proxmox_virtual_environment_vm.px_vm["xdata-dev-stand-host-03"]:
resource "proxmox_virtual_environment_vm" "px_vm" {
...
id = "203"
...
node_name = "prox-srv2"
...
vm_id = 203
...
clone {
datastore_id = null
full = true
node_name = null
retries = 3
vm_id = 149
}
...
I'm getting this error also:
OpenTofu used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
+ create
OpenTofu will perform the following actions:
# proxmox_virtual_environment_vm.vm["au-pie-haos"] will be created
+ resource "proxmox_virtual_environment_vm" "vm" {
+ acpi = true
+ bios = "ovmf"
+ id = (known after apply)
+ ipv4_addresses = (known after apply)
+ ipv6_addresses = (known after apply)
+ keyboard_layout = "en-us"
+ mac_addresses = (known after apply)
+ machine = "q35"
+ migrate = false
+ name = "pie-haos"
+ network_interface_names = (known after apply)
+ node_name = "pie"
+ on_boot = true
+ protection = false
+ reboot = false
+ scsi_hardware = "virtio-scsi-single"
+ started = true
+ stop_on_destroy = false
+ tablet_device = true
+ template = false
+ timeout_clone = 1800
+ timeout_create = 1800
+ timeout_migrate = 1800
+ timeout_move_disk = 1800
+ timeout_reboot = 1800
+ timeout_shutdown_vm = 1800
+ timeout_start_vm = 1800
+ timeout_stop_vm = 300
+ vm_id = (known after apply)
+ cpu {
+ cores = 2
+ hotplugged = 0
+ limit = 0
+ numa = false
+ sockets = 1
+ type = "host"
+ units = 1024
}
+ disk {
+ aio = "io_uring"
+ backup = true
+ cache = "none"
+ datastore_id = "local-zfs"
+ discard = "on"
+ file_format = "raw"
+ interface = "virtio0"
+ iothread = true
+ path_in_datastore = (known after apply)
+ replicate = true
+ size = 128
+ ssd = false
}
+ efi_disk {
+ datastore_id = "local-zfs"
+ file_format = (known after apply)
+ pre_enrolled_keys = false
+ type = "4m"
}
+ memory {
+ dedicated = 4096
+ floating = 0
+ keep_hugepages = false
+ shared = 0
}
+ network_device {
+ bridge = "vmbr0"
+ enabled = true
+ firewall = true
+ mac_address = (known after apply)
+ model = "virtio"
+ mtu = 0
+ queues = 0
+ rate_limit = 0
+ vlan_id = 0
}
+ operating_system {
+ type = "l26"
}
}
Plan: 1 to add, 0 to change, 0 to destroy.
Do you want to perform these actions?
OpenTofu will perform the actions described above.
Only 'yes' will be accepted to approve.
Enter a value: yes
proxmox_virtual_environment_vm.vm["au-pie-haos"]: Creating...
╷
│ Error: unable to retrieve the next available VM identifier: context deadline exceeded
│
│ with proxmox_virtual_environment_vm.vm["au-pie-haos"],
│ on proxmox.tf line 45, in resource "proxmox_virtual_environment_vm" "vm":
│ 45: resource "proxmox_virtual_environment_vm" "vm" {
│
╵
Releasing state lock. This may take a few moments...
Have tried 'random_vm_ids' on and off, using provider version v0.66.3 and OpenTofu v1.8.3.
Removing the following from the ssh section of the provider seems to have made it work:
node {
  address = var.terraform.proxmox.pie.host
  name    = var.terraform.proxmox.pie.name
}
That doesn't seem to be related 🤔 That section configures the provider's SSH client; the "next ID" functionality uses only the PVE REST API.
Very strange - I haven't had the issue since (although I may have missed something else!)
We have been fighting this issue for some time; it appears very randomly and currently breaks our deployment. According to our tests:
Testing with the newly released version 0.67.0 we were able to get some more logs regarding this issue. What immediately caught our attention was that IDs that are already in use are requested using the API call GET /api2/json/cluster/nextid?vmid=<UNAVAILABLE_ID>. In the logs we could not find a single call to GET /api2/json/cluster/nextid without the vmid query parameter, which would have returned an available ID. As seen in the Proxmox logs, HTTP error code 400 is returned, which indicates that the requested ID is not available.
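To make the distinction explicit, here is a rough Go sketch (illustrative only, not the provider's implementation) of the two call shapes: with the vmid query parameter the endpoint only verifies whether that specific ID is free, while without the parameter the cluster proposes the next available ID in the response body.
package pve

import (
    "fmt"
    "io"
    "net/http"
)

// nextID queries GET /api2/json/cluster/nextid. With vmid set, PVE only
// checks whether that ID is free and answers 400 ("VM N already exists")
// when it is not; with vmid == nil the response body carries the next
// available ID in its "data" field.
func nextID(baseURL, token string, vmid *int) (string, error) {
    url := baseURL + "/api2/json/cluster/nextid"
    if vmid != nil {
        url = fmt.Sprintf("%s?vmid=%d", url, *vmid)
    }
    req, err := http.NewRequest(http.MethodGet, url, nil)
    if err != nil {
        return "", err
    }
    req.Header.Set("Authorization", "PVEAPIToken="+token)
    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        return "", err
    }
    defer resp.Body.Close()
    body, _ := io.ReadAll(resp.Body)
    if resp.StatusCode != http.StatusOK {
        return "", fmt.Errorf("nextid returned %d: %s", resp.StatusCode, body)
    }
    return string(body), nil
}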
Proxmox logs:
REDACTED [20/Nov/2024:11:24:50.279] pve~ pve/REDACTED 0/0/4/3/7 400 304 - - ---- 2/2/0/0/0 0/0 "GET /api2/json/cluster/nextid?vmid=190 HTTP/1.1"
REDACTED [20/Nov/2024:11:24:50.486] pve~ pve/REDACTED 0/0/3/3/6 400 304 - - ---- 2/2/0/0/0 0/0 "GET /api2/json/cluster/nextid?vmid=191 HTTP/1.1"
REDACTED [20/Nov/2024:11:24:50.693] pve~ pve/REDACTED 0/0/5/4/9 400 304 - - ---- 2/2/0/0/0 0/0 "GET /api2/json/cluster/nextid?vmid=192 HTTP/1.1"
REDACTED [20/Nov/2024:11:24:50.904] pve~ pve/REDACTED 0/0/0/3/3 400 304 - - ---- 2/2/0/0/0 0/0 "GET /api2/json/cluster/nextid?vmid=193 HTTP/1.1"
REDACTED [20/Nov/2024:11:24:51.109] pve~ pve/REDACTED 0/0/0/3/3 400 304 - - ---- 2/2/0/0/0 0/0 "GET /api2/json/cluster/nextid?vmid=194 HTTP/1.1"
REDACTED [20/Nov/2024:11:24:51.312] pve~ pve/REDACTED 0/0/3/3/6 400 304 - - ---- 2/2/0/0/0 0/0 "GET /api2/json/cluster/nextid?vmid=195 HTTP/1.1"
REDACTED [20/Nov/2024:11:24:51.519] pve~ pve/REDACTED 0/0/5/2/7 400 304 - - ---- 2/2/0/0/0 0/0 "GET /api2/json/cluster/nextid?vmid=196 HTTP/1.1"
REDACTED [20/Nov/2024:11:24:51.732] pve~ pve/REDACTED 0/0/9/3/12 400 304 - - ---- 2/2/0/0/0 0/0 "GET /api2/json/cluster/nextid?vmid=197 HTTP/1.1"
REDACTED [20/Nov/2024:11:24:51.947] pve~ pve/REDACTED 0/0/3/3/6 400 304 - - ---- 2/2/0/0/0 0/0 "GET /api2/json/cluster/nextid?vmid=198 HTTP/1.1"
REDACTED [20/Nov/2024:11:24:52.155] pve~ pve/REDACTED 0/0/3/2/5 400 304 - - ---- 2/2/0/0/0 0/0 "GET /api2/json/cluster/nextid?vmid=199 HTTP/1.1"
REDACTED [20/Nov/2024:11:24:52.361] pve~ pve/REDACTED 0/0/2/2/4 400 304 - - ---- 2/2/0/0/0 0/0 "GET /api2/json/cluster/nextid?vmid=200 HTTP/1.1"
REDACTED [20/Nov/2024:11:24:52.567] pve~ pve/REDACTED 0/0/3/2/5 400 304 - - ---- 2/2/0/0/0 0/0 "GET /api2/json/cluster/nextid?vmid=201 HTTP/1.1"
REDACTED [20/Nov/2024:11:24:52.774] pve~ pve/REDACTED 0/0/5/5/10 400 304 - - ---- 2/2/0/0/0 0/0 "GET /api2/json/cluster/nextid?vmid=202 HTTP/1.1"
REDACTED [20/Nov/2024:11:24:52.985] pve~ pve/REDACTED 0/0/4/5/9 400 304 - - ---- 2/2/0/0/0 0/0 "GET /api2/json/cluster/nextid?vmid=203 HTTP/1.1"
REDACTED [20/Nov/2024:11:24:53.197] pve~ pve/REDACTED 0/0/3/3/6 400 304 - - ---- 2/2/0/0/0 0/0 "GET /api2/json/cluster/nextid?vmid=204 HTTP/1.1"
REDACTED [20/Nov/2024:11:24:53.404] pve~ pve/REDACTED 0/0/3/3/6 400 304 - - ---- 2/2/0/0/0 0/0 "GET /api2/json/cluster/nextid?vmid=205 HTTP/1.1"
REDACTED [20/Nov/2024:11:24:53.611] pve~ pve/REDACTED 0/0/4/3/7 400 304 - - ---- 2/2/0/0/0 0/0 "GET /api2/json/cluster/nextid?vmid=206 HTTP/1.1"
REDACTED [20/Nov/2024:11:24:53.819] pve~ pve/REDACTED 0/0/3/3/6 400 304 - - ---- 2/2/0/0/0 0/0 "GET /api2/json/cluster/nextid?vmid=207 HTTP/1.1"
REDACTED [20/Nov/2024:11:24:54.028] pve~ pve/REDACTED 0/0/4/4/8 400 304 - - ---- 2/2/0/0/0 0/0 "GET /api2/json/cluster/nextid?vmid=208 HTTP/1.1"
REDACTED [20/Nov/2024:11:24:54.236] pve~ pve/REDACTED 0/0/3/3/6 400 304 - - ---- 2/2/0/0/0 0/0 "GET /api2/json/cluster/nextid?vmid=209 HTTP/1.1"
REDACTED [20/Nov/2024:11:24:54.443] pve~ pve/REDACTED 0/0/3/6/9 400 304 - - ---- 2/2/0/0/0 0/0 "GET /api2/json/cluster/nextid?vmid=210 HTTP/1.1"
REDACTED [20/Nov/2024:11:24:54.654] pve~ pve/REDACTED 0/0/3/2/5 400 304 - - ---- 2/2/0/0/0 0/0 "GET /api2/json/cluster/nextid?vmid=211 HTTP/1.1"
REDACTED [20/Nov/2024:11:24:54.861] pve~ pve/REDACTED 0/0/4/2/6 400 304 - - ---- 2/2/0/0/0 0/0 "GET /api2/json/cluster/nextid?vmid=212 HTTP/1.1"
REDACTED [20/Nov/2024:11:24:55.070] pve~ pve/REDACTED 0/0/3/3/6 400 304 - - ---- 2/2/0/0/0 0/0 "GET /api2/json/cluster/nextid?vmid=213 HTTP/1.1"
REDACTED [20/Nov/2024:11:24:55.276] pve~ pve/REDACTED 0/0/3/3/6 400 304 - - ---- 2/2/0/0/0 0/0 "GET /api2/json/cluster/nextid?vmid=214 HTTP/1.1"
REDACTED [20/Nov/2024:11:24:55.483] pve~ pve/REDACTED 0/0/4/2/6 400 304 - - ---- 2/2/0/0/0 0/0 "GET /api2/json/cluster/nextid?vmid=215 HTTP/1.1"
REDACTED [20/Nov/2024:11:24:55.691] pve~ pve/REDACTED 0/0/3/3/6 400 304 - - ---- 2/2/0/0/0 0/0 "GET /api2/json/cluster/nextid?vmid=216 HTTP/1.1"
REDACTED [20/Nov/2024:11:24:55.898] pve~ pve/REDACTED 0/0/5/3/8 400 304 - - ---- 2/2/0/0/0 0/0 "GET /api2/json/cluster/nextid?vmid=217 HTTP/1.1"
REDACTED [20/Nov/2024:11:24:56.108] pve~ pve/REDACTED 0/0/4/3/7 400 304 - - ---- 2/2/0/0/0 0/0 "GET /api2/json/cluster/nextid?vmid=218 HTTP/1.1"
Additionally we can provide a relevant snippet from our tofu apply logs.
tofu apply logs:
2024-11-20T11:24:31.356Z [DEBUG] provider.terraform-provider-proxmox_v0.67.0: Sending HTTP Request: @caller=/home/runner/go/pkg/mod/github.com/hashicorp/terraform-plugin-sdk/v2@v2.35.0/helper/logging/logging_http_transport.go:162 Host=REDACTED User-Agent=Go-http-client/1.1 tf_http_req_body="" tf_mux_provider=tf5to6server.v5tov6Server Accept=application/json tf_http_req_method=GET tf_provider_addr=registry.terraform.io/bpg/proxmox tf_req_id=19938013-de2b-01e4-2f70-fea77f785e61 Authorization="PVEAPIToken=REDACTED" tf_http_op_type=request tf_http_req_uri=/api2/json/cluster/nextid?vmid=218 tf_rpc=ApplyResourceChange @module=proxmox Accept-Encoding=gzip tf_http_req_version=HTTP/1.1 tf_http_trans_id=49601e70-85f3-f9b7-65b0-553a9a5f3c3f tf_resource_type=proxmox_virtual_environment_vm timestamp=2024-11-20T11:24:31.355Z
2024-11-20T11:24:31.364Z [DEBUG] provider.terraform-provider-proxmox_v0.67.0: Received HTTP Response: tf_http_res_status_code=400 tf_http_res_version=HTTP/1.1 tf_http_trans_id=49601e70-85f3-f9b7-65b0-553a9a5f3c3f tf_resource_type=proxmox_virtual_environment_vm Server=pve-api-daemon/3.0 tf_http_op_type=response tf_http_res_body="{\"errors\":{\"vmid\":\"VM 218 already exists\"},\"data\":null}" tf_req_id=19938013-de2b-01e4-2f70-fea77f785e61 @caller=/home/runner/go/pkg/mod/github.com/hashicorp/terraform-plugin-sdk/v2@v2.35.0/helper/logging/logging_http_transport.go:162 @module=proxmox Cache-Control=max-age=0 Pragma=no-cache tf_http_res_status_reason="400 Parameter verification failed." tf_mux_provider=tf5to6server.v5tov6Server tf_rpc=ApplyResourceChange Content-Length=55 tf_provider_addr=registry.terraform.io/bpg/proxmox Content-Type=application/json;charset=UTF-8 Date="Wed, 20 Nov 2024 11:24:31 GMT" Expires="Wed, 20 Nov 2024 11:24:31 GMT" timestamp=2024-11-20T11:24:31.364Z
2024-11-20T11:24:31.523Z [ERROR] provider.terraform-provider-proxmox_v0.67.0: Response contains error diagnostic: diagnostic_summary="unable to retrieve the next available VM identifier: context deadline exceeded" tf_rpc=ApplyResourceChange diagnostic_severity=ERROR tf_provider_addr=registry.terraform.io/bpg/proxmox tf_req_id=19938013-de2b-01e4-2f70-fea77f785e61 @module=sdk.proto diagnostic_detail="" @caller=/home/runner/go/pkg/mod/github.com/hashicorp/terraform-plugin-go@v0.25.0/tfprotov6/internal/diag/diagnostics.go:58 tf_proto_version=6.7 tf_resource_type=proxmox_virtual_environment_vm timestamp=2024-11-20T11:24:31.522Z
2024-11-20T11:24:31.540Z [DEBUG] State storage *remote.State declined to persist a state snapshot
2024-11-20T11:24:31.540Z [ERROR] vertex "REDACTED.proxmox_virtual_environment_vm.user_vm" error: unable to retrieve the next available VM identifier: context deadline exceeded
2024-11-20T11:24:31.556Z [WARN] provider.terraform-provider-proxmox_v0.67.0: unable to require attribute replacement: error="ForceNew: No changes for vm_id" tf_attribute_path=vm_id tf_req_id=b25857c0-a85e-1517-1667-fe4e5bf1f4ec @caller=/home/runner/go/pkg/mod/github.com/hashicorp/terraform-plugin-sdk/v2@v2.35.0/helper/customdiff/force_new.go:32 tf_mux_provider=tf5to6server.v5tov6Server tf_provider_addr=registry.terraform.io/bpg/proxmox tf_resource_type=proxmox_virtual_environment_vm tf_rpc=PlanResourceChange @module=sdk.helper_schema timestamp=2024-11-20T11:24:31.556Z
We hope that the provided insight helps fix the underlying problem. Should more information be required, don't hesitate to reach out to us.
Thanks @caendekerk, that helps!
Randomizing the VM ids as suggested above did help to fix our deployment.
Related: #1574
Ok, so here is the use case that doesn't work well.
Precondition: there are a lot of VMs / containers on PVE, but there is a gap in ID allocation closer to the beginning of the range. For example: 100, 101, 102, 105, 106, <continue without gaps>, 150 (technically, the tail of continuous IDs after the gap should be at least 25).
Now imagine we need to provision 4 VMs from a single config. By default, TF parallelism is 4, so those VMs start provisioning simultaneously in separate threads. They all ask PVE for the "next ID" at the same time, and they all receive the same "here is 103!" response.
Obviously, that won't work. The provider has a locking mechanism that prevents parallel executions from using the same ID, so a thread that hits this condition uses simple "+1" logic to try the next one. That doesn't work in all cases, especially if the "gap" of unused IDs is smaller than the number of VMs being provisioned concurrently at that moment. The provider then keeps incrementing the ID in search of an available one until it times out (currently 5 seconds, with a 200 ms delay between iterations).
It seems logical to use the "get next id" API call again instead of "+1"; however, it is not guaranteed to return an actually available ID, as there could be an in-flight "create VM" call for, e.g., ID 103 that PVE has not committed yet, so it may happily return 103 again.
I think a few retries should solve that. I'll also add an acceptance test to verify this scenario.
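Roughly what I have in mind - a sketch only, not the actual provider code, with hypothetical names like getNextID and an in-process reservation map: retry the "get next id" API call a few times with a short delay instead of enumerating "+1", and keep an in-process reservation so parallel resources in the same apply don't grab the same ID.
package vmid

import (
    "context"
    "errors"
    "sync"
    "time"
)

var (
    mu       sync.Mutex
    reserved = map[int]bool{} // IDs handed out in this process but not yet created on PVE
)

// getNextID stands in for the "get next id" API call (GET /api2/json/cluster/nextid).
type getNextID func(ctx context.Context) (int, error)

// nextFreeVMID retries the API call a few times instead of enumerating "+1",
// waiting briefly between attempts so in-flight creates can commit on PVE.
func nextFreeVMID(ctx context.Context, next getNextID) (int, error) {
    for attempt := 0; attempt < 5; attempt++ {
        id, err := next(ctx)
        if err != nil {
            return 0, err
        }
        mu.Lock()
        taken := reserved[id]
        if !taken {
            reserved[id] = true
        }
        mu.Unlock()
        if !taken {
            return id, nil
        }
        // Another thread of this apply already reserved the ID (or PVE has an
        // uncommitted create in flight); wait and ask again.
        select {
        case <-ctx.Done():
            return 0, ctx.Err()
        case <-time.After(200 * time.Millisecond):
        }
    }
    return 0, errors.New("no free VM ID after retries")
}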
Another workaround besides using random IDs is to set parallelism to 1 for terraform|tofu apply (i.e. run it with the -parallelism=1 flag).
FYI: setting parallelism to 1 did not help in our case.
Describe the bug
When creating two or more virtual machines (VMs) using the proxmox_virtual_environment_vm resource, an error occurs indicating a failure to retrieve the next available VM identifier.
To Reproduce
Steps to reproduce the behavior: provision two or more VMs from a single configuration and run terraform|tofu apply (with TF_LOG=DEBUG for detailed provider logs).
Expected behavior
Both vm1 and vm2 should be created successfully.