Closed: sorinpad closed this issue 1 year ago.
From your log file, the interesting error is here in the first step: https://gist.github.com/bartisan/39c5de53228e64f4b734a7e059db2da8#file-opennebula-terraform-provider-run-1-L824
It means that the provider is not able to match the vdb disk on the cloud side with its description in the TF file. However, this doesn't provide enough detail on the exact error, so I need to reproduce the problem to investigate further.
Unfortunately I didn't reproduce the problem. Here is my config:
Terraform v1.3.6
Provider release 1.1.0
OpenNebula 6.4.0 (deployed via minione)
Here is my test file (I had to add image descriptions to be able to test in my dev environment):
resource "opennebula_image" "test" {
name = "test"
datastore_id = 1
type = "DATABLOCK"
size = "4096"
dev_prefix = "vd"
driver = "raw"
permissions = "660"
tags = {
billable = "true"
}
}
resource "opennebula_image" "test2" {
name = "test2"
datastore_id = 1
type = "DATABLOCK"
size = "3072"
dev_prefix = "vd"
driver = "raw"
permissions = "660"
tags = {
billable = "true"
}
}
resource "opennebula_virtual_machine" "vm" {
name = "testvm"
description = "test"
cpu = 1
vcpu = 1
memory = 768
group = "oneadmin"
permissions = "660"
disk {
image_id = opennebula_image.test.id
size = 4096
target = "vda"
driver = "qcow2"
}
disk {
image_id = opennebula_image.test2.id
size = 3072
target = "vdb"
driver = "qcow2"
}
on_disk_change = "SWAP"
}
I may be missing something; can you reproduce the problem with my test file?
Hey, @treywelsh,
Yes, I could reproduce the problem using your test file; I only used a different size for the images, since I don't have that much space on the default datastore.
I didn't mention it initially, but I'm also running OpenNebula 6.4.0 (deployed via minione).
Initial run:
  # opennebula_virtual_machine.vm will be created
  + resource "opennebula_virtual_machine" "vm" {
      + cpu            = 1
      + default_tags   = (known after apply)
      + description    = "test"
      + gid            = (known after apply)
      + gname          = (known after apply)
      + group          = "oneadmin"
      + hard_shutdown  = false
      + id             = (known after apply)
      + ip             = (known after apply)
      + lcmstate       = (known after apply)
      + memory         = 768
      + name           = "testvm"
      + on_disk_change = "SWAP"
      + pending        = false
      + permissions    = "660"
      + state          = (known after apply)
      + tags_all       = (known after apply)
      + template_disk  = (known after apply)
      + template_id    = -1
      + template_nic   = (known after apply)
      + template_tags  = (known after apply)
      + timeout        = 20
      + uid            = (known after apply)
      + uname          = (known after apply)
      + vcpu           = 1

      + disk {
          + computed_cache           = (known after apply)
          + computed_dev_prefix      = (known after apply)
          + computed_discard         = (known after apply)
          + computed_driver          = (known after apply)
          + computed_io              = (known after apply)
          + computed_size            = (known after apply)
          + computed_target          = (known after apply)
          + computed_volatile_format = (known after apply)
          + disk_id                  = (known after apply)
          + driver                   = "qcow2"
          + image_id                 = (known after apply)
          + size                     = 4096
          + target                   = "vda"
        }
      + disk {
          + computed_cache           = (known after apply)
          + computed_dev_prefix      = (known after apply)
          + computed_discard         = (known after apply)
          + computed_driver          = (known after apply)
          + computed_io              = (known after apply)
          + computed_size            = (known after apply)
          + computed_target          = (known after apply)
          + computed_volatile_format = (known after apply)
          + disk_id                  = (known after apply)
          + driver                   = "qcow2"
          + image_id                 = (known after apply)
          + size                     = 3072
          + target                   = "vdb"
        }

      + vmgroup {
          + role       = (known after apply)
          + vmgroup_id = (known after apply)
        }
    }
Plan: 3 to add, 0 to change, 0 to destroy.
Subsequent run:
  # opennebula_virtual_machine.vm will be updated in-place
  ~ resource "opennebula_virtual_machine" "vm" {
        id   = "158"
        name = "testvm"
        # (22 unchanged attributes hidden)

      + disk {
          + driver   = "qcow2"
          + image_id = 28
          + size     = 4096
          + target   = "vda"
        }
      + disk {
          + driver   = "qcow2"
          + image_id = 29
          + size     = 3072
          + target   = "vdb"
        }
    }
Plan: 0 to add, 1 to change, 0 to destroy.
Reproducible with Terraform 1.3.6 and the OpenNebula provider 1.0.2+.
Hi @treywelsh, thanks for the hint about the [WARN] Configuration for disk ID message; it helped a lot and we found the issue thanks to it.
So the issue was that the disk was ignored because of the driver difference between the image and the VM disk (qcow2 vs raw). I am not entirely sure why you don't have this problem with the Terraform code that you gave us, but in our cluster the disk driver becomes raw instead of qcow2 (because the image is also raw), and then the disk is ignored.
So for our use case we have a nice workaround, which is to correctly set the driver (sketched below), but I think the provider should probably show that it tries to remove the disk when it isn't matched, instead of ignoring it on update. That would give the operator a hint about the problem. I am happy to give fixing this a try, but unfortunately I can't promise when :(.
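For reference, a minimal sketch of that workaround, assuming raw images like in the test file above (adjust to whatever driver your images actually use): declare the VM disk with the same driver as the image so the provider can match it again on read.

disk {
  image_id = opennebula_image.test2.id
  size     = 3072
  target   = "vdb"
  driver   = "raw" # match the image's driver instead of "qcow2"
}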
Thanks for the details, that helps. I'm playing with the driver values, and it seems I'm able to reproduce the problem you describe in some cases.
To be sure I was clear on what's happening:
The two disks are attached on the cloud side: if you look in Sunstone after the VM creation step (first step), both are there.
Then, after creating the VM, the provider fetches the whole VM configuration from OpenNebula to read it.
The disk and NIC reading code is trickier than for other attributes, and there is a problem during this read step: the provider is able to recognize only one of the two disks (it compares the TF description with the cloud-side VM information by matching attribute values), so it reads only one disk description from the cloud side.
The consequence is that the provider believes only one disk is attached, which is why it tries to attach a disk again. But the disk is already attached, and we get conflict errors like Target vdb is already in use.
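To make the mismatch concrete, here is a rough sketch, assuming the image driver ends up as raw on the cloud side (the exact set of attributes compared is an implementation detail of the provider):

disk {
  image_id = opennebula_image.test2.id
  size     = 3072
  target   = "vdb"
  driver   = "qcow2" # what the TF file declares and what the provider sends
}

# On the cloud side the attached disk is stored with the image's driver (raw),
# so when the provider reads the VM back and compares attribute values, no
# cloud-side disk matches this block; the provider treats it as not yet
# attached and tries to attach it again.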
I'm not sure it's only a provider bug (this should be discussed), given that OpenNebula receives a disk driver value from the provider but applies another value instead, without returning an error or giving the provider a hint about what happened.
Currently the provider is not able to understand why OpenNebula didn't apply the disk with the provided attributes; it just believes it's another disk that it doesn't know about.
We could try to make the provider more tolerant by relaxing the attribute comparison a bit, if that breaks nothing else, or consider that the current reading code doesn't work properly and rewrite it.
Personally I won't refactor the disk/NIC code parts without deeper changes in the provider, or unless we go for a full rewrite. I described some ideas in this comment.
I can give you more details on how disk/NIC management currently works in the provider if needed. Feel free to share your thoughts or contribute if you think you have a better idea; any help/input is appreciated.
This issue is stale because it has been open for 30 days with no activity and it does not have the 'status: confirmed' label or is not in a milestone. Remove the 'status: stale' label or comment, or this will be closed in 5 days.
Community Note
Terraform Version
Terraform v1.3.6, OpenNebula provider 1.0.2+
Affected Resource(s)
opennebula_virtual_machine
Terraform Configuration Files
Debug Output
Initial run: https://gist.github.com/bartisan/39c5de53228e64f4b734a7e059db2da8#file-opennebula-terraform-provider-run-1
Subsequent runs: https://gist.github.com/bartisan/39c5de53228e64f4b734a7e059db2da8#file-opennebula-terraform-provider-run-2
Panic Output
N/A
Expected Behavior
Disks vda and vdb get attached to the VM, and subsequent terraform runs report nothing to apply.
Actual Behavior
Terraform keeps trying to attach disk vdb (or any number of additional disks in disk blocks) and fails, as OpenNebula says it's already attached.
Steps to Reproduce
terraform apply shows this output:

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes
opennebula_virtual_machine.vm: Creating...
opennebula_virtual_machine.vm: Still creating... [10s elapsed]
opennebula_virtual_machine.vm: Creation complete after 19s [id=149]
Apply complete! Resources: 1 added, 0 changed, 0 destroyed.
opennebula_virtual_machine.vm: Refreshing state... [id=149]
Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  ~ update in-place
Terraform will perform the following actions:
  # opennebula_virtual_machine.vm will be updated in-place
  ~ resource "opennebula_virtual_machine" "vm" {
        id   = "149"
        name = "testvm"
        # (22 unchanged attributes hidden)
Plan: 0 to add, 1 to change, 0 to destroy.
Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes
opennebula_virtual_machine.vm: Modifying... [id=149]
╷
│ Error: Failed to update disk
│
│   with opennebula_virtual_machine.vm,
│   on provider.tf line 17, in resource "opennebula_virtual_machine" "vm":
│   17: resource "opennebula_virtual_machine" "vm" {
│
│ virtual machine (ID: 149): vm disk attach: can't attach image to virtual machine (ID:149): OpenNebula error [ACTION]: [one.vm.attach] Target vdb is already in use.