OpenNebula / terraform-provider-opennebula

Terraform provider for OpenNebula
https://www.terraform.io/docs/providers/opennebula/
Mozilla Public License 2.0

Failed to wait virtual machine to be in RUNNING state when changing context #473

Closed by DeBuXer 1 year ago

DeBuXer commented 1 year ago

Description

When I change the context part of an existing instance, the message "Failed to wait virtual machine to be in RUNNING state" almost always appears.

However, the context is changed in OpenNebula, and the instance briefly enters the "HOTPLUG" state. It always changes back to "RUNNING" within a few seconds.

Terraform and Provider version

Terraform v1.5.2 on linux_amd64

Affected resources and data sources

opennebula_virtual_machine

Terraform configuration

Before the context change:

resource "opennebula_image" "clone_image" {
  clone_from_image = "300"
  name             = "example.com"
  datastore_id     = 100
  persistent       = true
}

resource "opennebula_virtual_machine" "create_servers" {
  name   = "example.com"
  cpu    = 2
  vcpu   = 2
  memory = 4 * 1024
  group  = "FOOBA"

  context = {
    NETWORK        = "YES"
    SET_HOSTNAME   = "$NAME"
    SSH_PUBLIC_KEY = file("/root/.ssh/id_rsa.pub")
  }

  graphics {
    type   = "VNC"
    listen = "0.0.0.0"
  }

  os {
    arch = "x86_64"
    boot = "disk0"
  }

  disk {
    image_id = opennebula_image.clone_image.id
    target   = "sda"
  }

  nic {
    model      = "virtio"
    network_id = 3
  }
}

After the context change:

resource "opennebula_image" "clone_image" {
  clone_from_image = "300"
  name             = "example.com"
  datastore_id     = 100
  persistent       = true
}

resource "opennebula_virtual_machine" "create_servers" {
  name   = "example.com"
  cpu    = 2
  vcpu   = 2
  memory = 4 * 1024
  group  = "FOOBA"

  context = {
    NETWORK = "YES"
  }

  graphics {
    type   = "VNC"
    listen = "0.0.0.0"
  }

  os {
    arch = "x86_64"
    boot = "disk0"
  }

  disk {
    image_id = opennebula_image.clone_image.id
    target   = "sda"
  }

  nic {
    model      = "virtio"
    network_id = 3
  }
}

Expected behavior

Terraform waits until the state is "RUNNING" again and exits successfully.

Actual behavior

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  ~ update in-place

Terraform will perform the following actions:

  # opennebula_virtual_machine.create_servers will be updated in-place
  ~ resource "opennebula_virtual_machine" "create_servers" {
      ~ context                = {
          - "SET_HOSTNAME"   = "$NAME" -> null
          - "SSH_PUBLIC_KEY" = <<-EOT
                ssh-rsa <SSH_PUBLIC_KEY>
            EOT -> null
            # (1 unchanged element hidden)
        }
        id                     = "755"
        name                   = "example.com"
        # (23 unchanged attributes hidden)

        # (4 unchanged blocks hidden)
    }

Plan: 0 to add, 1 to change, 0 to destroy.
opennebula_virtual_machine.create_servers: Modifying... [id=755]
opennebula_virtual_machine.create_servers: Still modifying... [id=755, 10s elapsed]
╷
│ Error: Failed to wait virtual machine to be in RUNNING state
│ 
│   with opennebula_virtual_machine.create_servers,
│   on server.tf line 8, in resource "opennebula_virtual_machine" "create_servers":
│    8: resource "opennebula_virtual_machine" "create_servers" {
│ 
│ virtual machine (ID: 755): GOCA client error [REQUEST_HTTP]: http make request: Post "https://api.example.tld/RPC2": EOF

Steps to Reproduce

Create a new instance with the configuration above. Then remove "SSH_PUBLIC_KEY" from the context and apply the configuration. I get the same behavior when I remove "SET_HOSTNAME". It does not matter whether you make one change to the context or several at the same time.

Debug output

No response

Panic output

No response

Important factoids

No response

References

No response

treywelsh commented 1 year ago

This doesn't seem to be a provider error; it's a network error: GOCA client error [REQUEST_HTTP]: http make request: Post "https://api.example.tld/RPC2": EOF

This error comes from the goca HTTP client: https://github.com/OpenNebula/one/blob/master/src/oca/go/src/goca/client.go#L145

This requires more investigation: it may be due to your ONE setup or to the goca code. If you have any additional relevant information, feel free to drop a comment; otherwise I may close this issue in the next few days.

DeBuXer commented 1 year ago

@treywelsh, this issue can be closed. It is something in our environment, or maybe there is some delay when running OpenNebula in an HA setup. Not sure yet, but today I tried the same scripts in an environment created with "minione" and they work fine. So it has nothing to do with the OpenNebula Terraform provider :)

Sorry for wasting your time!

treywelsh commented 1 year ago

Not a problem, thanks for your feedback