equinix / terraform-provider-equinix

Terraform Equinix provider
https://deploy.equinix.com/labs/terraform-provider-equinix/
MIT License

404 when creating spot market request suddenly? #163

Open colemickens opened 2 years ago

colemickens commented 2 years ago

I'm getting: GET https://api.equinix.com/metal/v1/devices?include=facility: 404 Not found.

Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # metal_spot_market_request.pktspotamd0 will be created
  + resource "metal_spot_market_request" "pktspotamd0" {
      + devices_max      = 1
      + devices_min      = 1
      + facilities       = (known after apply)
      + id               = (known after apply)
      + max_bid_price    = 0.5
      + metro            = "sv"
      + project_id       = "afc67974-ff22-41fd-9346-5b2c8d51e3a9"
      + wait_for_devices = true

      + instance_parameters {
          + always_pxe        = false
          + billing_cycle     = "hourly"
          + hostname          = "pktspotamd0"
          + operating_system  = "ubuntu_18_04"
          + plan              = "c3.medium.x86"
          + termintation_time = (known after apply)
          + userdata          = <<-EOT
                #!/usr/bin/env bash
                set -xeuo pipefail

                ##
                ##
                # ./userdata/install-nix.sh
                #!/usr/bin/env bash
                set -euo pipefail
                set -x

                USERNAME="cole"
                NIX_INSTALL_URL="https://github.com/numtide/nix-unstable-installer/releases/download/nix-2.5pre20211026_5667822/install"

                # TODO: support re-exec as root if we're not
                # check if we're not "cole" and if so, make it and then re-exec *again*

                if [[ "${1:-""}" != "stage2" ]]; then
                  if [[ "$(whoami)" != "${USERNAME}}" ]]; then
                    sudo adduser --gecos "" --disabled-password "${USERNAME}"
                    mkdir -p /home/"${USERNAME}"/.ssh
                    curl -L "https://github.com/colemickens.keys" > /home/cole/.ssh/authorized_keys
                    sudo chown -R cole /home/"${USERNAME}"/.ssh
                    sudo chmod -R ugo-w /home/"${USERNAME}"/.ssh
                    sudo chmod -R ugo+rx /home/"${USERNAME}"/.ssh
                    sudo chmod -R ugo-w /home/"${USERNAME}"/.ssh
                    sudo chmod -R u+rw /home/"${USERNAME}"/.ssh
                    sudo chmod u+x /home/"${USERNAME}"/.ssh
                    sudo usermod -aG sudo "${USERNAME}"
                    echo "%sudo   ALL=(ALL:ALL) NOPASSWD:ALL" | sudo tee -a /etc/sudoers
                    sudo cp "${0}" "/tmp/nix-unstable.sh"
                    sudo chmod ugo+rx "/tmp/nix-unstable.sh"
                    sudo -u "${USERNAME}" "/tmp/nix-unstable.sh" stage2
                  fi
                  exit 0
                fi

                # TODO: pull out extra subs/keys to TF var?
                # TODO: keep in sync: commbox.sh/install-nix.sh
                curl -L "${NIX_INSTALL_URL}" > /tmp/install
                sudo chmod +x /tmp/install
                /tmp/install --daemon &> /tmp/nix-install.log

                sudo mkdir -p "/etc/nix"
                cat <<EOF | sudo tee -a "/etc/nix/nix.conf"
                experimental-features = nix-command flakes ca-references
                extra-substituters = https://colemickens.cachix.org https://nixpkgs-wayland.cachix.org https://arm.cachix.org https://thefloweringash-armv7.cachix.org
                extra-trusted-public-keys = colemickens.cachix.org-1:bNrJ6FfMREB4bd4BOjEN85Niu8VcPdQe4F4KxVsb/I4= nixpkgs-wayland.cachix.org-1:3lwxaILxMRkVhehr5StQprHdEo4IrE8sRho9R9HOLYA= arm.cachix.org-1:5BZ2kjoL1q6nWhlnrbAl+G7ThY7+HaBRD9PZzqZkbnM= thefloweringash-armv7.cachix.org-1:v+5yzBD2odFKeXbmC+OPWVqx4WVoIVO6UXgnSAWFtso=
                trusted-users = root @sudo
                cores = 0
                max-jobs = auto
                EOF

                sudo systemctl restart nix-daemon

                BASHRC="$(cat "/etc/bash.bashrc")"
                NIXSNIPPET="$(cat "/etc/profile.d/nix.sh")"
                printf '%s\n#####\n%s' \
                  "${NIXSNIPPET}" \
                  "${BASHRC}" | sudo tee "/etc/bash.bashrc"

                source "/etc/profile.d/nix.sh"
                nix --version

                echo "install-nix: all done!"

            EOT
        }
    }

Plan: 1 to add, 0 to change, 0 to destroy.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

metal_spot_market_request.pktspotamd0: Creating...
╷
│ Error: Failed to fetch Device with following error: GET https://api.equinix.com/metal/v1/devices?include=facility: 404 Not found 
│ 
│   with metal_spot_market_request.pktspotamd0,
│   on config.tf.json line 1, in resource.metal_spot_market_request.pktspotamd0:
│    1: {"data":{"oci_identity_availability_domain":{"default_ad":[{"ad_number":1,"compartment_id":"ocid1.compartment.oc1..aaaaaaaafclyuqguzm2rtz5a5kcijxnjnidd4x3u35rwlivim6xuwuutzsta"}]}},"provider":{"metal":[null],"oci":[{"fingerprint":"d4:d8:ce:6c:c4:ca:b9:ab:11:ac:2a:1f:1b:e7:70:71","private_key_path":"/run/secrets/oraclecloud_colemickens_privkey","region":"us-phoenix-1","tenancy_ocid":"ocid1.tenancy.oc1..aaaaaaaafyqmgtgi5nwkolwjujayjrx5qw2qmzpbp7wzche2kgmdrlptnj4q","user_ocid":"ocid1.user.oc1..aaaaaaaah76dpd2bz6pqmy53t2p7mxy3wieydldjxshmnpe6nsoensqieulq"}]},"resource":{"metal_spot_market_request":{"pktspotamd0":{"devices_max":1,"devices_min":1,"instance_parameters":{"billing_cycle":"hourly","hostname":"pktspotamd0","operating_system":"ubuntu_18_04","plan":"c3.medium.x86","userdata":"${templatefile(\"/nix/store/78znmjqb5cnmgnv2i6yswjfgybv5qb4m-bootstrap.sh.tmpl\", { TF_NIXOS_LUSTRATE = \"false\", TF_NIX_INSTALL_URL = \"https://github.com/numtide/nix-unstable-installer/releases/download/nix-2.5pre20211026_5667822/install\", TF_USERNAME = \"cole\" 
})}"},"max_bid_price":"0.50","metro":"sv","project_id":"afc67974-ff22-41fd-9346-5b2c8d51e3a9","wait_for_devices":true}},"oci_core_default_route_table":{"default_route_table":[{"display_name":"DefaultRouteTable","manage_default_resource_id":"${oci_core_vcn.default_vcn.default_route_table_id}","route_rules":[{"destination":"0.0.0.0/0","destination_type":"CIDR_BLOCK","network_entity_id":"${oci_core_internet_gateway.default_internet_gateway.id}"}]}]},"oci_core_internet_gateway":{"default_internet_gateway":[{"compartment_id":"ocid1.compartment.oc1..aaaaaaaafclyuqguzm2rtz5a5kcijxnjnidd4x3u35rwlivim6xuwuutzsta","display_name":"DefaultInternetGateway","vcn_id":"${oci_core_vcn.default_vcn.id}"}]},"oci_core_subnet":{"default_subnet":[{"availability_domain":"${data.oci_identity_availability_domain.default_ad.name}","cidr_block":"10.0.1.0/24","compartment_id":"ocid1.compartment.oc1..aaaaaaaafclyuqguzm2rtz5a5kcijxnjnidd4x3u35rwlivim6xuwuutzsta","dhcp_options_id":"${oci_core_vcn.default_vcn.default_dhcp_options_id}","display_name":"DefaultSubnet","dns_label":"default","route_table_id":"${oci_core_vcn.default_vcn.default_route_table_id}","security_list_ids":["${oci_core_vcn.default_vcn.default_security_list_id}"],"vcn_id":"${oci_core_vcn.default_vcn.id}"}]},"oci_core_vcn":{"default_vcn":[{"cidr_block":"10.0.0.0/16","compartment_id":"ocid1.compartment.oc1..aaaaaaaafclyuqguzm2rtz5a5kcijxnjnidd4x3u35rwlivim6xuwuutzsta","display_name":"DefaultVcn","dns_label":"default"}]}},"terraform":{"required_providers":{"metal":{"source":"equinix/metal","version":"3.2.0"}}}}
│ 
╵
+ tixe
displague commented 2 years ago

It looks like wait_for_devices is triggering a device fetch before the device ID is available: https://github.com/equinix/terraform-provider-metal/blob/ec6ec6f5daa5161cd2650e4b334bbc16f9653427/metal/resource_metal_spot_market_request.go#L429-L431

colemickens commented 2 years ago

I forgot that I still have the logging infra in place from previous issues. I'll attach the log: log.txt

t0mk commented 2 years ago

@colemickens thanks for supplying the debug log, and great that you redacted your API token!

There is a GET for a spot market request:

2021-11-10T13:06:34.597-0800 [INFO]  provider.terraform-provider-metal_v3.2.0: 2021/11/10 13:06:34 [DEBUG] Equinix Metal API Request Details:
---[ REQUEST ]---------------------------------------
GET /metal/v1/spot-market-requests/4c603e72-e6cd-4c78-837e-5e69b88c7665?include=project%2Cdevices%2Cfacilities%2Cmetro HTTP/1.1
...

and the reply is

2021-11-10T13:06:35.227-0800 [INFO]  provider.terraform-provider-metal_v3.2.0: 2021/11/10 13:06:35 [DEBUG] Equinix Metal API Response Details:
---[ RESPONSE ]--------------------------------------
HTTP/1.1 200 OK
Connection: close
Content-Length: 3218
Cache-Control: max-age=0, private, must-revalidate
Content-Type: application/json; charset=utf-8
Date: Wed, 10 Nov 2021 21:06:35 GMT
Etag: W/"596f0204f08af0c34122fa6053f512a5"
Last-Modified: Wed, 10 Nov 2021 21:06:31 GMT
Strict-Transport-Security: max-age=15724800; includeSubDomains
X-Request-Id: d34bf9012c5cae0460084953175c7eb1

{
 "id": "4c603e72-e6cd-4c78-837e-5e69b88c7665",
 "created_at": "2021-11-10T21:06:31Z",
 "devices_min": 1,
 "devices_max": 1,
 "max_bid_price": 0.5,
 [...],
 "devices": [
  {}                       <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< Not great
 ], 
 "href": "/metal/v1/spot-market-requests/4c603e72-e6cd-4c78-837e-5e69b88c7665"
}

.. there is an empty dict in the devices array. It's an API bug, and neither packngo nor the provider code is ready for it. @displague will you please bring this up with the API devs?

If we assume that the empty device dict is just a temporary nuisance and the device will appear later, then this crash could be fixed in the SMR waiting code: we could assume that if the ID is empty, we need to wait longer. @displague should I implement this?
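
A minimal sketch of the "wait while the ID is empty" idea, under stated assumptions: `Device`, `SpotMarketRequest`, `devicesReady`, `waitForDevices`, and the `fetch` callback are hypothetical names made up for illustration. They are not packngo's types and not the provider's actual waiting code.

```go
package main

import (
	"errors"
	"time"
)

// Device and SpotMarketRequest are hypothetical, trimmed-down stand-ins
// for the API objects; they are NOT packngo's types.
type Device struct {
	ID string
}

type SpotMarketRequest struct {
	ID      string
	Devices []Device
}

// devicesReady reports whether the request lists at least min devices and
// every listed device has a non-empty ID. An empty object ({}) in the
// "devices" array decodes to Device{ID: ""} and counts as "not ready yet".
func devicesReady(smr SpotMarketRequest, min int) bool {
	if len(smr.Devices) < min {
		return false
	}
	for _, d := range smr.Devices {
		if d.ID == "" {
			return false
		}
	}
	return true
}

// waitForDevices polls fetch until devicesReady returns true, fetch returns
// an error, or the attempts run out. fetch is a hypothetical callback that
// wraps the GET for the spot market request.
func waitForDevices(fetch func() (SpotMarketRequest, error), min, attempts int, delay time.Duration) (SpotMarketRequest, error) {
	for i := 0; i < attempts; i++ {
		smr, err := fetch()
		if err != nil {
			return SpotMarketRequest{}, err
		}
		if devicesReady(smr, min) {
			return smr, nil
		}
		time.Sleep(delay)
	}
	return SpotMarketRequest{}, errors.New("timed out waiting for spot market request devices")
}
```

With this shape, the provider would only go on to fetch individual devices once every entry carries an ID, so a response like the one above keeps the wait loop going instead of producing a device GET without an ID.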

displague commented 2 years ago

@t0mk I'm raising this to the API team. I don't think it is reasonable for packngo to try to work around this (at least until the nature of this bug is determined).

colemickens commented 2 years ago

Now I got a 500? But it's weird, the log shows it happening and the deployment continuing? But the 500 didn't pop up until the end when terraform gave up?

Log here: log.txt

colemickens commented 2 years ago

And then I tried again to finish my plan by deploying the second spot market request and now it immediately throws back 500 and fails.

:(