ctreatma opened 3 months ago
~I've hit a wall debugging the 422 error further. The metal-python SDK will print HTTP requests and responses to stdout when `configuration.debug = True`, but Ansible makes it difficult to get at the module's stdout and I've had no luck so far figuring out how to wire that up.~
The 422 error is caused by a lack of platform capacity; it is a consequence of the duplicate servers rather than their cause.
NOTE: This issue was mitigated in v0.6.2 and later by increasing the wait timeout in `metal_device` to 30 minutes. The longer timeout makes this behavior less likely to occur in practice, but it can still happen until we come up with a direct fix.
Upon further inspection, the issue is that the Ansible collection filters by metro when looking up devices; metal-cli does not appear to support a metro filter on devices, so it cannot reproduce this issue.
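The failure mode can be modeled as a find-or-create step whose lookup misses the in-flight device. The sketch below is hypothetical and is not the collection's actual code: `FakeProject`, `list_devices`, `create_device`, and `find_or_create` are illustrative names, and the fake project simply mimics the behavior reported here (a `queued` device is missing from metro-filtered results). The `server_side_metro=False` branch shows one possible mitigation, filtering by hostname on the server and by metro locally, offered as an assumption rather than a confirmed fix.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Device:
    hostname: str
    metro: str
    state: str = "queued"  # newly created devices start out queued


@dataclass
class FakeProject:
    """Hypothetical in-memory project mimicking the reported API behavior:
    a device in the "queued" state is left out of metro-filtered results."""

    devices: List[Device] = field(default_factory=list)

    def list_devices(self, hostname: str, metro: Optional[str] = None) -> List[Device]:
        matches = [d for d in self.devices if d.hostname == hostname]
        if metro is not None:
            # Simulated quirk: queued devices vanish when filtering by metro.
            matches = [d for d in matches if d.metro == metro and d.state != "queued"]
        return matches

    def create_device(self, hostname: str, metro: str) -> Device:
        device = Device(hostname=hostname, metro=metro)
        self.devices.append(device)
        return device


def find_or_create(project: FakeProject, hostname: str, metro: str,
                   server_side_metro: bool = True) -> Device:
    """Return an existing device with this hostname and metro, else create one."""
    if server_side_metro:
        # Lookup as described in this issue: metro is sent as an API filter.
        found = project.list_devices(hostname, metro)
    else:
        # Hypothetical mitigation: filter by hostname on the server and by
        # metro locally, so a queued device is still matched.
        found = [d for d in project.list_devices(hostname) if d.metro == metro]
    if found:
        return found[0]
    return project.create_device(hostname, metro)
```

Calling `find_or_create(project, "web-1", "da")` twice while the first device is still queued leaves two devices in `project.devices`; with `server_side_metro=False`, the second call returns the existing device instead. This only models the lookup step, not the module's wait/timeout logic.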
I observed the following behavior when a server is in the `queued` state:

- `curl -H "X-Auth-Token: ${METAL_AUTH_TOKEN}" "https://api.equinix.com/metal/v1/projects/<project_id>/devices?hostname=<hostname>&page=1"` (filtering only by hostname) returns a list with one item
- `curl -H "X-Auth-Token: ${METAL_AUTH_TOKEN}" "https://api.equinix.com/metal/v1/projects/<project_id>/devices?metro=<metro>&hostname=<hostname>&page=1"` (filtering by both metro and hostname) returns an empty list

Once the server comes out of the `queued` state, both curl commands return the matching server.
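To compare the two lookups from a script rather than by hand, a small standard-library helper along these lines could be used. It is a sketch: `devices_url` and `count_matches` are hypothetical names, the token is read from `METAL_AUTH_TOKEN` as in the curl commands, and it assumes the response body holds results in a top-level `devices` list. Like the curl commands, it only reads `page=1`.

```python
import json
import os
import urllib.parse
import urllib.request
from typing import Optional

API_BASE = "https://api.equinix.com/metal/v1"


def devices_url(project_id: str, hostname: str, metro: Optional[str] = None) -> str:
    """Build the project device-list URL used by the curl commands above."""
    params: dict = {}
    if metro is not None:
        params["metro"] = metro
    params["hostname"] = hostname
    params["page"] = 1
    # urlencode percent-escapes each value and joins the pairs with "&".
    return f"{API_BASE}/projects/{project_id}/devices?{urllib.parse.urlencode(params)}"


def count_matches(project_id: str, hostname: str, metro: Optional[str] = None) -> int:
    """Return how many devices the API lists for the given filters (page 1 only)."""
    request = urllib.request.Request(
        devices_url(project_id, hostname, metro),
        headers={"X-Auth-Token": os.environ["METAL_AUTH_TOKEN"]},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # Assumption: results are wrapped in a top-level "devices" list.
    return len(body.get("devices", []))
```

If the API behaves as described above, then while a device is queued `count_matches("<project_id>", "<hostname>")` should report 1 and `count_matches("<project_id>", "<hostname>", "<metro>")` should report 0.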
SUMMARY
When given a `hostname` and `project_id`, the `metal_device` module attempts to find a server with that hostname in the specified Equinix Metal project, and creates a server if it can't find an existing match. It appears that, if the server is early enough in the provisioning process, the module is unable to find the existing server and submits another request to create an identical server instead of monitoring the existing request.

I've observed this issue when a server is in the `queued` state, but it is possible the issue exists for other states as well. I confirmed that the server is visible in the API response by running `metal devices get -p <project_id>`; since that CLI command hits the same endpoint that the `metal_device` module uses, this appears to be a problem in the Ansible collection and not in the Equinix Metal API.

ISSUE TYPE
COMPONENT NAME
equinix.cloud.metal_device
ANSIBLE VERSION
CONFIGURATION
OS / ENVIRONMENT
N/A
STEPS TO REPRODUCE
The config below uses an extremely short timeout to guarantee that the module will fail before the device is provisioned. Run the config using `ansible-playbook <path/to/file.yaml>`. You may need to run the command multiple times; since the hostname is changing you should end up with 2 servers no matter how many times you run the config, but if you look in the Equinix Metal console you will see more than 2 servers.

You can put the following contents in `group_vars/all.yml` to ensure the necessary variables are defined for the above config:

EXPECTED RESULTS
The config above should create exactly 2 servers.
ACTUAL RESULTS
More than 2 servers are created, and each hostname is duplicated multiple times.
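A quick way to confirm the duplicates without scrolling through the console is to count hostnames in the project's device list. This sketch assumes each device is a dict with a `hostname` key, as in the API's device objects; `duplicate_hostnames` is a hypothetical helper, and fetching the list is left out.

```python
from collections import Counter
from typing import Dict, Iterable


def duplicate_hostnames(devices: Iterable[dict]) -> Dict[str, int]:
    """Return each hostname that appears more than once, with its count."""
    counts = Counter(device["hostname"] for device in devices)
    return {name: n for name, n in counts.items() if n > 1}
```

The device-list endpoint is paginated (the curl commands above pass `page=1`), so a complete check would need to collect every page before counting.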
I've also observed the following error in the Ansible output: