ansible-collections / amazon.aws

Ansible Collection for Amazon AWS
GNU General Public License v3.0

ec2_vpc_nat_gateway using a dynamically-allocated eIP sometimes fails with botocore exception InvalidElasticIpID.NotFound #1872

Open pluto00987 opened 12 months ago

pluto00987 commented 12 months ago

Summary

Creating a NAT gateway with ec2_vpc_nat_gateway using a dynamically-allocated eIP sometimes fails with the botocore exception InvalidElasticIpID.NotFound. This happens even though the eIP allocation it references (eipalloc-0faae3f7d465f76f9 in the example error below) does exist, at least after the fact, and even though no eIP is specified in the YAML, so the module is allocating that eIP itself (as expected).

It's unclear to me why this happens, i.e. whether it's a collection issue or a boto issue. I don't see any 'state' or similar attribute on an eIP that would suggest it might not be 'ready' as soon as it 'exists', so I'm not sure whether or how the collection could check for that between eIP creation and NAT gateway creation.

This is with aws collection 6.2.0, but I don't see any changes to ec2_vpc_nat_gateway.py in newer versions of 6.x
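
As a rough illustration of the kind of check asked about above, here is a minimal sketch that polls for the allocation before the NAT gateway is created. It is not what the module does. The function names, the attempt count and the delay are hypothetical; the only AWS-specific pieces are the EC2 client's describe_addresses call and the InvalidAllocationID.NotFound error code. The exception is inspected by duck typing (a botocore ClientError carries a `response` dict), so the sketch itself does not import botocore.

```python
import time


def error_code(exc):
    """Return the AWS error code from a botocore-style ClientError, or None."""
    return getattr(exc, "response", {}).get("Error", {}).get("Code")


def wait_for_allocation(client, allocation_id, attempts=5, delay=2.0, sleep=time.sleep):
    """Poll until describe_addresses can see the allocation (hypothetical helper).

    Returns True once the allocation is visible and False if it is still
    missing after `attempts` tries. Any other error is re-raised unchanged.
    """
    for attempt in range(attempts):
        try:
            client.describe_addresses(AllocationIds=[allocation_id])
            return True
        except Exception as exc:
            if error_code(exc) != "InvalidAllocationID.NotFound":
                raise
        # Pause before the next poll, but not after the final attempt.
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

Even if such a check passes, a later CreateNatGateway call is not guaranteed to see the allocation, so this can only narrow the window rather than close it.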

Issue Type

Bug Report

Component Name

ec2_vpc_nat_gateway

Ansible Version

$ ansible --version
ansible [core 2.14.3]
  config file = /runner/project/ansible.cfg
  configured module search path = ['/home/runner/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/local/lib/python3.9/site-packages/ansible
  ansible collection location = /runner/requirements_collections:/home/runner/.ansible/collections:/usr/share/ansible/collections
  executable location = /usr/local/bin/ansible
  python version = 3.9.16 (main, Dec  8 2022, 00:00:00) [GCC 11.3.1 20221121 (Red Hat 11.3.1-4)] (/usr/bin/python3)
  jinja version = 3.1.2
  libyaml = True

Collection Versions

$ ansible-galaxy collection list

# /usr/share/ansible/collections/ansible_collections
Collection              Version
----------------------- -------
@NAMESPACE@.@NAME@      3.0.1
amazon.aws              5.4.0
ansible.posix           1.5.1
ansible.windows         1.13.0
awx.awx                 21.13.0
azure.azcollection      1.15.0
community.vmware        *
google.cloud            1.1.3
kubernetes.core         2.4.0
openstack.cloud         2.0.0
redhatinsights.insights 1.0.7
theforeman.foreman      3.9.0

# /runner/requirements_collections/ansible_collections
Collection         Version
------------------ -------
amazon.aws         6.2.0
ansible.netcommon  3.1.0
ansible.utils      2.11.0
ansible.windows    1.11.1
awx.awx            19.2.2
community.aws      6.1.0
community.docker   1.9.0
community.general  3.4.0
community.windows  1.11.0
oasis_roles.system 1.1.3

AWS SDK versions

$ pip show boto boto3 botocore
WARNING: Package(s) not found: boto
Name: boto3
Version: 1.26.99
Summary: The AWS SDK for Python
Home-page: https://github.com/boto/boto3
Author: Amazon Web Services
Author-email:
License: Apache License 2.0
Location: /usr/local/lib/python3.9/site-packages
Requires: botocore, jmespath, s3transfer
Required-by:
---
Name: botocore
Version: 1.29.99
Summary: Low-level, data-driven core of boto 3.
Home-page: https://github.com/boto/botocore
Author: Amazon Web Services
Author-email:
License: Apache License 2.0
Location: /usr/local/lib/python3.9/site-packages
Requires: jmespath, python-dateutil, urllib3
Required-by: boto3, s3transfer

Configuration

$ ansible-config dump --only-changed
ANSIBLE_FORCE_COLOR(env: ANSIBLE_FORCE_COLOR) = True
ANSIBLE_PIPELINING(/runner/project/ansible.cfg) = True
COLLECTIONS_PATHS(env: ANSIBLE_COLLECTIONS_PATHS) = ['/runner/requirements_collections', '/home/runner/.ansible/collections', >
CONFIG_FILE() = /runner/project/ansible.cfg
DEFAULT_CALLBACK_PLUGIN_PATH(env: ANSIBLE_CALLBACK_PLUGINS) = ['/runner/artifacts/2081/callback']
DEFAULT_ROLES_PATH(env: ANSIBLE_ROLES_PATH) = ['/runner/requirements_roles', '/home/runner/.ansible/roles', '/usr/share/ansibl>
DEFAULT_STDOUT_CALLBACK(env: ANSIBLE_STDOUT_CALLBACK) = awx_display
HOST_KEY_CHECKING(env: ANSIBLE_HOST_KEY_CHECKING) = False
INVENTORY_UNPARSED_IS_FAILED(env: ANSIBLE_INVENTORY_UNPARSED_FAILED) = True
RETRY_FILES_ENABLED(env: ANSIBLE_RETRY_FILES_ENABLED) = False

OS / Environment

CentOS Stream release 9

Steps to Reproduce

- name: Ensure the VPC has NAT gateway for agent subnets
  amazon.aws.ec2_vpc_nat_gateway:
    if_exist_do_not_create: yes
    region: "{{ region }}"
    subnet_id: "{{ subnet_id }}"
    wait: yes
  register: natgw
  when: agent_nat

Expected Results

This should create a new public NAT gateway, using a freshly-allocated Elastic IP.

Actual Results

"An error occurred (InvalidElasticIpID.NotFound) when calling the CreateNatGateway operation: The elasticIp ID 'eipalloc-0faae3f7d465f76f9' does not exist"

"Traceback (most recent call last):
  File \"/tmp/ansible_amazon.aws.ec2_vpc_nat_gateway_payload_031y5umw/ansible_amazon.aws.ec2_vpc_nat_gateway_payload.zip/ansible_collections/amazon/aws/plugins/modules/ec2_vpc_nat_gateway.py\", line 630, in create
  File \"/tmp/ansible_amazon.aws.ec2_vpc_nat_gateway_payload_031y5umw/ansible_amazon.aws.ec2_vpc_nat_gateway_payload.zip/ansible_collections/amazon/aws/plugins/module_utils/retries.py\", line 105, in deciding_wrapper
    return retrying_wrapper(*args, **kwargs)
  File \"/tmp/ansible_amazon.aws.ec2_vpc_nat_gateway_payload_031y5umw/ansible_amazon.aws.ec2_vpc_nat_gateway_payload.zip/ansible_collections/amazon/aws/plugins/module_utils/cloud.py\", line 119, in _retry_wrapper
    return _retry_func(
  File \"/tmp/ansible_amazon.aws.ec2_vpc_nat_gateway_payload_031y5umw/ansible_amazon.aws.ec2_vpc_nat_gateway_payload.zip/ansible_collections/amazon/aws/plugins/module_utils/cloud.py\", line 68, in _retry_func
    return func()
  File \"/usr/local/lib/python3.9/site-packages/botocore/client.py\", line 530, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File \"/usr/local/lib/python3.9/site-packages/botocore/client.py\", line 960, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (InvalidElasticIpID.NotFound) when calling the CreateNatGateway operation: The elasticIp ID 'eipalloc-0f0e36392ebfc5490' does not exist
",


pluto00987 commented 12 months ago

On the surface this seems similar to https://github.com/ansible-collections/amazon.aws/pull/1320 but I don't think it's quite the same issue.

tremble commented 12 months ago

The most likely cause is the AWS APIs being "eventually" consistent (the same as #1320). Sometimes the API calls will return things like the ID for a net-new resource before they can be consistently referenced.

Updating the client creation call to something like the following will probably ~fix~ work around the issue:

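# Also retry, with jittered backoff, when the freshly allocated eIP is not yet visible
retry_decorator = AWSRetry.jittered_backoff(
    catch_extra_error_codes=["InvalidElasticIpID.NotFound"],
)
# Hand the decorator to the EC2 client used by the module
client = module.client("ec2", retry_decorator=retry_decorator)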

Would you be willing to open a PR?
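
For anyone unfamiliar with the idea behind the suggestion, the following is a minimal, standalone sketch of retrying a call with jittered exponential backoff when it fails with a specific error code. It is not the collection's AWSRetry implementation. The function name, the defaults and the "full jitter" delay formula are assumptions chosen for illustration, and the exception is inspected by duck typing, the same shape as a botocore ClientError `response` dict.

```python
import random
import time


def call_with_jittered_backoff(func, retry_codes, tries=5, base_delay=1.0,
                               max_delay=30.0, sleep=time.sleep):
    """Call func(), retrying while it fails with one of retry_codes (hypothetical helper)."""
    for attempt in range(tries):
        try:
            return func()
        except Exception as exc:
            code = getattr(exc, "response", {}).get("Error", {}).get("Code")
            # Give up on unrelated errors, or once the last try has failed.
            if code not in retry_codes or attempt == tries - 1:
                raise
            # Full jitter: sleep a random time up to the capped exponential delay.
            sleep(random.uniform(0, min(max_delay, base_delay * 2 ** attempt)))
```

In this sketch the CreateNatGateway call would be passed in as `func`, with `{"InvalidElasticIpID.NotFound"}` as `retry_codes`, so a briefly invisible allocation is retried while any other error surfaces immediately.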