ansible-collections / amazon.aws

Ansible Collection for Amazon AWS
GNU General Public License v3.0

Problem with teardown when you have more than 200 instances #713

Open IPvSean opened 2 years ago

IPvSean commented 2 years ago

Summary

Is there an example role, or documentation, for tearing down more than 200 instances with ec2_instance? We are hitting scalability issues with this module...

This PR https://github.com/ansible/workshops/pull/1589/files will break it up into chunks, so each student (who has 4 instances) is de-provisioned separately, but this is extremely slow. We need a way to kill all instances with a certain tag, regardless of how many there are.

Issue Type

Documentation Report

Component Name

ec2_instance

Ansible Version

➜  ~ ansible --version
ansible [core 2.11.2]
  config file = /Users/sean/.ansible.cfg
  configured module search path = ['/Users/sean/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/local/lib/python3.9/site-packages/ansible
  ansible collection location = /Users/sean/.ansible/collections:/usr/share/ansible/collections
  executable location = /usr/local/bin/ansible
  python version = 3.9.9 (main, Nov 21 2021, 03:23:42) [Clang 13.0.0 (clang-1300.0.29.3)]
  jinja version = 3.0.1
  libyaml = True

Collection Versions

$ ansible-galaxy collection list

# /Users/sean/.ansible/collections/ansible_collections
Collection                          Version
----------------------------------- -------
amazon.aws                          3.1.1
ansible.netcommon                   2.0.2
ansible.network                     1.2.0
ansible.posix                       1.2.0
ansible.product_demos               1.2.12
ansible.utils                       2.5.0
ansible.windows                     1.5.0
ansible.workshops                   1.0.11
arista.eos                          2.1.2
awx.awx                             19.4.0
chocolatey.chocolatey               1.1.0
cisco.ios                           2.0.1
cisco.iosxr                         2.8.1
cisco.nxos                          2.9.0
community.aws                       1.5.0
community.crypto                    1.6.2
community.general                   3.0.2
community.mysql                     2.1.0
community.windows                   1.3.0
containers.podman                   1.9.1
f5networks.f5_modules               1.9.0
frr.frr                             1.0.3
junipernetworks.junos               2.1.0
openvswitch.openvswitch             2.1.0
redhat_cop.controller_configuration 2.1.1
redhat_cop.tower_utilities          2.0.1
vyos.vyos                           2.8.0

OS / Environment

RHEL 8 and/or macOS

Additional Information

I am hoping this is a docs or examples issue, and not a problem with the module itself?

ansibullbot commented 2 years ago

Files identified in the description:

If these files are inaccurate, please update the component name section of the description or use the !component bot command.


ansibullbot commented 2 years ago

cc @jillr @ryansb @s-hertel @tremble

tremble commented 2 years ago

I don't have any docs to point you to. For CI there's a custom lambda that runs and deletes things: https://github.com/mattclay/aws-terminator

If you want an Ansible-specific option, rather than feeding the output of ec2_instance_info directly into ec2_instance, what I'd probably do is use the output of ec2_instance_info to generate a hostgroup with the 200+ members. You can then run a deletion play against that hostgroup with as many parallel forks as you want. You'll probably also want to set wait: false.

Be careful with just how many threads you run in parallel, though: the AWS APIs have rate limits.
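The pattern described above can be sketched as a two-play playbook. This is a hedged, untested sketch: the tag key/value (`workshop: demo`), region, and the `doomed` group name are illustrative assumptions, not anything from the original thread.

```yaml
---
# Play 1: gather matching instances once and build an in-memory hostgroup.
- name: Build a hostgroup of instances to delete
  hosts: localhost
  gather_facts: false
  tasks:
    - name: Find all tagged instances
      amazon.aws.ec2_instance_info:
        region: us-east-1          # assumption: adjust to your region
        filters:
          "tag:workshop": demo     # assumption: adjust to your tag
      register: found

    - name: Add each instance to an in-memory group
      ansible.builtin.add_host:
        name: "{{ item.instance_id }}"
        groups: doomed
      loop: "{{ found.instances }}"

# Play 2: terminate against the generated group, one fork per instance,
# without waiting for each termination to complete.
- name: Terminate instances in parallel
  hosts: doomed
  gather_facts: false
  connection: local
  tasks:
    - name: Terminate instance
      amazon.aws.ec2_instance:
        region: us-east-1
        instance_ids: "{{ inventory_hostname }}"
        state: absent
        wait: false
```

Run with a raised fork count (e.g. `ansible-playbook -f 50 teardown.yml`), keeping the comment above in mind: too many parallel forks will hit the AWS API rate limits.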

heatmiser commented 2 years ago

The issue is due to the underlying boto3 library: https://github.com/boto/boto3/issues/1099 ...and was previously marked as a feature request; however, the request went stale and was removed. A comment in the aforementioned issue was:

> We typically don't do automatic batch handling unless explicitly called out. The only cases I'm aware of are the dynamodb batch get/write operations. For now you'll have to manually handle that.

The boto3 library does provide a means for specified-size batching on the returned list: https://github.com/boto/boto3/blob/5f9c6cb2f24a59fe1958c9fbedcc2bf821f2a88a/boto3/resources/collection.py#L251 ...which would permit limiting calls to at most 200 objects (instances) and looping until no more objects (instances) are present, avoiding the error.
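The manual batching the boto3 maintainers suggest can be sketched in plain Python. This is only an illustration of the chunking logic (the function names and the 200-item limit are assumptions based on the discussion above); `terminate_instances` is the real boto3 EC2 client call, invoked once per batch so no single request exceeds the limit.

```python
def chunks(items, size=200):
    """Yield successive batches of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def terminate_in_batches(ec2_client, instance_ids, batch_size=200):
    """Terminate instances in batches so no single API call exceeds batch_size IDs."""
    for batch in chunks(instance_ids, batch_size):
        # boto3 EC2 client call; loops until every batch has been submitted.
        ec2_client.terminate_instances(InstanceIds=batch)


# Illustration with a hypothetical 450-instance fleet:
ids = ["i-%05d" % n for n in range(450)]
batches = list(chunks(ids))
print([len(b) for b in batches])  # -> [200, 200, 50]
```

This is essentially what an automatic-batching feature in either boto3 or amazon.aws would have to do internally.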

Would it be better to make another feature request to the boto3 library project, again asking for automatic batching when the filtered object list exceeds 200 items? Or should a feature request be made for this project, so that amazon.aws Ansible modules automatically detect and batch boto3 requests whenever a single task needs to act on more than 200 objects?