Open: sjthespian opened this issue 2 years ago
Files identified in the description: None

If these files are inaccurate, please update the component name section of the description or use the !component bot command.
Files identified in the description:

plugins/modules/ec2_instance.py

If these files are inaccurate, please update the component name section of the description or use the !component bot command.
cc @jillr @ryansb @s-hertel @tremble
@sjthespian Thanks for taking the time to open this issue.
ec2_instance was promoted to the "amazon.aws" collection (different support policies), so I've moved the issue over there. Version 3.x of this collection is nearing the end of its support life (we're starting to prepare for 5.0), and 2.x is no longer supported by the community.
Could you please try to reproduce this issue using a more recent release of this collection (4.2.0 is the latest release)? There's been some significant work around handling state since 3.x which wasn't all backported and may fix your issue.
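For reference, a requirements file along these lines should pull in the newer release for testing (a minimal sketch assuming a default ansible-galaxy setup; install it with `ansible-galaxy collection install -r requirements.yml --force`):

```yaml
# requirements.yml (sketch): pin the collection version to test against
collections:
  - name: amazon.aws
    version: 4.2.0
```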
Let me run some testing on the amazon.aws version next week. This is the first time I have seen this issue; it never showed up in my testing of the module in our dev environment, so it could be tough to reproduce.
So far things are looking good using amazon.aws.ec2_instance. I don't know whether the original problem was a race condition in the cluster or whether switching modules actually fixed things, but in either case I'm going to go ahead and close this. Unfortunately, I don't have a cluster I can keep experimenting with to see whether I can reproduce the issue.
Thanks for the help!
Reopening this -- I just had the same failure using amazon.aws.ec2_instance.
```yaml
# Make sure gold masters are stopped
- name: Stop gold masters
  amazon.aws.ec2_instance:
    state: stopped
    wait: true
    instance_ids: "{{ gold_master_instances }}"
    region: us-east-1
    profile: "{{ aws_profile }}"
  tags:
    - sysprep
```
Which sometimes fails with:
```json
{
    "stop_success": [
        "i-08b7xxxxxxxx01e4",
        "i-0604xxxxxxxxf9f6"
    ],
    "stop_failed": [
        "i-074bxxxxxxxx3a79"
    ],
    "msg": "Unable to stop instances: ",
    "invocation": {
        "module_args": {
            "state": "stopped",
            "wait": true,
            "instance_ids": [
                "i-08b7xxxxxxxx01e4",
                "i-0604xxxxxxxxf9f6",
                "i-074bxxxxxxxx3a79"
            ],
            "region": "us-east-1",
            "profile": "xxx-xxx-xxx",
            "debug_botocore_endpoint_logs": false,
            "validate_certs": true,
            "wait_timeout": 600,
            "security_groups": [],
            "purge_tags": false,
            "ec2_url": null,
            "aws_access_key": null,
            "aws_secret_key": null,
            "security_token": null,
            "aws_ca_bundle": null,
            "aws_config": null,
            "count": null,
            "exact_count": null,
            "image": null,
            "image_id": null,
            "instance_type": null,
            "user_data": null,
            "tower_callback": null,
            "ebs_optimized": null,
            "vpc_subnet_id": null,
            "availability_zone": null,
            "security_group": null,
            "instance_role": null,
            "name": null,
            "tags": null,
            "filters": {
                "instance-state-name": [
                    "pending",
                    "running",
                    "stopping",
                    "stopped"
                ],
                "instance-id": [
                    "i-08b7xxxxxxxx01e4",
                    "i-0604xxxxxxxxf9f6",
                    "i-074bxxxxxxxx3a79"
                ]
            },
            "launch_template": null,
            "key_name": null,
            "cpu_credit_specification": null,
            "cpu_options": null,
            "tenancy": null,
            "placement_group": null,
            "instance_initiated_shutdown_behavior": null,
            "termination_protection": null,
            "detailed_monitoring": null,
            "network": null,
            "volumes": null,
            "metadata_options": null
        }
    },
    "deprecations": [
        {
            "msg": "Default value instance_type has been deprecated, in the future you must set an instance_type or a launch_template",
            "date": "2023-01-01",
            "collection_name": "amazon.aws"
        }
    ],
    "_ansible_no_log": false,
    "changed": false
}
```
I believe what is happening is that something earlier in the playbook tells the instances to shut down from the OS. By the time the playbook reaches this task, the instances are either already stopped or still in the process of shutting down. However, since I am using `wait: true`, I would expect the task to simply wait for them all to stop rather than failing with the above error.
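As a workaround I am considering wrapping the task in a retry loop; a minimal sketch (untested, and `stop_result` is just a register name I picked):

```yaml
- name: Stop gold masters, retrying while an OS shutdown is still in flight
  amazon.aws.ec2_instance:
    state: stopped
    wait: true
    instance_ids: "{{ gold_master_instances }}"
    region: us-east-1
    profile: "{{ aws_profile }}"
  register: stop_result
  retries: 5
  delay: 30
  until: stop_result is not failed
```

With `until`/`retries`, a failure while an instance is in the transient stopping state should be retried rather than aborting the play.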
Summary
Using community.aws.ec2_instance to stop instances usually works, but tonight it threw the message "Unable to stop instances:" for two out of three of the instances I was attempting to stop. The playbook task is fairly simple; it is the same task shown earlier in this thread (at the time still using community.aws.ec2_instance).
It is possible that the instances that this failed on were in the process of being shut down by the operating system (Windows), but I would still expect the above task to wait for those instances, not fail. Checking the instances shortly after this failure in the AWS EC2 console showed that the instances were already stopped.
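If the transient stopping state is indeed the trigger, another mitigation would be to poll first and only issue the stop once nothing is mid-shutdown. A rough sketch using the collection's ec2_instance_info module (`gm_info` is just a name I picked, and the retry counts are arbitrary):

```yaml
- name: Wait until no gold master is still shutting down
  amazon.aws.ec2_instance_info:
    instance_ids: "{{ gold_master_instances }}"
    region: us-east-1
    profile: "{{ aws_profile }}"
  register: gm_info
  retries: 20
  delay: 15
  until: gm_info.instances | selectattr('state.name', 'equalto', 'stopping') | list | length == 0
```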
Below is the full text of the error with instance names and other identifying information redacted.
Issue Type
Bug Report
Component Name
ec2_instance
Ansible Version
Collection Versions
AWS SDK versions
Configuration
OS / Environment
Running in a Docker image built on Debian GNU/Linux 10 (buster)
Steps to Reproduce
Expected Results
I would expect the instances in the gold_master_instances list to be stopped after this task runs. If they are already stopped, I would expect it to exit quickly; otherwise I would expect it to wait for them to stop. This has worked in the past; tonight is the first time I have seen this failure.
Actual Results
Code of Conduct