ansible / ansible-modules-core

Ansible modules - these modules ship with ansible

cloudformation module on Ansible 2.2.0 throws "PhysicalResourceId" error intermittently. #5460

Closed alanroche closed 7 years ago

alanroche commented 7 years ago
ISSUE TYPE
COMPONENT NAME

cloudformation

ANSIBLE VERSION
ansible 2.2.0.0
  config file = /etc/ansible/ansible.cfg
  configured module search path = Default w/o overrides
CONFIGURATION
OS / ENVIRONMENT

Ubuntu 14.04

SUMMARY

cloudformation module on Ansible 2.2.0 throws "PhysicalResourceId" error intermittently.

STEPS TO REPRODUCE

Push a stack with Ansible cloudformation module on Ansible 2.2.0

EXPECTED RESULTS
ACTUAL RESULTS
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: KeyError: 'PhysicalResourceId'
fatal: [localhost]: FAILED! => {"changed": false, "failed": true, "module_stderr": "Traceback (most recent call last):\n  File \"/tmp/ansible_Kx3c0c/ansible_module_cloudformation.py\", line 483, in <module>\n    main()\n  File \"/tmp/ansible_Kx3c0c/ansible_module_cloudformation.py\", line 450, in main\n    \"physical_resource_id\": res['PhysicalResourceId'],\nKeyError: 'PhysicalResourceId'\n", "module_stdout": "", "msg": "MODULE FAILURE"}
    to retry, use: --limit 
ansibot commented 7 years ago

@tedder, @ryansb, ping. This issue is waiting on your response.

tedder commented 7 years ago

What's your boto version where this is running? For what it's worth, 2.3 will have a completely rewritten module using boto3.

alanroche commented 7 years ago

Now that you mention it, we did upgrade boto to a newer version recently...

$ pip show boto

Name: boto
Version: 2.43.0
Location: /usr/local/lib/python2.7/dist-packages
Requires:

alanroche commented 7 years ago

FYI We also have boto3 installed side by side

$ pip show boto3

Name: boto3
Version: 1.4.0
Location: /usr/local/lib/python2.7/dist-packages
Requires: jmespath, botocore, s3transfer

alanroche commented 7 years ago

and botocore....

$ pip show botocore

Name: botocore
Version: 1.4.26
Location: /usr/local/lib/python2.7/dist-packages
Requires: docutils, jmespath, python-dateutil

alanroche commented 7 years ago

If something has changed in boto2, we can pin the boto2 version as a workaround until it is fixed.

tedder commented 7 years ago

Can you give a self-contained example that triggers this?

alanroche commented 7 years ago

Unfortunately I can't, as it is intermittent: sometimes the same code works, other times it doesn't. What is happening is pretty simple: it is just pushing a new CloudFormation stack. I have, however, managed to reproduce this with verbose trace enabled, if that is of any help.

TASK [cf-api-gateway : Provision Stack] ****
task path: /var/lib/jenkins/workspace/[REDACTED]/aws/ansible/roles/cf-api-gateway/tasks/main.yml:14
Using module file /usr/local/lib/python2.7/dist-packages/ansible/modules/core/cloud/amazon/cloudformation.py
ESTABLISH LOCAL CONNECTION FOR USER: jenkins
EXEC /bin/sh -c '( umask 77 && mkdir -p "`echo $HOME/.ansible/tmp/ansible-tmp-1478264178.82-121111996869655`" && echo ansible-tmp-1478264178.82-121111996869655="`echo $HOME/.ansible/tmp/ansible-tmp-1478264178.82-121111996869655`" ) && sleep 0'
PUT /tmp/tmp4Ywzvm TO /var/lib/jenkins/.ansible/tmp/ansible-tmp-1478264178.82-121111996869655/cloudformation.py
EXEC /bin/sh -c 'chmod u+x /var/lib/jenkins/.ansible/tmp/ansible-tmp-1478264178.82-121111996869655/ /var/lib/jenkins/.ansible/tmp/ansible-tmp-1478264178.82-121111996869655/cloudformation.py && sleep 0'
EXEC /bin/sh -c '/usr/bin/python /var/lib/jenkins/.ansible/tmp/ansible-tmp-1478264178.82-121111996869655/cloudformation.py; rm -rf "/var/lib/jenkins/.ansible/tmp/ansible-tmp-1478264178.82-121111996869655/" > /dev/null 2>&1 && sleep 0'
An exception occurred during task execution. The full traceback is:
Traceback (most recent call last):
  File "/tmp/ansible_2ypQgd/ansible_module_cloudformation.py", line 483, in <module>
    main()
  File "/tmp/ansible_2ypQgd/ansible_module_cloudformation.py", line 450, in main
    "physical_resource_id": res['PhysicalResourceId'],
KeyError: 'PhysicalResourceId'
fatal: [localhost]: FAILED! => {
    "changed": false,
    "failed": true,
    "invocation": {
        "module_name": "cloudformation"
    },
    "module_stderr": "Traceback (most recent call last):\n  File \"/tmp/ansible_2ypQgd/ansible_module_cloudformation.py\", line 483, in <module>\n    main()\n  File \"/tmp/ansible_2ypQgd/ansible_module_cloudformation.py\", line 450, in main\n    \"physical_resource_id\": res['PhysicalResourceId'],\nKeyError: 'PhysicalResourceId'\n",
    "module_stdout": "",
    "msg": "MODULE FAILURE"
}
    to retry, use: --limit @/var/lib/jenkins/workspace/[REDACTED]/aws/ansible/push.retry

alanroche commented 7 years ago

I can now confirm that this also occurs on boto 2.41.0, which we had been using previously. We did not observe errors on that version of boto before updating to Ansible 2.2.

PS. We need to use Ansible 2.2 as it seems to fix some errors with SSHing through AWS bastion servers: https://github.com/ansible/ansible/issues/17349

ryansb commented 7 years ago

Hm, I wasn't able to repro this yesterday. Going to try a different way today, but if you're able to provide any info about your playbook that'd help. Does it have a particular resource type? Do you use outputs?

alanroche commented 7 years ago

OK,

I have some useful info now (I think) :)

The stack is failing to push. It was not obvious because it is a test stack that gets dropped after the tests run. Sorry about that, but because the stack was getting dropped, it appeared on the surface to be an intermittent error. When I go into the CloudFormation console and look under "Deleted Stacks", I can clearly see that the offending stack is ERRORED (in our case simply due to a CloudWatch Logs limit being reached, but that is beside the point).

So, I am guessing that:
1/ This is a new stack being pushed (i.e. NOT an update to an existing stack).
2/ The push errored before the physical_resource_id was assigned in CloudFormation, but maybe the Ansible code assumes somewhere that a physical resource ID will come back, and there is none because the stack push errored.

Hope this helps...

And YES - we do have output variables on this stack also, if that matters. The resource being pushed in this case was an AWS API Gateway instance.
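
To illustrate the hypothesis above, here is a minimal sketch. It is not the module's actual code or the eventual fix: `summarize_resources` and the sample resource dicts are hypothetical, and the key names other than `PhysicalResourceId` are assumed from the shape of a CloudFormation resource description. It only shows how indexing with `res['PhysicalResourceId']` raises KeyError for a resource that never got a physical ID, while `dict.get` with a default does not.

```python
def summarize_resources(resources):
    """Build per-resource summaries without assuming every key is present.

    `resources` is a list of dicts shaped like CloudFormation resource
    descriptions (hypothetical sample data, not a live API response).
    """
    summary = []
    for res in resources:
        summary.append({
            "logical_resource_id": res.get("LogicalResourceId"),
            # A resource that failed before creation may have no physical ID,
            # so fall back to an empty string instead of raising KeyError.
            "physical_resource_id": res.get("PhysicalResourceId", ""),
            "resource_type": res.get("ResourceType"),
            "status": res.get("ResourceStatus"),
        })
    return summary


# Hypothetical example: one resource created, one failed before it was assigned an ID.
sample = [
    {
        "LogicalResourceId": "Created",
        "PhysicalResourceId": "example-physical-id",
        "ResourceType": "AWS::S3::Bucket",
        "ResourceStatus": "CREATE_COMPLETE",
    },
    {
        "LogicalResourceId": "Failed",
        "ResourceType": "AWS::ApiGateway::RestApi",
        "ResourceStatus": "CREATE_FAILED",
    },
]

for entry in summarize_resources(sample):
    print(entry["logical_resource_id"], repr(entry["physical_resource_id"]))
```

Whether an empty string, None, or omitting the key is the right fallback depends on what consumers of the module output expect; that choice is left open here.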

tedder commented 7 years ago

Okay. It does seem like an edge case. I'm curious if you can repro in 2.3, as everything is different. Until then, marking this wontfix.

ssummer3 commented 7 years ago

I can replicate this error with a stack that creates an EC2 instance with DisableApiTermination: true, using a role that does not have permission to complete the stack.

The stack goes into ROLLBACK_FAILED due to not being able to terminate the instance, and KeyError: 'PhysicalResourceId' occurs.

tedder commented 7 years ago

Ryan, please attempt to repro on the 2.3/devel version of the module.

ryansb commented 7 years ago

I reproduced on latest 2.3, here's how I did it:

  1. Create an IAM role that has read permissions, and ability to create an S3 bucket (policy below)
  2. Create a template that includes an S3 bucket and another resource that the IAM role has no permissions for (I used SNS topic)
  3. Use the module to create the stack, using the IAM role specified above

Template:

{
    "AWSTemplateFormatVersion": "2010-09-09",
    "Parameters": {},
    "Resources": {
        "Bukkit": {
            "Type": "AWS::S3::Bucket",
            "Properties": {}
        },
        "other": {
            "Type": "AWS::SNS::Topic",
            "Properties": {}
        }
    }
}

Policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Stmt1478712625000",
            "Effect": "Allow",
            "Action": ["s3:CreateBucket"],
            "Resource": "*"
        }
    ]
}

Playbook:

---
- hosts: localhost
  tasks:
    - register: stack_out
      cloudformation:
        profile: slscode
        region: us-west-2
        stack_name: testStack
        template: ./cloudformation/simple-stack.json
        role_arn: arn:aws:iam::509803855674:role/bukkitcreate

Then, when run, here's the failure:

An exception occurred during task execution. To see the full traceback, use -vvv. The error was: KeyError: 'PhysicalResourceId'
fatal: [localhost]: FAILED! => {"changed": false, "failed": true, "module_stderr": "Traceback (most recent call last):\n  File \"/tmp/ryansb/ansible_J1ty5m/ansible_module_cloudformation.py\", line 504, in <module>\n    main()\n  File \"/tmp/ryansb/ansible_J1ty5m/ansible_module_cloudformation.py\", line 467, in main\n    \"physical_resource_id\": res['PhysicalResourceId'],\nKeyError: 'PhysicalResourceId'\n", "module_stdout": "", "msg": "MODULE FAILURE"}

I'm working on a fix, I'll let you know when I get one.

ryansb commented 7 years ago

in_progress

tedder commented 7 years ago

Thanks Ryan and Alan.

ryansb commented 7 years ago

@alanroche and @ssummer3 you can reopen this issue if the fix doesn't work for you.

scshitole commented 7 years ago

Hello folks, I am getting this error when I try to deploy the CFT:

[root@mg-server1 playbooks]# ansible-playbook f5_at_aws.yaml

PLAY [creating HTTPS application] **

TASK [setup] ***
ok: [localhost]

TASK [launch ansible cloudformation example] ***
fatal: [localhost]: FAILED! => {"changed": false, "failed": true, "module_stderr": "Shared connection to localhost closed.\r\n", "module_stdout": "Traceback (most recent call last):\r\n File \"/tmp/ansible_SKaJwW/ansible_module_cloudformation.py\", line 153, in <module>\r\n import boto.cloudformation.connection\r\n File \"/usr/lib/python2.7/site-packages/boto/cloudformation/__init__.py\", line 27, in <module>\r\n RegionData = load_regions().get('cloudformation')\r\n File \"/usr/lib/python2.7/site-packages/boto/regioninfo.py\", line 112, in load_regions\r\n additional = load_endpoint_json(additional_path)\r\n File \"/usr/lib/python2.7/site-packages/boto/regioninfo.py\", line 44, in load_endpoint_json\r\n return _load_json_file(path)\r\n File \"/usr/lib/python2.7/site-packages/boto/regioninfo.py\", line 56, in _load_json_file\r\n with open(path, 'r') as endpoints_file:\r\nIOError: [Errno 2] No such file or directory: '/path/to/my/boto/endpoints.json'\r\n", "msg": "MODULE FAILURE"}
    to retry, use: --limit @/etc/ansible/playbooks/f5_at_aws.retry

PLAY RECAP *****
localhost : ok=1 changed=0 unreachable=0 failed=1

[root@mg-server1 playbooks]# pip show boto
Name: boto
Version: 2.46.1
Summary: Amazon Web Services Library
Home-page: https://github.com/boto/boto/
Author: Mitch Garnaat
Author-email: mitch@garnaat.com
License: MIT
Location: /usr/lib/python2.7/site-packages
Requires:
[root@mg-server1 playbooks]# pip show boto3
Name: boto3
Version: 1.4.4
Summary: The AWS SDK for Python
Home-page: https://github.com/boto/boto3
Author: Amazon Web Services
Author-email: UNKNOWN
License: Apache License 2.0
Location: /usr/lib/python2.7/site-packages
Requires: botocore, jmespath, s3transfer
[root@mg-server1 playbooks]# pip show botocore
Name: botocore
Version: 1.5.37
Summary: Low-level, data-driven core of boto 3.
Home-page: https://github.com/boto/botocore
Author: Amazon Web Services
Author-email: UNKNOWN
License: Apache License 2.0
Location: /root/.local/lib/python2.7/site-packages
Requires: python-dateutil, jmespath, docutils

[root@mg-server1 playbooks]# cat f5_at_aws.yaml

tedder commented 7 years ago

It's hard to read the unformatted output, but I don't see it specified: what version of Ansible, @scshitole?

scshitole commented 7 years ago

[root@mg-server1 playbooks]# ansible --version
ansible 2.2.0.0
  config file = /etc/ansible/ansible.cfg
  configured module search path = ['/usr/share/my_modules/']

tedder commented 7 years ago

cloudformation has been substantially rewritten in 2.3 and I/we don't have plans to fix uncommon problems on 2.2. Please try 2.3 and open a new issue if there's a problem. Also note it isn't related to this ticket's original issue anyhow.

In fact, I think your issue is probably just from copying/pasting some boto config that is incorrect.
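
One way to check for that kind of leftover config is sketched below. It rests on assumptions: that boto 2 takes a custom endpoints file from the `BOTO_ENDPOINTS` environment variable or from an `endpoints_path` option in the `[Boto]` section of its config files, and that `/etc/boto.cfg` and `~/.boto` are the files in use on this machine. The helper name `find_endpoints_override` is hypothetical, and the script uses only the Python 3 standard library rather than boto itself.

```python
import configparser
import os


def find_endpoints_override(environ, config_paths):
    """Return (source, path) for a boto 2 endpoints override, or None.

    Assumes the override comes from BOTO_ENDPOINTS or from the
    `endpoints_path` option in a [Boto] config section.
    """
    env_path = environ.get("BOTO_ENDPOINTS")
    if env_path:
        return ("BOTO_ENDPOINTS", env_path)

    # strict=False tolerates duplicate sections in hand-edited config files.
    parser = configparser.RawConfigParser(strict=False)
    for cfg in config_paths:
        # read() silently skips files that do not exist.
        parser.read(os.path.expanduser(cfg))
    if parser.has_option("Boto", "endpoints_path"):
        return ("endpoints_path", parser.get("Boto", "endpoints_path"))
    return None


# Assumed config locations; adjust for the machine being checked.
override = find_endpoints_override(os.environ, ["/etc/boto.cfg", "~/.boto"])
if override and not os.path.isfile(override[1]):
    print("%s points at a missing file: %s" % override)
```

If this reports a placeholder such as /path/to/my/boto/endpoints.json, removing that setting (or pointing it at a real file) would be the first thing to try.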

wontfix