ansible-collections / community.aws

Ansible Collection for Community AWS
GNU General Public License v3.0

s3_lifecycle is not idempotent - does write action for no change #1624

Closed mdavis-xyz closed 1 year ago

mdavis-xyz commented 1 year ago

Summary

When s3_lifecycle is run and there are no changes to make, it still calls put_bucket_lifecycle_configuration.

My use case is that I am running a playbook multiple times concurrently, for a lifecycle configuration which is not changing. And I'm getting errors because of concurrency clashes. If I'm not changing the lifecycle, I expect only read-only calls to S3, which shouldn't clash.

This module should get the existing lifecycle config, compare it to what we want, and only if it differs, put the new lifecycle.
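The get, compare, and conditionally put flow described above can be sketched as follows. This is a minimal illustration, not the module's actual code: `FakeS3Client` and `ensure_rule` are hypothetical names, and the fake client returns an empty rule list for a bucket with no lifecycle configuration, whereas the real S3 API raises an error in that case.

```python
import copy


class FakeS3Client:
    """Hypothetical in-memory stand-in for a boto3 S3 client (illustration only)."""

    def __init__(self, rules=None):
        self.rules = list(rules or [])
        self.put_calls = 0

    def get_bucket_lifecycle_configuration(self, Bucket):
        return {"Rules": copy.deepcopy(self.rules)}

    def put_bucket_lifecycle_configuration(self, Bucket, LifecycleConfiguration):
        self.put_calls += 1
        self.rules = copy.deepcopy(LifecycleConfiguration["Rules"])


def ensure_rule(client, bucket, wanted_rule):
    """Write the lifecycle configuration only if wanted_rule is missing or different.

    Returns True if a write was made, False if the bucket already matched.
    """
    current = client.get_bucket_lifecycle_configuration(Bucket=bucket)["Rules"]
    if wanted_rule in current:
        return False  # read-only path: nothing to change, so no put call
    # Replace any rule with the same ID, keep the others, then write once.
    merged = [r for r in current if r.get("ID") != wanted_rule["ID"]]
    merged.append(wanted_rule)
    client.put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration={"Rules": merged}
    )
    return True


rule = {
    "ID": "my_rule",
    "Filter": {"Prefix": ""},
    "Status": "Enabled",
    "NoncurrentVersionTransitions": [
        {"NoncurrentDays": 30, "StorageClass": "STANDARD_IA"}
    ],
}
client = FakeS3Client()
first = ensure_rule(client, "mybucket", rule)   # rule is missing, so it is written
second = ensure_rule(client, "mybucket", rule)  # rule already matches, so no write
print(first, second, client.put_calls)  # prints: True False 1
```

The second call only reads, which is the behaviour that would avoid the concurrency clash.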

Issue Type

Bug Report

Component Name

s3_lifecycle

Ansible Version

$ ansible --version
ansible [core 2.13.6]
  config file = /home/ec2-user/.ansible.cfg
  configured module search path = ['/home/ec2-user/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /home/ec2-user/.pyenv/versions/3.8.11/lib/python3.8/site-packages/ansible
  ansible collection location = /home/ec2-user/.ansible/collections:/usr/share/ansible/collections
  executable location = /home/ec2-user/.pyenv/versions/3.8.11/bin/ansible
  python version = 3.8.11 (default, Sep  7 2022, 04:17:12) [GCC 7.3.1 20180712 (Red Hat 7.3.1-15)]
  jinja version = 3.1.2
  libyaml = True

Collection Versions

# /home/ec2-user/.pyenv/versions/3.8.11/lib/python3.8/site-packages/ansible_collections
Collection                    Version
----------------------------- -------
amazon.aws                    3.5.0  
ansible.netcommon             3.1.3  
ansible.posix                 1.4.0  
ansible.utils                 2.7.0  
ansible.windows               1.12.0 
arista.eos                    5.0.1  
awx.awx                       21.8.0 
azure.azcollection            1.14.0 
check_point.mgmt              2.3.0  
chocolatey.chocolatey         1.3.1  
cisco.aci                     2.3.0  
cisco.asa                     3.1.0  
cisco.dnac                    6.6.0  
cisco.intersight              1.0.20 
cisco.ios                     3.3.2  
cisco.iosxr                   3.3.1  
cisco.ise                     2.5.8  
cisco.meraki                  2.11.0 
cisco.mso                     2.1.0  
cisco.nso                     1.0.3  
cisco.nxos                    3.2.0  
cisco.ucs                     1.8.0  
cloud.common                  2.1.2  
cloudscale_ch.cloud           2.2.2  
community.aws                 3.6.0  
community.azure               1.1.0  
community.ciscosmb            1.0.5  
community.crypto              2.8.1  
community.digitalocean        1.22.0 
community.dns                 2.4.0  
community.docker              2.7.1  
community.fortios             1.0.0  
community.general             5.8.0  
community.google              1.0.0  
community.grafana             1.5.3  
community.hashi_vault         3.4.0  
community.hrobot              1.6.0  
community.libvirt             1.2.0  
community.mongodb             1.4.2  
community.mysql               3.5.1  
community.network             4.0.1  
community.okd                 2.2.0  
community.postgresql          2.3.0  
community.proxysql            1.4.0  
community.rabbitmq            1.2.3  
community.routeros            2.3.1  
community.sap                 1.0.0  
community.sap_libs            1.3.0  
community.skydive             1.0.0  
community.sops                1.4.1  
community.vmware              2.10.1 
community.windows             1.11.1 
community.zabbix              1.8.0  
containers.podman             1.9.4  
cyberark.conjur               1.2.0  
cyberark.pas                  1.0.14 
dellemc.enterprise_sonic      1.1.2  
dellemc.openmanage            5.5.0  
dellemc.os10                  1.1.1  
dellemc.os6                   1.0.7  
dellemc.os9                   1.0.4  
f5networks.f5_modules         1.20.0 
fortinet.fortimanager         2.1.6  
fortinet.fortios              2.1.7  
frr.frr                       2.0.0  
gluster.gluster               1.0.2  
google.cloud                  1.0.2  
hetzner.hcloud                1.8.2  
hpe.nimble                    1.1.4  
ibm.qradar                    2.1.0  
ibm.spectrum_virtualize       1.10.0 
infinidat.infinibox           1.3.7  
infoblox.nios_modules         1.4.0  
inspur.ispim                  1.2.0  
inspur.sm                     2.3.0  
junipernetworks.junos         3.1.0  
kubernetes.core               2.3.2  
lowlydba.sqlserver            1.0.4  
mellanox.onyx                 1.0.0  
netapp.aws                    21.7.0 
netapp.azure                  21.10.0
netapp.cloudmanager           21.21.0
netapp.elementsw              21.7.0 
netapp.ontap                  21.24.1
netapp.storagegrid            21.11.1
netapp.um_info                21.8.0 
netapp_eseries.santricity     1.3.1  
netbox.netbox                 3.8.1  
ngine_io.cloudstack           2.2.4  
ngine_io.exoscale             1.0.0  
ngine_io.vultr                1.1.2  
openstack.cloud               1.10.0 
openvswitch.openvswitch       2.1.0  
ovirt.ovirt                   2.3.1  
purestorage.flasharray        1.14.0 
purestorage.flashblade        1.10.0 
purestorage.fusion            1.1.1  
sensu.sensu_go                1.13.1 
servicenow.servicenow         1.0.6  
splunk.es                     2.1.0  
t_systems_mms.icinga_director 1.31.4 
theforeman.foreman            3.7.0  
vmware.vmware_rest            2.2.0  
vultr.cloud                   1.3.0  
vyos.vyos                     3.0.1  
wti.remote                    1.0.4  

# /home/ec2-user/.ansible/collections/ansible_collections
Collection        Version
----------------- -------
amazon.aws        5.1.0  
ansible.netcommon 4.1.0  
ansible.utils     2.8.0  
community.aws     5.0.0  
community.crypto  2.9.0  
community.general 6.0.1  

AWS SDK versions

$ pip show boto boto3 botocore

Configuration

$ ansible-config dump --only-changed
ANSIBLE_PIPELINING(/home/ec2-user/.ansible.cfg) = True
DEFAULT_LOCAL_TMP(/home/ec2-user/.ansible.cfg) = /dev/shm/ansible/tmp_local/ansible-local-24375r2_prrsj
DEFAULT_STDOUT_CALLBACK(/home/ec2-user/.ansible.cfg) = yaml
INTERPRETER_PYTHON(/home/ec2-user/.ansible.cfg) = /usr/bin/python3

OS / Environment

Amazon Linux 2

Steps to Reproduce

playbook.yaml

---
- hosts: myhosts
  connection: local
  gather_facts: no
  vars:
    bucket: mybucket
    region: ap-southeast-2
    rule_name: "my_rule"
  tasks:

    - name: create bucket
      run_once: true
      s3_bucket:
        state: present
        region: "{{ region }}"
        name: "{{ bucket }}"
        encryption: "AES256"
        tags:
          person: matt
          delete_after: "21/12/2022"

    - name: Add lifecycle config once
      run_once: true
      community.aws.s3_lifecycle:
        rule_id: "{{ rule_name }}"
        name: "{{ bucket }}"
        noncurrent_version_storage_class: standard_ia
        noncurrent_version_transition_days: 30 # minimum
        state: present
        status: enabled
        region: "{{ region }}"
        wait: True

    - name: Add lifecycle config many times
      run_once: False
      community.aws.s3_lifecycle:
        rule_id: "{{ rule_name }}"
        name: "{{ bucket }}"
        noncurrent_version_storage_class: standard_ia
        noncurrent_version_transition_days: 30 # minimum
        state: present
        status: enabled
        region: "{{ region }}"
        wait: True

hosts.yaml

myhosts:
  hosts:
      a: {}
      b: {}
      c: {}
      d: {}
      e: {}
      f: {}
      g: {}
      h: {}
      i: {}
      j: {}
      k: {}

Run with:

ansible-playbook playbook.yaml  -i hosts.yaml -e ansible_python_interpreter=$(which python3)

Expected Results

By the time we get to the last task, the bucket already has the lifecycle config we want. So every run of the last task should report success (no change), without throwing any errors. boto3 should only be used for read-only calls. No put call should be made by Ansible.

Actual Results

PLAY [myhosts] *****************************************************************************************

TASK [create bucket] ***********************************************************************************
changed: [a]

TASK [Add lifecycle config once] ***********************************************************************
changed: [a]

TASK [Add lifecycle config many times] *****************************************************************
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: botocore.exceptions.ClientError: An error occurred (OperationAborted) when calling the PutBucketLifecycleConfiguration operation: A conflicting conditional operation is currently in progress against this resource. Please try again.
fatal: [c]: FAILED! => changed=false 
  boto3_version: 1.24.82
  botocore_version: 1.27.82
  error:
    code: OperationAborted
    message: A conflicting conditional operation is currently in progress against this resource. Please try again.
  lifecycle_configuration:
    Rules:
    - Filter:
        Prefix: ''
      ID: my_rule
      NoncurrentVersionTransitions:
      - NoncurrentDays: 30
        StorageClass: STANDARD_IA
      Status: Enabled
  msg: 'An error occurred (OperationAborted) when calling the PutBucketLifecycleConfiguration operation: A conflicting conditional operation is currently in progress against this resource. Please try again.'
  name: mybucket
  old_lifecycle_rules:
  - Filter:
      Prefix: ''
    ID: my_rule
    NoncurrentVersionTransitions:
    - NoncurrentDays: 30
      StorageClass: STANDARD_IA
    Status: Enabled
  response_metadata:
    host_id: Lf1tcMXZfFFbYqA4HnEu/Dbii3iAFeMpWzkN2GJ9RN/7H/KiqSYCqvQZWKrYVCEQ3/oiuNJtuyeW3qbWsTuPBg==
    http_headers:
      content-length: '308'
      content-type: application/xml
      date: Wed, 21 Dec 2022 06:33:34 GMT
      server: AmazonS3
      x-amz-id-2: Lf1tcMXZfFFbYqA4HnEu/Dbii3iAFeMpWzkN2GJ9RN/7H/KiqSYCqvQZWKrYVCEQ3/oiuNJtuyeW3qbWsTuPBg==
      x-amz-request-id: X05KXWBXVB1FKJAY
    http_status_code: 409
    request_id: X05KXWBXVB1FKJAY
    retry_attempts: 0
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: botocore.exceptions.ClientError: An error occurred (OperationAborted) when calling the PutBucketLifecycleConfiguration operation: A conflicting conditional operation is currently in progress against this resource. Please try again.
fatal: [e]: FAILED! => changed=false 
  boto3_version: 1.24.82
  botocore_version: 1.27.82
  error:
    code: OperationAborted
    message: A conflicting conditional operation is currently in progress against this resource. Please try again.
  lifecycle_configuration:
    Rules:
    - Filter:
        Prefix: ''
      ID: my_rule
      NoncurrentVersionTransitions:
      - NoncurrentDays: 30
        StorageClass: STANDARD_IA
      Status: Enabled
  msg: 'An error occurred (OperationAborted) when calling the PutBucketLifecycleConfiguration operation: A conflicting conditional operation is currently in progress against this resource. Please try again.'
  name: mybucket
  old_lifecycle_rules:
  - Filter:
      Prefix: ''
    ID: my_rule
    NoncurrentVersionTransitions:
    - NoncurrentDays: 30
      StorageClass: STANDARD_IA
    Status: Enabled
  response_metadata:
    host_id: 66BDcsa1gA2Sqn+HgKWnb0tst7Pp4KeRulVfOw0k41+El39THSbqbMC5qMuZaP3d8lV/2Od6ik/DBttggxai9g==
    http_headers:
      content-length: '308'
      content-type: application/xml
      date: Wed, 21 Dec 2022 06:33:34 GMT
      server: AmazonS3
      x-amz-id-2: 66BDcsa1gA2Sqn+HgKWnb0tst7Pp4KeRulVfOw0k41+El39THSbqbMC5qMuZaP3d8lV/2Od6ik/DBttggxai9g==
      x-amz-request-id: X05ZGKE17DWTPNHA
    http_status_code: 409
    request_id: X05ZGKE17DWTPNHA
    retry_attempts: 0
ok: [d]
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: botocore.exceptions.ClientError: An error occurred (OperationAborted) when calling the PutBucketLifecycleConfiguration operation: A conflicting conditional operation is currently in progress against this resource. Please try again.
fatal: [a]: FAILED! => changed=false 
  boto3_version: 1.24.82
  botocore_version: 1.27.82
  error:
    code: OperationAborted
    message: A conflicting conditional operation is currently in progress against this resource. Please try again.
  lifecycle_configuration:
    Rules:
    - Filter:
        Prefix: ''
      ID: my_rule
      NoncurrentVersionTransitions:
      - NoncurrentDays: 30
        StorageClass: STANDARD_IA
      Status: Enabled
  msg: 'An error occurred (OperationAborted) when calling the PutBucketLifecycleConfiguration operation: A conflicting conditional operation is currently in progress against this resource. Please try again.'
  name: mybucket
  old_lifecycle_rules:
  - Filter:
      Prefix: ''
    ID: my_rule
    NoncurrentVersionTransitions:
    - NoncurrentDays: 30
      StorageClass: STANDARD_IA
    Status: Enabled
  response_metadata:
    host_id: jtEllYwZAVnS4V98eCIvffmBdiQajEMM6XgKTOrTYZ9wnfBk3C3yFa/QicPRTHmW+ljgLGdKMCqI5ExhvTId1w==
    http_headers:
      content-length: '308'
      content-type: application/xml
      date: Wed, 21 Dec 2022 06:33:34 GMT
      server: AmazonS3
      x-amz-id-2: jtEllYwZAVnS4V98eCIvffmBdiQajEMM6XgKTOrTYZ9wnfBk3C3yFa/QicPRTHmW+ljgLGdKMCqI5ExhvTId1w==
      x-amz-request-id: X05Q23R8Y5KEWD4P
    http_status_code: 409
    request_id: X05Q23R8Y5KEWD4P
    retry_attempts: 0
ok: [b]
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: botocore.exceptions.ClientError: An error occurred (OperationAborted) when calling the PutBucketLifecycleConfiguration operation: A conflicting conditional operation is currently in progress against this resource. Please try again.
fatal: [f]: FAILED! => changed=false 
  boto3_version: 1.24.82
  botocore_version: 1.27.82
  error:
    code: OperationAborted
    message: A conflicting conditional operation is currently in progress against this resource. Please try again.
  lifecycle_configuration:
    Rules:
    - Filter:
        Prefix: ''
      ID: my_rule
      NoncurrentVersionTransitions:
      - NoncurrentDays: 30
        StorageClass: STANDARD_IA
      Status: Enabled
  msg: 'An error occurred (OperationAborted) when calling the PutBucketLifecycleConfiguration operation: A conflicting conditional operation is currently in progress against this resource. Please try again.'
  name: mybucket
  old_lifecycle_rules:
  - Filter:
      Prefix: ''
    ID: my_rule
    NoncurrentVersionTransitions:
    - NoncurrentDays: 30
      StorageClass: STANDARD_IA
    Status: Enabled
  response_metadata:
    host_id: JzIHlPYjCIlIa+o88FYvcEvFBKCQDdo75C0Mdwcr6ZQCHdP2hkEetTKdCqVe0m+fi2RcPMpXwqNN4JBTcoactQ==
    http_headers:
      content-length: '308'
      content-type: application/xml
      date: Wed, 21 Dec 2022 06:33:35 GMT
      server: AmazonS3
      x-amz-id-2: JzIHlPYjCIlIa+o88FYvcEvFBKCQDdo75C0Mdwcr6ZQCHdP2hkEetTKdCqVe0m+fi2RcPMpXwqNN4JBTcoactQ==
      x-amz-request-id: 3FV4FJPW4MPJZTW3
    http_status_code: 409
    request_id: 3FV4FJPW4MPJZTW3
    retry_attempts: 0
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: botocore.exceptions.ClientError: An error occurred (OperationAborted) when calling the PutBucketLifecycleConfiguration operation: A conflicting conditional operation is currently in progress against this resource. Please try again.
...

PLAY RECAP *********************************************************************************************
a                          : ok=2    changed=2    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0   
b                          : ok=1    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
c                          : ok=0    changed=0    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0   
d                          : ok=1    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
e                          : ok=0    changed=0    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0   
f                          : ok=0    changed=0    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0   
g                          : ok=1    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
h                          : ok=0    changed=0    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0   
i                          : ok=0    changed=0    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0   
j                          : ok=1    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
k                          : ok=0    changed=0    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0   

That is, some hosts reported success with no change, while others threw an error.

ansibullbot commented 1 year ago

Files identified in the description:

If these files are inaccurate, please update the component name section of the description or use the !component bot command.

ansibullbot commented 1 year ago

cc @jillr @markuman @s-hertel @tremble

markuman commented 1 year ago

@mdavis-xyz Can you debug e.g. with q and see what the value of the changed variable is, two lines above? https://github.com/ansible-collections/community.aws/blob/bdb7c9f26f6ff39654cd90e2dd18605a6e3b026c/plugins/modules/s3_lifecycle.py#L467

import q
q(changed)

and cat /tmp/q after it is executed.

mdavis-xyz commented 1 year ago

I've never heard of the q library before. Is this how you're supposed to debug Ansible modules? I've always struggled to debug code changes I've written, because even print statements don't work. We should add this to the contribution docs for this collection, and Ansible in general.

Just to be clear, regardless of what compare_and_update_configuration returns, put_bucket_lifecycle_configuration is always called.

https://github.com/ansible-collections/community.aws/blob/bdb7c9f26f6ff39654cd90e2dd18605a6e3b026c/plugins/modules/s3_lifecycle.py#L467-L476

There are really two issues here.

  1. the module calls put when it shouldn't
  2. the module reports changed=False after calling put

For the second one, we can fix that with changed |= True in an else after that try.

For the first one, perhaps everything after compare_and_update_configuration is called should be inside an if changed?
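The second fix (setting changed in an else after the try) relies on Python running the else branch only when the try body raised nothing. A minimal sketch of that pattern in isolation, where `NoActionError` and `write_and_report` are hypothetical names standing in for the real error matcher and module code:

```python
class NoActionError(Exception):
    """Hypothetical stand-in for the 'At least one action needs to be specified in a rule' error."""


def write_and_report(put):
    """Call put() and return the value that would be reported as `changed`."""
    try:
        put()
    except NoActionError:
        # S3 treated the request as a no-op, so nothing changed.
        changed = False
    else:
        # Runs only when put() raised nothing, i.e. the write went through.
        changed = True
    return changed


def failing_put():
    raise NoActionError()


print(write_and_report(lambda: None), write_and_report(failing_put))  # prints: True False
```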

I'll try the q thing later today. It will take me a while, because figuring out how to run a clone of a module without polluting my already-installed modules is not something I find obvious or easy.

mdavis-xyz commented 1 year ago

Ok I couldn't figure out how to run a playbook using a local clone of the module, without messing with my real global installation. (Are there docs for that somewhere? As an Ansible user I never need to touch galaxy or anything like that, because I only use the standard pre-installed collections.)

So I just created a whole new VM to test in, and modified the file in the globally installed collection.

/tmp/q is:


 0.3s create_lifecycle_rule: True

 1.0s create_lifecycle_rule: False

 1.4s create_lifecycle_rule: False

 1.1s create_lifecycle_rule: False

 1.2s create_lifecycle_rule: False

 1.1s create_lifecycle_rule: False

 1.0s create_lifecycle_rule: False

 1.2s create_lifecycle_rule: False

 1.1s create_lifecycle_rule: False

 1.1s create_lifecycle_rule: False

 0.7s create_lifecycle_rule: False

 0.4s create_lifecycle_rule: False

So it was True the first time, as expected, and False the remainder, as expected.

I tried wrapping up the put and try inside an if statement. That worked as expected. Now the MWE passes. (Not sure how to handle _retries)

    (changed, lifecycle_configuration) = compare_and_update_configuration(client, module,
                                                                          old_lifecycle_rules,
                                                                          new_rule)

    if changed:
        # Write lifecycle to bucket
        try:
            client.put_bucket_lifecycle_configuration(
                aws_retry=True,
                Bucket=name,
                LifecycleConfiguration=lifecycle_configuration)
        except is_boto3_error_message('At least one action needs to be specified in a rule'):
            # Amazon interpreted this as not changing anything
            changed = False
        except (botocore.exceptions.ClientError, botocore.exceptions.BotoCoreError) as e:  # pylint: disable=duplicate-except
            module.fail_json_aws(e, lifecycle_configuration=lifecycle_configuration, name=name, old_lifecycle_rules=old_lifecycle_rules)

        _changed = changed
        _retries = 10
        while wait and _changed and _retries:
            # We've seen examples where get_bucket_lifecycle_configuration returns
            # the updated rules, then the old rules, then the updated rules again,
            time.sleep(5)
            _retries -= 1
            new_rules = fetch_rules(client, module, name)
            (_changed, lifecycle_configuration) = compare_and_update_configuration(client, module,
                                                                                   new_rules,
                                                                                   new_rule)
    else:
        _retries = 0

    new_rules = fetch_rules(client, module, name)

    module.exit_json(changed=changed, new_rule=new_rule, rules=new_rules,
                     old_rules=old_lifecycle_rules, _retries=_retries,
                     _config=lifecycle_configuration)

What's the best way to add a unit/integration test for this? My MWE uses multiple hosts. Is that easy to do with the existing test setup? Or is there a way to run with a loop concurrently on one host?

markuman commented 1 year ago

Ok I couldn't figure out how to run a playbook using a local clone of the module, without messing with my real global installation. (Are there docs for that somewhere?)

Yeah, basically you can also place hacky/patched modules in the library folder of your roles/playbook directory. The only thing you must change then is to call s3_lifecycle: instead of community.aws.s3_lifecycle:

See https://docs.ansible.com/ansible/2.8/user_guide/playbooks_best_practices.html#directory-layout

library/ # if any custom modules, put them here (optional)

What's the best way to add a unit/integration test for this? My MWE uses multiple hosts. Is that easy to do with the existing test setup? Or is there a way to run with a loop concurrently on one host?

Maybe @goneri or @tremble have an idea about testing.

mdavis-xyz commented 1 year ago

Note for testing: my MWE only covered a non-empty list of rules. The same change applies to removing rules. In my PR I wrote the same change twice. For an integration test we may want to duplicate the last two tasks in the MWE with state changed from present to absent, to test that second change.
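One possible way to exercise the concurrent no-change case on a single machine, without multiple hosts, is a thread-based check against a fake client. The sketch below is only an illustration under stated assumptions: `FakeBucket`, `ConflictError`, and `idempotent_task` are hypothetical names, and the fake rejects overlapping writes to mimic the OperationAborted error. It is not the collection's test setup.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ConflictError(Exception):
    """Hypothetical stand-in for S3's OperationAborted (HTTP 409) error."""


class FakeBucket:
    """Hypothetical in-memory bucket that rejects overlapping writes."""

    def __init__(self, rules):
        self.rules = list(rules)
        self.put_calls = 0
        self._write_lock = threading.Lock()

    def get_rules(self):
        return list(self.rules)

    def put_rules(self, rules):
        # A second writer arriving while a write is in progress gets a conflict.
        if not self._write_lock.acquire(blocking=False):
            raise ConflictError("conflicting operation in progress")
        try:
            self.put_calls += 1
            self.rules = list(rules)
        finally:
            self._write_lock.release()


def idempotent_task(bucket, wanted_rule):
    """Return True if a write was made; skip the write when nothing differs."""
    if wanted_rule in bucket.get_rules():
        return False
    bucket.put_rules(bucket.get_rules() + [wanted_rule])
    return True


rule = {"ID": "my_rule", "Status": "Enabled"}
bucket = FakeBucket([rule])  # the rule already exists, as in the last MWE task

# Eleven concurrent callers, matching hosts a to k in the MWE inventory.
with ThreadPoolExecutor(max_workers=11) as pool:
    results = list(pool.map(lambda _: idempotent_task(bucket, rule), range(11)))

print(results.count(True), bucket.put_calls)  # prints: 0 0
```

Because every caller takes the read-only path, no write is attempted and no conflict can occur. A matching check for the absent case would start from a bucket without the rule and assert the same thing.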