ansible-collections / community.aws

Ansible Collection for Community AWS
GNU General Public License v3.0

dynamodb_table fails when used with aws_application_scaling_policy because of the daily scale-down limit #1558

Closed: KvistA-ELS closed this issue 2 years ago

KvistA-ELS commented 2 years ago

Summary

When dynamodb_table is used to create or update a DynamoDB table that uses auto scaling, it often fails because AWS allows only 4 decreases in provisioned throughput per UTC day; after that, further decreases are allowed at most once an hour.

'Subscriber limit exceeded: Provisioned throughput decreases are limited within a given UTC day. After the first 4 decreases, each subsequent decrease in the same UTC day can be performed at most once every 3600 seconds. Number of decreases today: 8. Last decrease at Monday, October 10, 2022 at 1:00:19 PM Coordinated Universal Time. Next decrease can be made at Monday, October 10, 2022 at 2:00:19 PM Coordinated Universal Time'
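The rule in this error message can be modelled in a few lines of Python. This is a minimal sketch based only on the wording of the message (4 unrestricted decreases per UTC day, then at most one decrease every 3600 seconds, with the count assumed to reset at UTC midnight). The function name `next_decrease_allowed` is hypothetical and is not part of the module or of boto3, and AWS may apply further rules that are not reflected here.

```python
from datetime import datetime, timedelta, timezone

# Limits as stated in the AWS error message (assumed, not queried from AWS).
FREE_DECREASES_PER_DAY = 4
COOLDOWN = timedelta(seconds=3600)


def next_decrease_allowed(decreases_today: int, last_decrease: datetime) -> datetime:
    """Hypothetical helper: earliest time another decrease should be accepted.

    The first 4 decreases in a UTC day are unrestricted. After that, each
    further decrease must wait COOLDOWN after the previous one, unless a new
    UTC day (which resets the count) starts sooner.
    """
    if decreases_today < FREE_DECREASES_PER_DAY:
        # Still inside the free allowance: a decrease is possible right away.
        return last_decrease
    next_midnight = (last_decrease + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return min(last_decrease + COOLDOWN, next_midnight)


if __name__ == "__main__":
    last = datetime(2022, 10, 10, 13, 0, 19, tzinfo=timezone.utc)
    print(next_decrease_allowed(8, last).isoformat())
```

With the figures from the message above (8 decreases, the last at 1:00:19 PM UTC), the sketch gives 2:00:19 PM UTC, which matches the time AWS reports.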

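The failure occurs because describe-table returns the table's current provisioned capacity, as set by auto scaling. When that differs from the read_capacity and write_capacity in the task, the module issues an update, and if the requested value is lower, that update counts as one of the limited decreases.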
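To illustrate why a fixed capacity in the task keeps producing decreases, here is a hypothetical sketch of the comparison step. The names `plan_throughput_update` and `is_decrease` are invented for illustration and are not the module's actual code; the sketch assumes the module only updates settings that were requested and that differ from the describe-table values. The keys mirror DynamoDB's `ReadCapacityUnits` and `WriteCapacityUnits`.

```python
from typing import Dict, Optional


def plan_throughput_update(
    requested: Dict[str, Optional[int]], current: Dict[str, int]
) -> Dict[str, int]:
    """Return only the capacity settings that differ from the live table.

    `requested` mirrors the task's capacity parameters; None means the
    parameter was omitted. `current` mirrors describe-table output, which
    auto scaling may have changed since the table was created.
    """
    return {
        key: value
        for key, value in requested.items()
        if value is not None and value != current.get(key)
    }


def is_decrease(changes: Dict[str, int], current: Dict[str, int]) -> bool:
    # Any lowered value makes the resulting update count as a decrease.
    return any(value < current[key] for key, value in changes.items())


if __name__ == "__main__":
    # Live values after auto scaling raised the table from 1 to 7.
    live = {"ReadCapacityUnits": 7, "WriteCapacityUnits": 7}
    fixed = plan_throughput_update({"ReadCapacityUnits": 1, "WriteCapacityUnits": 1}, live)
    omitted = plan_throughput_update({"ReadCapacityUnits": None, "WriteCapacityUnits": None}, live)
    print(fixed, is_decrease(fixed, live))
    print(omitted, is_decrease(omitted, live))
```

If auto scaling has raised a table to 7/7 and the task still asks for 1/1, both values are planned as changes and the update is a decrease; if the parameters are omitted (None), nothing is planned at all.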

The easiest fix would be to add an option to ignore the *_capacity parameters when capacity is managed afterwards by auto scaling.

Issue Type

Bug Report

Component Name

dynamodb_table

Ansible Version

$ ansible --version
/usr/local/lib/python3.6/site-packages/ansible/parsing/vault/__init__.py:44: CryptographyDeprecationWarning: Python 3.6 is no longer supported by the Python core team. Therefore, support for it is deprecated in cryptography and will be removed in a future release.
  from cryptography.exceptions import InvalidSignature
ansible [core 2.11.12] 
  config file = /data/ansible.cfg
  configured module search path = ['/root/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/local/lib/python3.6/site-packages/ansible
  ansible collection location = /data/collections
  executable location = /usr/local/bin/ansible
  python version = 3.6.8 (default, Nov 16 2020, 16:55:22) [GCC 4.8.5 20150623 (Red Hat 4.8.5-44)]
  jinja version = 3.0.3
  libyaml = True

Collection Versions

$ ansible-galaxy collection list

# /usr/local/lib/python3.6/site-packages/ansible_collections
Collection                    Version
----------------------------- -------
amazon.aws                    1.5.1  
ansible.netcommon             2.5.0  
ansible.posix                 1.3.0  
ansible.utils                 2.4.3  
ansible.windows               1.8.0  
arista.eos                    2.2.0  
awx.awx                       19.4.0 
azure.azcollection            1.10.0 
check_point.mgmt              2.2.0  
chocolatey.chocolatey         1.1.0  
cisco.aci                     2.1.0  
cisco.asa                     2.1.0  
cisco.intersight              1.0.18 
cisco.ios                     2.6.0  
cisco.iosxr                   2.6.0  
cisco.meraki                  2.5.0  
cisco.mso                     1.2.0  
cisco.nso                     1.0.3  
cisco.nxos                    2.8.2  
cisco.ucs                     1.6.0  
cloudscale_ch.cloud           2.2.0  
community.aws                 1.5.0  
community.azure               1.1.0  
community.crypto              1.9.8  
community.digitalocean        1.13.0 
community.docker              1.10.2 
community.fortios             1.0.0  
community.general             3.8.3  
community.google              1.0.0  
community.grafana             1.3.0  
community.hashi_vault         1.5.0  
community.hrobot              1.2.1  
community.kubernetes          1.2.1  
community.kubevirt            1.0.0  
community.libvirt             1.0.2  
community.mongodb             1.3.2  
community.mysql               2.3.2  
community.network             3.0.0  
community.okd                 1.1.2  
community.postgresql          1.6.0  
community.proxysql            1.3.0  
community.rabbitmq            1.1.0  
community.routeros            1.2.0  
community.skydive             1.0.0  
community.sops                1.2.0  
community.vmware              1.17.0 
community.windows             1.8.0  
community.zabbix              1.5.1  
containers.podman             1.9.0  
cyberark.conjur               1.1.0  
cyberark.pas                  1.0.13 
dellemc.enterprise_sonic      1.1.0  
dellemc.openmanage            3.6.0  
dellemc.os10                  1.1.1  
dellemc.os6                   1.0.7  
dellemc.os9                   1.0.4  
f5networks.f5_modules         1.13.0 
fortinet.fortimanager         2.1.4  
fortinet.fortios              2.1.3  
frr.frr                       1.0.3  
gluster.gluster               1.0.2  
google.cloud                  1.0.2  
hetzner.hcloud                1.6.0  
hpe.nimble                    1.1.4  
ibm.qradar                    1.0.3  
infinidat.infinibox           1.3.0  
inspur.sm                     1.3.0  
junipernetworks.junos         2.8.0  
kubernetes.core               1.2.1  
mellanox.onyx                 1.0.0  
netapp.aws                    21.7.0 
netapp.azure                  21.10.0
netapp.cloudmanager           21.12.1
netapp.elementsw              21.7.0 
netapp.ontap                  21.14.1
netapp.um_info                21.8.0 
netapp_eseries.santricity     1.2.13 
netbox.netbox                 3.4.0  
ngine_io.cloudstack           2.2.2  
ngine_io.exoscale             1.0.0  
ngine_io.vultr                1.1.0  
openstack.cloud               1.5.3  
openvswitch.openvswitch       2.1.0  
ovirt.ovirt                   1.6.6  
purestorage.flasharray        1.11.0 
purestorage.flashblade        1.8.1  
sensu.sensu_go                1.12.0 
servicenow.servicenow         1.0.6  
splunk.es                     1.0.2  
t_systems_mms.icinga_director 1.26.0 
theforeman.foreman            2.2.0  
vyos.vyos                     2.6.0  
wti.remote                    1.0.3  

# /data/collections/ansible_collections
Collection        Version
----------------- -------
amazon.aws        5.0.2  
ansible.posix     1.4.0  
community.aws     5.0.0  
community.docker  3.1.0  
community.general 5.7.0  

AWS SDK versions

$ pip show boto boto3 botocore
Name: boto
Version: 2.49.0
Summary: Amazon Web Services Library
Home-page: https://github.com/boto/boto/
Author: Mitch Garnaat
Author-email: mitch@garnaat.com
License: MIT
Location: /usr/local/lib/python3.6/site-packages
Requires: 
Required-by: 
---
Name: boto3
Version: 1.23.10
Summary: The AWS SDK for Python
Home-page: https://github.com/boto/boto3
Author: Amazon Web Services
Author-email: None
License: Apache License 2.0
Location: /usr/local/lib/python3.6/site-packages
Requires: botocore, jmespath, s3transfer
Required-by: 
---
Name: botocore
Version: 1.26.10
Summary: Low-level, data-driven core of boto 3.
Home-page: https://github.com/boto/botocore
Author: Amazon Web Services
Author-email: None
License: Apache License 2.0
Location: /usr/local/lib/python3.6/site-packages
Requires: urllib3, jmespath, python-dateutil
Required-by: s3transfer, boto3

Configuration

$ ansible-config dump --only-changed
COLLECTIONS_PATHS(/data/ansible.cfg) = ['/data/collections']
DEFAULT_HOST_LIST(/data/ansible.cfg) = ['/data/inventory']
DEPRECATION_WARNINGS(/data/ansible.cfg) = False
HOST_KEY_CHECKING(/data/ansible.cfg) = False
INTERPRETER_PYTHON(/data/ansible.cfg) = /usr/bin/python2.7

OS / Environment

Linux + Docker

Steps to Reproduce

On its own, this will of course not fail, because no scale-ups or scale-downs have happened yet. To reproduce the problem, the playbook below reduces read and write capacity 4 times, and the 5th decrease fails. This mimics the dynamic scaling performed by auto scaling.

$ cat dynamodb-fail.yaml
- name: Testing DynamoDB scaling
  hosts: localhost
  connection: local
  gather_facts: False
  vars:
    sleep: 10

  tasks:
    - name: Remove dynamo table for test
      community.aws.dynamodb_table:
        name: "DynamoDB-fail-test"
        region: eu-west-1
        state: absent

    - name: Pause to allow delete to happen
      ansible.builtin.pause:
        seconds: "{{ sleep }}"

    - include_tasks: dynamodb-commands.yaml
      vars:
        capacity: "{{ item }}"
      with_sequence: start=10 stride=-1 end=6

    - name: "Create dynamo table that will fail because of too many decreases"
      community.aws.dynamodb_table:
        name: "DynamoDB-fail-test"
        region: eu-west-1
        hash_key_name: id
        hash_key_type: STRING
        range_key_name: timestamp
        range_key_type: STRING
        read_capacity: 1
        write_capacity: 1
        #ignore_capacity: yes

$ cat dynamodb-commands.yaml 
- name: "Create dynamo table with hash and range primary key ({{ capacity }})"
  community.aws.dynamodb_table:
    name: "DynamoDB-fail-test"
    region: eu-west-1
    hash_key_name: id
    hash_key_type: STRING
    range_key_name: timestamp
    range_key_type: STRING
    read_capacity: "{{ capacity }}"
    write_capacity: "{{ capacity }}"

- name: Pause to allow changes
  ansible.builtin.pause:
    seconds: "{{ sleep }}"

$ ansible-playbook dynamodb-fail.yaml
...
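The reproduction can be checked on paper with a rough simulation. This sketch assumes all steps fall within the same UTC day, that the 10-second `sleep` is the only gap between steps (API latency is ignored), and that the limits are the ones quoted in the error message; `first_rejected_step` is a hypothetical helper, not part of the module. With the playbook's sequence (create at 10, then 9, 8, 7, 6 from the loop, then 1), the four loop updates use up the free decreases and the final task is rejected.

```python
from typing import List, Optional


def first_rejected_step(capacities: List[int], seconds_between: int,
                        free_per_day: int = 4, cooldown: int = 3600) -> Optional[int]:
    """Return the index of the first capacity change AWS would reject, or None.

    capacities[0] is the capacity the table is created with; each later
    entry is one update. All steps are assumed to be in the same UTC day.
    """
    decreases = 0
    last_decrease_at: Optional[int] = None
    for step in range(1, len(capacities)):
        if capacities[step] >= capacities[step - 1]:
            continue  # increases and unchanged values are not limited
        now = step * seconds_between
        if decreases >= free_per_day and now - last_decrease_at < cooldown:
            return step
        decreases += 1
        last_decrease_at = now
    return None


if __name__ == "__main__":
    # Create at 10, loop down to 6, then the final task asks for 1.
    print(first_rejected_step([10, 9, 8, 7, 6, 1], seconds_between=10))
```

Index 5 is the final "Create dynamo table that will fail" task, which matches "Number of decreases today: 4" in the actual results below.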

Expected Results

I expect to be able to tell dynamodb_table not to set capacity when I don't want it to change anything, but that isn't possible, because the module compares against the dynamically scaled capacity values to decide whether an update is needed.

Actual Results

Subscriber limit exceeded: Provisioned throughput decreases are limited within a given UTC day. After the first 4 decreases, each subsequent decrease in the same UTC day can be performed at most once every 3600 seconds. Number of decreases today: 4. Last decrease at Wednesday, October 12, 2022 at 11:12:17 AM Coordinated Universal Time. Next decrease can be made at Wednesday, October 12, 2022 at 12:12:17 PM Coordinated Universal Time


KvistA-ELS commented 2 years ago

For reference, this is how auto scaling is set up:

- name: Scaling_policy for reads in DynamoDB 
  community.aws.aws_application_scaling_policy:
    region: eu-west-1
    state: present
    policy_name: DynamoDBScalingPolicyRead
    service_namespace: dynamodb
    resource_id: 'table/DynamoDB-fail-test'
    scalable_dimension: dynamodb:table:ReadCapacityUnits
    policy_type: TargetTrackingScaling
    minimum_tasks: 1
    maximum_tasks: 10
    target_tracking_scaling_policy_configuration:
      TargetValue: 80
      PredefinedMetricSpecification:
        PredefinedMetricType: DynamoDBReadCapacityUtilization
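For context, a target tracking policy like the one above keeps moving the table's provisioned capacity, which is why describe-table rarely reports the value the playbook originally set. The sketch below is a deliberately simplified model and not AWS's algorithm (real policies also involve CloudWatch alarms and cooldowns). It assumes capacity is set to the smallest value that keeps utilization at or below `TargetValue`, clamped to the `minimum_tasks`/`maximum_tasks` bounds; the function name `target_tracking_capacity` is hypothetical.

```python
import math


def target_tracking_capacity(consumed_units: float, target_utilization: float,
                             minimum: int, maximum: int) -> int:
    """Simplified model of target tracking on DynamoDB capacity.

    Picks the smallest provisioned capacity that keeps
    consumed / provisioned at or below target_utilization percent,
    then clamps it to [minimum, maximum].
    """
    desired = math.ceil(consumed_units * 100.0 / target_utilization)
    return max(minimum, min(maximum, desired))


if __name__ == "__main__":
    # TargetValue: 80, minimum_tasks: 1, maximum_tasks: 10 as in the policy above.
    for consumed in (0, 5, 100):
        print(consumed, target_tracking_capacity(consumed, 80, 1, 10))
```

Under this model, with `TargetValue: 80` and bounds of 1 to 10, a table consuming 5 units ends up at 7 units, so a task that still asks for 1 looks like a decrease from 7.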

ansibullbot commented 2 years ago

cc @jillr @loia @markuman @s-hertel @tremble

tremble commented 2 years ago

@KvistA-ELS

Thanks for taking the time to open this issue. If you don't pass the read_capacity and write_capacity parameters, Ansible shouldn't try to set them. Could you please try that?

Given that your autoscaling policy is managing the capacity, is there a reason you need to pass read_capacity and write_capacity?

KvistA-ELS commented 2 years ago

Hi @tremble, they are set to 1 by default if not set, and that would very often be seen as a scale-down :) https://github.com/ansible-collections/community.aws/blob/fe0811f9c070fdf6f69254a6e50510dfb7d6cf1a/plugins/modules/dynamodb_table.py#L570 /Anders

tremble commented 2 years ago

@KvistA-ELS That default should only apply when creating a new table or index. If it is also being applied on updates, the bug lies somewhere in that code path.
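The behavior described here (a default of 1 on create, no capacity change on update when the parameters are omitted) can be sketched as follows. `managed_capacity` is a hypothetical helper showing which capacity values the module would try to manage; it is not the exact request sent to DynamoDB and not the module's code at the linked line.

```python
from typing import Dict, Optional

# Default mentioned in this thread for new tables and indexes.
DEFAULT_CAPACITY = 1


def managed_capacity(table_exists: bool,
                     read_capacity: Optional[int],
                     write_capacity: Optional[int]) -> Dict[str, int]:
    """Hypothetical helper: capacity values the module would manage.

    On create, omitted values fall back to DEFAULT_CAPACITY. On update,
    omitted values are left out entirely so the live (auto scaled)
    capacity is not touched.
    """
    requested = {"ReadCapacityUnits": read_capacity,
                 "WriteCapacityUnits": write_capacity}
    if not table_exists:
        return {k: DEFAULT_CAPACITY if v is None else v for k, v in requested.items()}
    return {k: v for k, v in requested.items() if v is not None}


if __name__ == "__main__":
    print(managed_capacity(False, None, None))  # new table: defaults apply
    print(managed_capacity(True, None, None))   # existing table: nothing managed
```

Under this model, omitting read_capacity and write_capacity on an existing table leaves nothing for the module to compare, so no decrease can be triggered.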

KvistA-ELS commented 2 years ago

@tremble I have just run a test in which the last table update omits the capacity values, and it looks like you are right. I started out with an earlier version of the module that always set capacity to 1.

tremble commented 2 years ago

@KvistA-ELS the code was substantially rewritten in community.aws 2.1.0 (migrating from the boto SDK to boto3).

tremble commented 2 years ago

I'm going to close out this issue (and the PR), since not passing the capacity parameters fixes the issue.