aws-controllers-k8s / community

AWS Controllers for Kubernetes (ACK) is a project enabling you to manage AWS services from Kubernetes

https://aws-controllers-k8s.github.io/community/

Apache License 2.0

dynamodb ack controller keeps overriding the values set by applicationautoscaling-controller #2033

Open rj425 opened 6 months ago

rj425 commented 6 months ago

Describe the bug

I have a table created using the dynamodb ack controller with some provisioned read and write capacity units. I have also asked the applicationautoscaling-controller to autoscale these two values based on throughput. The autoscaling controller works fine: it enables autoscaling for reads and writes and adjusts the values. However, the dynamodb-ack-controller keeps overriding them with the provisioned values set in the Table CRD.

Steps to reproduce

1. Create a table using the 'Table' CRD with provisioned read and write capacity units.
2. Enable autoscaling on this table by creating a 'ScalableTarget' and a 'ScalingPolicy' for the application-autoscaling-controller.
3. Simulate increasing load to trigger the autoscaling defined in the 'ScalingPolicy'.
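
For readers who want to reproduce this, the three resources from the steps above might look roughly like the sketch below. It builds the manifests as Python dictionaries and prints them as JSON (which kubectl accepts). The resource names, key schema, capacity bounds and target value are hypothetical. The field names are assumed to mirror the AWS API in lowerCamelCase and should be checked against the installed CRDs. Only the read dimension is shown; the write dimension would need its own ScalableTarget and ScalingPolicy.

```python
import json

TABLE_NAME = "example-table"          # hypothetical table name
RESOURCE_ID = f"table/{TABLE_NAME}"   # Application Auto Scaling resource ID format for DynamoDB tables
READ_DIMENSION = "dynamodb:table:ReadCapacityUnits"


def build_manifests():
    """Return Table, ScalableTarget and ScalingPolicy manifests (field names are assumptions)."""
    table = {
        "apiVersion": "dynamodb.services.k8s.aws/v1alpha1",
        "kind": "Table",
        "metadata": {"name": TABLE_NAME},
        "spec": {
            "tableName": TABLE_NAME,
            "attributeDefinitions": [{"attributeName": "id", "attributeType": "S"}],
            "keySchema": [{"attributeName": "id", "keyType": "HASH"}],
            # The statically provisioned values that the controller keeps re-applying.
            "provisionedThroughput": {"readCapacityUnits": 1100, "writeCapacityUnits": 1350},
        },
    }
    scalable_target = {
        "apiVersion": "applicationautoscaling.services.k8s.aws/v1alpha1",
        "kind": "ScalableTarget",
        "metadata": {"name": f"{TABLE_NAME}-read"},
        "spec": {
            "serviceNamespace": "dynamodb",
            "resourceID": RESOURCE_ID,
            "scalableDimension": READ_DIMENSION,
            "minCapacity": 100,    # hypothetical bounds
            "maxCapacity": 4000,
        },
    }
    scaling_policy = {
        "apiVersion": "applicationautoscaling.services.k8s.aws/v1alpha1",
        "kind": "ScalingPolicy",
        "metadata": {"name": f"{TABLE_NAME}-read-tracking"},
        "spec": {
            "policyName": f"{TABLE_NAME}-read-tracking",
            "policyType": "TargetTrackingScaling",
            "serviceNamespace": "dynamodb",
            "resourceID": RESOURCE_ID,
            "scalableDimension": READ_DIMENSION,
            "targetTrackingScalingPolicyConfiguration": {
                "targetValue": 70,  # hypothetical utilization target, in percent
                "predefinedMetricSpecification": {
                    "predefinedMetricType": "DynamoDBReadCapacityUtilization"
                },
            },
        },
    }
    return [table, scalable_target, scaling_policy]


if __name__ == "__main__":
    for manifest in build_manifests():
        print(json.dumps(manifest, indent=2))
```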

Expected outcome

The application-autoscaling-controller should adjust the capacity based on the load, and the dynamodb-ack-controller should not reset the adjusted value (during scale-up or scale-down events) to the previously provisioned value defined in the Table CRD, whatever the load is.
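
The conflict described in this report can be illustrated with a toy simulation. This is not the controllers' actual code, only a minimal model that assumes two actors write the same capacity value in turn: an autoscaler that raises it under load, and a reconciler that resets it to whatever the Table spec declares. All names and numbers are illustrative.

```python
from dataclasses import dataclass


@dataclass
class TableState:
    spec_rcu: int        # read capacity declared in the Table CR
    actual_rcu: int      # read capacity currently provisioned in AWS
    decreases: int = 0   # number of throughput decreases applied so far


def apply_capacity(table: TableState, new_rcu: int) -> None:
    """Write a new capacity value and count it if it is a decrease."""
    if new_rcu < table.actual_rcu:
        table.decreases += 1
    table.actual_rcu = new_rcu


def autoscaler_step(table: TableState, scaled_rcu: int) -> None:
    """Model the autoscaler raising capacity under sustained load."""
    apply_capacity(table, scaled_rcu)


def controller_reconcile(table: TableState) -> None:
    """Model a reconciler that treats the spec as the only source of truth."""
    if table.actual_rcu != table.spec_rcu:
        apply_capacity(table, table.spec_rcu)


def simulate(rounds: int, spec_rcu: int = 1100, scaled_rcu: int = 2000):
    """Alternate autoscaler and reconciler writes; return final state and history."""
    table = TableState(spec_rcu=spec_rcu, actual_rcu=spec_rcu)
    history = []
    for _ in range(rounds):
        autoscaler_step(table, scaled_rcu)
        history.append(table.actual_rcu)
        controller_reconcile(table)
        history.append(table.actual_rcu)
    return table, history


if __name__ == "__main__":
    final, history = simulate(rounds=3)
    print(history, "decreases:", final.decreases)
```

In this model every reconcile pass undoes the autoscaler's change and also counts as one throughput decrease, so the number of decreases grows with the number of reconcile cycles rather than with the actual load pattern.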

a-hilaly commented 6 months ago

We have a very similar use case in the eks-controller: users can set an annotation indicating that an external autoscaler manages the NodeGroup size. We can implement something similar for the DynamoDB controller as well.

rj425 commented 6 months ago

@a-hilaly How small is this fix? What is the possible ETA? Also, will this solution let Dynamodb controller update the other fields of the table CRD that are not controlled by the application-autoscaling-controller?

a-hilaly commented 6 months ago

@rj425 I can take this post KubeCon. I first need to understand which fields can be managed by an external autoscaler. Can GSI throughputs also be managed by the autoscaler?

> Also, will this solution let Dynamodb controller update the other fields of the table CRD that are not controlled by the application-autoscaling-controller?

Correct, dynamodb-controller will continue managing the other fields as expected. The annotation will only instruct the controller to ignore a specific set of fields.
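
As a rough illustration of that idea (not the controller's implementation, which did not exist at the time of this comment), the sketch below computes which fields a reconciler would update and skips the capacity fields when an opt-out annotation is present. The annotation key, its value, and the flattened field layout are all hypothetical.

```python
# Hypothetical annotation key and value; no such annotation is confirmed for the DynamoDB controller.
MANAGED_BY_ANNOTATION = "dynamodb.services.k8s.aws/provisioned-throughput-managed-by"
EXTERNAL_AUTOSCALER = "external-autoscaler"

# Fields the reconciler would leave alone when an external autoscaler owns them.
EXTERNALLY_MANAGED_FIELDS = {"readCapacityUnits", "writeCapacityUnits"}


def fields_to_update(desired: dict, observed: dict, annotations: dict) -> dict:
    """Return the desired values that differ from the observed state.

    Capacity fields are excluded when the (hypothetical) annotation marks
    them as managed by an external autoscaler.
    """
    externally_managed = annotations.get(MANAGED_BY_ANNOTATION) == EXTERNAL_AUTOSCALER
    delta = {}
    for field, wanted in desired.items():
        if externally_managed and field in EXTERNALLY_MANAGED_FIELDS:
            continue  # leave this field to the autoscaler
        if observed.get(field) != wanted:
            delta[field] = wanted
    return delta


if __name__ == "__main__":
    desired = {"readCapacityUnits": 1100, "writeCapacityUnits": 1350,
               "tableClass": "STANDARD_INFREQUENT_ACCESS"}
    observed = {"readCapacityUnits": 2000, "writeCapacityUnits": 1350,
                "tableClass": "STANDARD"}
    print(fields_to_update(desired, observed, {}))
    print(fields_to_update(desired, observed,
                           {MANAGED_BY_ANNOTATION: EXTERNAL_AUTOSCALER}))
```

With the annotation set, the capacity drift introduced by the autoscaler is ignored while the unrelated field change is still reported, which matches the behavior described in the reply above.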

rj425 commented 6 months ago

Since a picture is worth a thousand words, I am posting a screenshot as an example of this behavior.

This is what the autoscaling looks like when the resource is managed by both controllers (dynamodb and application-autoscaling). You can see how the read and write capacity keeps going back to 1100 and 1350.

[Screenshot: read and write capacity repeatedly returning to 1100 and 1350]

Because of this behavior, the autoscaling controller fails to update the table after a while, with this error:

LimitExceededException: Subscriber limit exceeded: Provisioned throughput decreases are limited within a given UTC day. After the first 4 decreases, each subsequent decrease in the same UTC day can be performed at most once every 3600 seconds. Number of decreases today: 12. Last decrease at Tuesday, March 12, 2024 at 11:44:46 AM Coordinated Universal Time. Next decrease can be made at Tuesday, March 12, 2024 at 12:44:46 PM Coordinated Universal Time
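
The arithmetic behind that message can be sketched as follows. This is only a reading of the rule as the error text states it (4 unrestricted decreases per UTC day, then at most one per 3600 seconds); the assumption that the counter resets at UTC midnight is inferred from the phrase "in the same UTC day". The function name is hypothetical.

```python
from datetime import datetime, time, timedelta, timezone

UNRESTRICTED_DECREASES_PER_DAY = 4       # from the error message
COOLDOWN = timedelta(seconds=3600)       # from the error message


def next_decrease_allowed(decreases_today: int, last_decrease: datetime) -> datetime:
    """Earliest time at which another throughput decrease would be accepted.

    `last_decrease` must be a timezone-aware datetime.
    """
    last = last_decrease.astimezone(timezone.utc)
    if decreases_today < UNRESTRICTED_DECREASES_PER_DAY:
        return last  # still within the unrestricted quota, no waiting needed
    # Assumption: the daily counter resets at the next UTC midnight.
    next_midnight = datetime.combine(
        last.date() + timedelta(days=1), time.min, tzinfo=timezone.utc
    )
    return min(last + COOLDOWN, next_midnight)


if __name__ == "__main__":
    # Values taken from the error above: 12 decreases, last one at 11:44:46 UTC.
    last = datetime(2024, 3, 12, 11, 44, 46, tzinfo=timezone.utc)
    print(next_decrease_allowed(12, last))
```

For the values in the error above, this yields 12:44:46 UTC on March 12, 2024, the same time the exception reports.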

rj425 commented 6 months ago

@a-hilaly Thanks, I will be eagerly waiting for the fix. And yes, GSI throughputs can also be managed using the application-autoscaling-controller.

rj425 commented 6 months ago

Hi @a-hilaly,

Is there any update on this issue? Is there any other information that is needed to reproduce this error?

a-hilaly commented 6 months ago

@rj425 This is in our bucket. We're currently working on shipping dynamic references and read-only resources. I'll update here as soon as we have started working on this.

ack-bot commented 1 week ago

Issues go stale after 180d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 60d of inactivity and eventually close. If this issue is safe to close now please do so with /close. Provide feedback via https://github.com/aws-controllers-k8s/community. /lifecycle stale