aws-solutions-library-samples / aws-ops-automator

A solution for automated and scheduled execution of actions on selected AWS resources, including an updated EBS Snapshot Scheduler
https://aws.amazon.com/solutions/implementations/ops-automator/

TaskTableCleanup error #21

Closed peruzzof closed 4 years ago

peruzzof commented 5 years ago

We installed the newest version of Ops Automator last week, and since then we have been getting this alert daily. I couldn't find a way to change the ExecutionSize for this internal task, so I increased the whole Standard size to 192 MB. I already checked DynamoDB and couldn't find anything that could have slowed the process down.

Is there any way to find the exact problem we are hitting here?

{ "code": "ERR_EXECUTION_TASK", "log-stream": null, "level": "Error", "caller": "handle_request", "module": "handlers.execution_handler", "log-group": "AWS-OpsAutomator-201-logs", "message": "2019-08-22 - 02:15:44.835 - ERROR : Error execution of execute-action for task TaskTableCleanup\n Timeout execution action", "line": 492 }

peruzzof commented 5 years ago

Just to add more information: our TaskTrackingTable has more than 40k items (35k the first time). I believe that with on-demand capacity the scan cannot finish within the 15-minute window. So we will probably need to find another way to clean this table (most likely moving away from Lambda). But I have no experience with either DynamoDB or Lambda, so maybe I am missing something.
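For anyone who wants to check the table size and scan time outside Lambda, here is a minimal sketch of a paginated count using the DynamoDB `Scan` API with `Select="COUNT"`. It is not part of Ops Automator. The function name `count_items` is hypothetical, `client` is assumed to be a `boto3.client("dynamodb")`, and the table name has to be taken from your own stack.

```python
from typing import Any, Dict, Optional


def count_items(client: Any, table_name: str) -> int:
    """Count the items in a table with a paginated Scan.

    `client` is assumed to be boto3.client("dynamodb"); any object with a
    compatible scan() method works. Select="COUNT" returns only counts,
    so no item data is transferred, but every page still consumes read
    capacity.
    """
    total = 0
    start_key: Optional[Dict[str, Any]] = None
    while True:
        kwargs: Dict[str, Any] = {"TableName": table_name, "Select": "COUNT"}
        if start_key is not None:
            # Resume where the previous page stopped.
            kwargs["ExclusiveStartKey"] = start_key
        page = client.scan(**kwargs)
        total += page["Count"]
        start_key = page.get("LastEvaluatedKey")
        if not start_key:
            # No LastEvaluatedKey means the scan reached the end of the table.
            return total


# Usage (requires boto3 and AWS credentials; the table name is a placeholder):
#   import boto3
#   print(count_items(boto3.client("dynamodb"), "<your TaskTrackingTable name>"))
```

Timing this call from a workstation gives a rough idea of whether a full scan of the table is really what exceeds the 15-minute window.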

peruzzof commented 5 years ago

Some more information: as I understand it, the tool relies on the TTL configuration to purge data, so I disabled the cleanup task (which was failing every single day). The table is still growing (55k items right now), without any significant increase in the number of instances/volumes.

As far as I can tell the problem is under control, but if the cleanup task is not needed it should be disabled by default, to avoid this kind of issue.
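Since the purge is expected to come from TTL, one thing worth verifying is that TTL is actually enabled on the table. Below is a minimal sketch using the DynamoDB `DescribeTimeToLive` API. The helper name `ttl_settings` is hypothetical, `client` is again assumed to be a `boto3.client("dynamodb")`, and which attribute Ops Automator uses for TTL is not stated in this thread, so the code only reports whatever is configured.

```python
from typing import Any, Optional, Tuple


def ttl_settings(client: Any, table_name: str) -> Tuple[str, Optional[str]]:
    """Return (TTL status, TTL attribute name) for a DynamoDB table.

    `client` is assumed to be boto3.client("dynamodb"). The status is one
    of ENABLING, DISABLING, ENABLED or DISABLED; the attribute name is
    None when no TTL attribute is configured.
    """
    response = client.describe_time_to_live(TableName=table_name)
    description = response.get("TimeToLiveDescription", {})
    return (
        description.get("TimeToLiveStatus", "UNKNOWN"),
        description.get("AttributeName"),
    )


# Usage (requires boto3 and AWS credentials; the table name is a placeholder):
#   import boto3
#   print(ttl_settings(boto3.client("dynamodb"), "<your TaskTrackingTable name>"))
```

If the status is `ENABLED` and the table still grows, keep in mind that DynamoDB removes expired items in the background rather than at the exact expiry time, so some lag between expiry and deletion is normal.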

maykays commented 4 years ago

Will not fix. We're shifting our development priority to Operations Conductor. You are welcome to fork the solution to make updates.