Open JackLewis-digirati opened 3 months ago
Also noted: the SQL script does not terminate when the Lambda terminates, and can continue running long after the Lambda has stopped. Because the Lambda is retried, this can result in multiple instances of the script running at the same time, locking tables and affecting performance. For a possible explanation see: https://github.com/aws-samples/graceful-shutdown-with-aws-lambda
Will moving to ECS fix this issue, or do we need to explicitly terminate the running SQL script when we receive a SIGTERM? (SIGKILL cannot be caught by the process, so any cleanup has to happen on SIGTERM.) Or do we need to alter the DB connection timeout?
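To make the "explicitly terminate" option concrete, here is a minimal sketch of a SIGTERM handler that cancels an in-flight query. It assumes the recalculator is (or could be) written in Python and that the database driver exposes some way to cancel a running statement; `QueryCanceller`, `run` and `install_handler` are hypothetical names, not part of the existing code.

```python
import signal


class QueryCanceller:
    """Holds the cancel callback for the query currently in flight (hypothetical helper)."""

    def __init__(self):
        self._cancel = None
        self.shutdown_requested = False

    def run(self, execute, cancel):
        """Call execute() while cancel() is registered as the way to abort it."""
        self._cancel = cancel
        try:
            return execute()
        finally:
            # Nothing is in flight any more, so a late SIGTERM must not cancel anything.
            self._cancel = None

    def handle_sigterm(self, signum, frame):
        """Signal handler: record the shutdown request and cancel any running query."""
        self.shutdown_requested = True
        if self._cancel is not None:
            self._cancel()


canceller = QueryCanceller()


def install_handler():
    # SIGKILL cannot be handled, so only SIGTERM is registered.
    signal.signal(signal.SIGTERM, canceller.handle_sigterm)
```

In use, `execute` would wrap the call that runs the SQL script and `cancel` would be whatever the driver offers for aborting a statement on that connection. Two caveats: Python only runs signal handlers in the main thread once control returns to the interpreter, so a driver that blocks inside native code may delay the handler until the call returns; and the shutdown window after SIGTERM is short. A server-side statement or connection timeout may therefore be a more dependable backstop than signal handling alone.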
As this is such a long running query, do we need to consider running it outwith a transaction, or otherwise improve performance?
In a larger environment, the entity recalculator has been seen to take up to an hour to run. Because it currently runs as an AWS Lambda function, which has a maximum timeout of 15 minutes, it can time out before the updates are completed.
One way around this is to run it as an ECS task, which has no timeout limit and so should be able to run to completion.
NOTE: there will need to be some sort of alert if this ECS task runs for several hours, possibly via CloudWatch?
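As a rough illustration of the alerting check, the sketch below decides whether a task has overrun, given its start time. The two-hour threshold and the names `MAX_RUNTIME` and `is_overdue` are assumptions for illustration only. The start time could come from the `startedAt` field that ECS reports for a task, and the result could drive a CloudWatch alarm, but that wiring is not shown here.

```python
from datetime import datetime, timedelta, timezone

# Assumed threshold; the issue only says "several hours".
MAX_RUNTIME = timedelta(hours=2)


def is_overdue(started_at, now=None, max_runtime=MAX_RUNTIME):
    """Return True if a task started at `started_at` has run longer than `max_runtime`.

    Both datetimes are expected to be timezone-aware.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now - started_at > max_runtime
```

A scheduled check could call `is_overdue` for each running recalculator task and raise an alert for any that return True.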