Open alexdunnjpl opened 2 months ago
@alexdunnjpl when you say "implement configuration" is this an event scheduler configuration?
@jordanpadams I'm fuzzy on the details, but I think it requires defining a cluster for each task definition and setting a container limit on each cluster. In short, "do some AWS Console stuff".
@sjoshi-jpl will have a better idea of the details I suspect
Thanks @alexdunnjpl. As a task, this is 100% going to get lost in the 100s of tickets we have open right now. I will try to keep track of this and add to our overall release plan.
The need for this should be somewhat mitigated (though not completely avoided) by https://github.com/NASA-PDS/registry-sweepers/pull/115 as now, only provenance should result in any redundant work being done.
EDIT: Actually, this is incorrect. There's still a concern of multiple instances tripping over each other if an influx of data causes container runtime to exceed the cadence period (runtime > cadencePeriod).
💡 Description
Currently, if a sweeper executes for longer than its schedule cadence, multiple instances of the sweeper will run concurrently.
This causes additional cost due to both redundant processing and a slowdown of all jobs due to increased database load, and could affect service if the database is loaded heavily enough.
Implement configuration to allow execution of <=1 container instance per task definition (i.e. node) at any point in time.
@jordanpadams this isn't blocking anything, but the sooner it's done, the shorter we can make our sweepers' cadence, and the performance/cost impact is nontrivial.
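
For illustration only, here is a minimal sketch of the "<=1 instance per task definition" rule expressed as a pre-launch check, rather than the AWS Console configuration discussed above. It assumes the sweepers run as ECS tasks and that a boto3-style ECS client is passed in; the function name `sweeper_already_running` and the cluster/family values are hypothetical. A check like this is not atomic, so two launchers could still race; it only shows the intended condition.

```python
from typing import Any


def sweeper_already_running(ecs_client: Any, cluster: str, family: str) -> bool:
    """Return True if any task from the given task definition family is active.

    `ecs_client` is assumed to behave like boto3.client("ecs").
    Filtering on desiredStatus="RUNNING" also matches tasks that are still
    PENDING, so a sweeper that is starting up counts as active.
    """
    response = ecs_client.list_tasks(
        cluster=cluster,
        family=family,
        desiredStatus="RUNNING",
    )
    # Any returned task ARN means an instance is already in flight.
    return len(response.get("taskArns", [])) > 0
```

A scheduler wrapper could call this before starting a new sweeper and skip the run when it returns `True`, e.g. `sweeper_already_running(boto3.client("ecs"), "my-cluster", "my-sweeper-family")` (both names are placeholders).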