Netflix / genie

Distributed Big Data Orchestration Service
https://netflix.github.io/genie
Apache License 2.0

Improve command cleanup process #1116

Closed tgianos closed 2 years ago

tgianos commented 2 years ago

Improve the command cleanup process by removing a problematic predicate and re-ordering the operations.

The job-created threshold predicate in the subquery that finds commands to deactivate could flip the query into a full table scan once the number of jobs matching the predicate passed a certain threshold.

This change removes that predicate so that the query uses nothing but indices. As long as the command creation threshold is long enough and no job in the database uses the command, this should be relatively safe.
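As a rough illustration of the query shape after the change, here is a minimal sketch using Python's built-in `sqlite3` module. The table names (`commands`, `jobs`), the columns, the status strings, and the helper `find_unused_command_ids` are all hypothetical and are not Genie's actual schema or code. The point is only that the subquery asks whether any job references the command, with no filter on when the job was created.

```python
import sqlite3


def find_unused_command_ids(conn, command_created_before):
    """Return ids of ACTIVE commands older than the threshold that no job uses.

    Hypothetical schema: the subquery has no job-created filter, so it only
    needs the index on jobs.command_id to answer the NOT EXISTS check.
    """
    rows = conn.execute(
        """
        SELECT c.id
        FROM commands c
        WHERE c.status = 'ACTIVE'
          AND c.created < ?
          AND NOT EXISTS (SELECT 1 FROM jobs j WHERE j.command_id = c.id)
        ORDER BY c.id
        """,
        (command_created_before,),
    ).fetchall()
    return [row[0] for row in rows]


def demo():
    # Build a tiny in-memory database with made-up data.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE commands (id INTEGER PRIMARY KEY, status TEXT, created INTEGER);
        CREATE TABLE jobs (id INTEGER PRIMARY KEY, command_id INTEGER, created INTEGER);
        CREATE INDEX jobs_command_id ON jobs (command_id);
        INSERT INTO commands VALUES (1, 'ACTIVE', 10), (2, 'ACTIVE', 10), (3, 'ACTIVE', 90);
        INSERT INTO jobs VALUES (100, 2, 5);
        """
    )
    # Command 2 has a job and command 3 is newer than the threshold of 50.
    return find_unused_command_ids(conn, 50)
```

In this sketch only command 1 is selected for deactivation: it is old enough and no job references it, regardless of how old any job might be.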

To be doubly safe, command deletion has been moved before command deactivation. That way the time between database cleanup invocations (24 hours, for example) is available for any problems to surface and for a command to be reactivated if someone still needs it. That is unlikely if no one has used the command in the last X days anyway.
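The effect of the re-ordering can be sketched with a small in-memory model. This is a simplification under stated assumptions, not Genie's implementation: `run_cleanup`, the status strings, and the rule that every already-inactive command gets deleted are all hypothetical stand-ins for the real criteria.

```python
def run_cleanup(commands, unused_ids):
    """Simulate one cleanup invocation: delete first, then deactivate.

    commands: dict mapping command id -> 'ACTIVE' or 'INACTIVE' (hypothetical)
    unused_ids: ids picked by the unused-command query for this run
    Returns (deleted_ids, deactivated_ids).
    """
    # Step 1: delete only commands that were already INACTIVE when this run
    # started, i.e. deactivated by an earlier run and not reactivated since.
    deleted = [cid for cid, status in commands.items() if status == "INACTIVE"]
    for cid in deleted:
        del commands[cid]

    # Step 2: deactivate the commands found unused in this run. They are not
    # eligible for deletion until the next invocation.
    deactivated = [cid for cid in unused_ids if commands.get(cid) == "ACTIVE"]
    for cid in deactivated:
        commands[cid] = "INACTIVE"

    return deleted, deactivated
```

Because deactivation runs last, a command deactivated in one invocation stays in the database until the next one. Setting its status back to `ACTIVE` in between keeps it out of the delete step.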