Currently, our log handler routinely ensures that log groups exist. It uses a throttling mechanism to avoid calling the CreateLogGroup API too often. This mechanism does not work well for task logs and DAG processing logs. I suspect the reason is that running a task or processing a DAG spawns a separate process. Because the throttling state is not shared across processes, each new process calls CreateLogGroup again.
Additionally, calling CreateLogGroup when the log group already exists fails with a ResourceAlreadyExistsException. As a result, the customer's CloudTrail event history fills up with spurious failed events.
To solve both problems, I propose implementing the log-group check as a process hook (currently called a process condition, which I plan to rename) and attaching it to the main processes of the schedulers, workers, and web servers. The check then runs in these long-lived parent processes rather than in each short-lived child process, which sidesteps the multiprocessing problem described above.
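A minimal sketch of the proposed design follows. It assumes a hypothetical `ProcessHook` interface with an `on_start()` method, because the real process-condition API is not described here. `EnsureLogGroupsHook`, `run_main_process`, and the log group names are illustrative only. The hook receives the actual existence check as a callable, so this part stays independent of AWS.

```python
from abc import ABC, abstractmethod
from typing import Callable, Iterable, Sequence


class ProcessHook(ABC):
    """Hypothetical interface: work that runs once in a long-lived main process."""

    @abstractmethod
    def on_start(self) -> None: ...


class EnsureLogGroupsHook(ProcessHook):
    """Hypothetical hook that ensures each configured log group exists."""

    def __init__(self, log_group_names: Sequence[str], ensure_fn: Callable[[str], None]):
        self._names = list(log_group_names)
        self._ensure_fn = ensure_fn  # e.g. a DescribeLogGroups-then-CreateLogGroup check

    def on_start(self) -> None:
        for name in self._names:
            self._ensure_fn(name)


def run_main_process(hooks: Iterable[ProcessHook], start_component: Callable[[], None]) -> None:
    # Run every hook in the parent (scheduler, worker, or web server) before the
    # component starts spawning task or DAG-processing subprocesses, so the
    # children never need to repeat the check.
    for hook in hooks:
        hook.on_start()
    start_component()


if __name__ == "__main__":
    created = []
    hook = EnsureLogGroupsHook(["example-env-Task"], created.append)
    run_main_process([hook], lambda: print("component started"))
    print(created)  # ['example-env-Task']
```

Keeping the check outside the log handler means its cost is paid once per main process start, regardless of how many child processes the component later creates.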
Acceptance Criteria
Implement a process hook that ensures the required log groups exist.
Use the DescribeLogGroups API to check whether the log group exists, and call CreateLogGroup only when it does not, instead of calling CreateLogGroup unconditionally.
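The DescribeLogGroups-first check could look roughly like the following Python sketch. It assumes a boto3 CloudWatch Logs client (`boto3.client("logs")`) is passed in, and the function names are hypothetical. Because `describe_log_groups` filters by name prefix, the sketch compares names exactly and follows `nextToken` pages. It calls `create_log_group` only when the group is missing. A `ResourceAlreadyExistsException` raised because another process created the group in the meantime is treated as success.

```python
from typing import Any, Dict


def log_group_exists(logs_client: Any, name: str) -> bool:
    """Return True if a log group with exactly this name exists."""
    kwargs: Dict[str, str] = {"logGroupNamePrefix": name}
    while True:
        response = logs_client.describe_log_groups(**kwargs)
        # The prefix filter also matches longer names, so compare exactly.
        if any(group["logGroupName"] == name for group in response.get("logGroups", [])):
            return True
        token = response.get("nextToken")
        if not token:
            return False
        kwargs["nextToken"] = token


def ensure_log_group(logs_client: Any, name: str) -> bool:
    """Create the log group only if it is missing. Returns True if this call created it."""
    if log_group_exists(logs_client, name):
        return False
    try:
        logs_client.create_log_group(logGroupName=name)
    except logs_client.exceptions.ResourceAlreadyExistsException:
        # Another process created it between the describe and create calls.
        return False
    return True
```

In the steady state, where the group already exists, this issues only DescribeLogGroups requests, so CreateLogGroup is no longer attempted against existing groups.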
Additional Info
N/A