Currently, we log using Dask streams: log events are created by the workers and retrieved by the main program once control returns to it.
The problem occurs when a Dask cluster is set up and multiple pipelines are run on it. Log events from previous pipeline runs are appended to the logs of the current run, contaminating them. This makes it difficult to isolate the logs of an individual pipeline run and causes confusion when interpreting them.
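The mechanism can be illustrated with a minimal, self-contained sketch (plain Python, no Dask; all names here are hypothetical, not Fondant or Dask APIs): workers append events to a shared, long-lived buffer, and the retrieval step reads whatever has accumulated, including leftovers from earlier runs.

```python
# Hypothetical sketch of the event-forwarding mechanism (illustrative only).
# Workers append log events to a shared buffer that lives as long as the
# cluster; the main program reads it when control returns.

shared_events: list[str] = []  # lifetime tied to the cluster, not the run

def worker_log(message: str) -> None:
    """Called on a worker: push a log event to the shared buffer."""
    shared_events.append(message)

def retrieve_logs() -> list[str]:
    """Called by the main program: read all accumulated events."""
    return list(shared_events)  # note: nothing is cleared here

# First pipeline run
worker_log("run-1: loaded data")
print(retrieve_logs())  # ['run-1: loaded data']

# Second pipeline run on the same cluster
worker_log("run-2: loaded data")
print(retrieve_logs())  # run-1's event is still present: contamination
```

Because `retrieve_logs` never clears the buffer and the buffer outlives a single run, the second run's logs necessarily include the first run's events.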
Expected Behavior
Each pipeline run should have its own isolated logs, without contamination from previous runs.
Actual Behavior
Logs from previous pipeline runs are included in the logs of the current pipeline, leading to log contamination.
Environment
OS: Windows
Mode: Prefect Cloud
Version: 0.1.4
Steps to Reproduce
Set up a Dask cluster.
Run a pipeline that generates logs using Dask streams.
Observe the logs.
Run a second pipeline on the same cluster.
Observe that logs from the first pipeline are also present in the second pipeline's logs.
Possible Solution / Suggestion
Investigate how Dask streams handle log events and ensure that events are properly segregated per pipeline run. One option is to clear or reset the accumulated log events at the start of each pipeline run.
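As a sketch of the suggested reset (hypothetical names, not actual Fondant or Dask code): clear the shared event buffer at the start of each run, so retrieval only ever sees the current run's events.

```python
# Hypothetical sketch of resetting log events at the start of each run
# (names are illustrative; this is not actual Fondant/Dask code).

shared_events: list[str] = []  # shared buffer that outlives a single run

def worker_log(message: str) -> None:
    """Called on a worker: push a log event to the shared buffer."""
    shared_events.append(message)

def start_pipeline_run() -> None:
    """Reset accumulated events so this run starts with a clean slate."""
    shared_events.clear()

def retrieve_logs() -> list[str]:
    """Called by the main program: read all accumulated events."""
    return list(shared_events)

start_pipeline_run()
worker_log("run-1: loaded data")

start_pipeline_run()              # second run on the same cluster
worker_log("run-2: loaded data")
print(retrieve_logs())            # only run-2's events: ['run-2: loaded data']
```

An alternative design would be to tag each event with a run identifier and filter on retrieval, which avoids destroying earlier runs' logs; the clear-on-start approach above is the simpler of the two.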