There was a race condition: the import started at exactly 30 minutes past every hour, which unfortunately is almost exactly when the now-hourly main DAG exports its parquet files :x
[2024-09-25, 08:33:42 UTC] {process_utils.py:194} INFO - uploading data to bucket='data/marts/2024-09-25/scheduled__2024-09-25T07:00:00+00:00/services.parquet'
[2024-09-25, 08:33:49 UTC] {python.py:240} INFO - Done. Returned value was: None
Reschedule the import to the first minute of the hour to maximize our chances of not colliding with the export.
Also, stop doing a difference for staging, as the data there isn't smaller.
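
For reviewers, a rough sketch of why moving the start minute helps. It assumes the export finishes at about the same offset every hour (taken from the single `08:33:49` log line above) and that "first minute" means minute 0. The function name and the constant are made up for illustration and are not part of this change.

```python
# Offset of the "Done." log line within its hour: 08:33:49 -> 33 min 49 s.
# Assumption: the hourly export lands at roughly this offset every hour.
EXPORT_DONE_SECOND = 33 * 60 + 49


def distance_to_export(import_minute: int, export_second: int = EXPORT_DONE_SECOND) -> int:
    """Smallest gap in seconds between an import start and the export,
    treating the hour as a repeating cycle of 3600 seconds."""
    gap = abs(import_minute * 60 - export_second) % 3600
    return min(gap, 3600 - gap)


old_margin = distance_to_export(30)  # previous schedule: minute 30
new_margin = distance_to_export(0)   # new schedule, assuming minute 0
print(f"old margin: {old_margin}s, new margin: {new_margin}s")
```

With these numbers, the old schedule started under four minutes before the export finished, while the new one leaves roughly 26 minutes after the previous export and 33 minutes before the next.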