gip-inclusion / data-inclusion

data·inclusion aggregates social and professional inclusion data
https://api.data.inclusion.beta.gouv.fr/api/v0/docs
MIT License

chore(api) : Fix load_inclusion_data import #304

Closed vperron closed 1 month ago

vperron commented 1 month ago

There is a race condition: the import started at exactly 30 minutes past every hour, which unfortunately is almost exactly when the main DAG (now hourly) exports its parquet files :x

    [2024-09-25, 08:33:42 UTC] {process_utils.py:194} INFO - uploading data to bucket='data/marts/2024-09-25/scheduled__2024-09-25T07:00:00+00:00/services.parquet'
    [2024-09-25, 08:33:49 UTC] {python.py:240} INFO - Done. Returned value was: None

Reschedule it to the first minute of the hour to maximize our chances of not overlapping with the export.
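To make the timing argument concrete, here is a rough sketch that measures how far each import start time sits from the upload seen in the log excerpt. It assumes the export lands at about the same minute every hour, which a single log line cannot confirm. The function name and constant are hypothetical and are not taken from the repository.

```python
from datetime import datetime, timezone

# Upload timestamp copied from the log excerpt; assumed representative of every hour.
EXPORT_OBSERVED = datetime(2024, 9, 25, 8, 33, 42, tzinfo=timezone.utc)


def minutes_from_import_to_export(import_minute: int,
                                  export_at: datetime = EXPORT_OBSERVED) -> float:
    """Minutes between an import firing at `import_minute` past the hour
    and the export upload observed later in that same hour."""
    import_at = export_at.replace(minute=import_minute, second=0, microsecond=0)
    return (export_at - import_at).total_seconds() / 60


# Previous schedule: import fires at minute 30, just before the upload.
old_gap = minutes_from_import_to_export(30)
# New schedule: import fires at minute 1, well ahead of the upload.
new_gap = minutes_from_import_to_export(1)

print(f"import at :30 -> export follows {old_gap:.1f} min later")
print(f"import at :01 -> export follows {new_gap:.1f} min later")
```

Under that assumption, an import starting at minute 30 is likely still reading when the files are rewritten, while one starting at minute 1 has roughly half an hour to finish first.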

Also, stop making a distinction for staging, as the data there isn't any smaller.