edanalytics / edu_edfi_airflow

Manages extract-load of Ed-Fi data in Airflow

EarthbeamDAG.upload_to_s3 does not check for naming conflicts #70

Open jayckaiser opened 3 weeks ago

jayckaiser commented 3 weeks ago

In the (rare) case where multiple input files with the same name are uploaded to S3, they overwrite one another. This can occur when processing Parquet files, where each file is named something like part.N.parquet, so files from different inputs can end up with identical names.
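A minimal sketch of how the collision arises, assuming the S3 key is built from each file's basename alone. The helper find_key_collisions and the example paths are hypothetical and are not part of edu_edfi_airflow:

```python
import os
from collections import defaultdict


def find_key_collisions(local_paths, s3_prefix):
    """Group local files by the S3 key they would be uploaded to.

    Assumes (hypothetically) that the key is s3_prefix + the file's basename,
    so files in different directories with the same name map to one key.
    Returns only the keys that more than one file would write to.
    """
    keys = defaultdict(list)
    for path in local_paths:
        key = f"{s3_prefix.rstrip('/')}/{os.path.basename(path)}"
        keys[key].append(path)
    return {key: paths for key, paths in keys.items() if len(paths) > 1}


paths = [
    "output/students/part.0.parquet",
    "output/grades/part.0.parquet",
    "output/grades/part.1.parquet",
]
print(find_key_collisions(paths, "earthbeam/run"))
# {'earthbeam/run/part.0.parquet': ['output/students/part.0.parquet', 'output/grades/part.0.parquet']}
```

Under that assumption, the second part.0.parquet upload would silently replace the first.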

We need to update upload_to_s3 to detect identically named files and append additional metadata to their S3 names so that they no longer collide.
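One possible approach, sketched under assumptions: leave unique file names unchanged and, only for names that repeat, insert a short hash of the full local path before the extension. The helper build_unique_s3_keys, the hashing scheme, and the example paths are hypothetical illustrations, not the fix adopted in the repository:

```python
import hashlib
import os
from collections import Counter


def build_unique_s3_keys(local_paths, s3_prefix):
    """Map each local file to an S3 key, adding metadata only where names collide.

    Hypothetical helper: files with a unique basename keep it unchanged; files
    whose basename is shared get the first 8 hex characters of a SHA-1 hash of
    their full local path inserted before the extension, e.g.
    part.0.parquet -> part.0.<8-hex-digest>.parquet.
    """
    prefix = s3_prefix.rstrip("/")
    name_counts = Counter(os.path.basename(path) for path in local_paths)
    keys = {}
    for path in local_paths:
        name = os.path.basename(path)
        if name_counts[name] > 1:
            stem, ext = os.path.splitext(name)
            # Hashing the full path is deterministic, so a retried task reuses the same key.
            digest = hashlib.sha1(path.encode("utf-8")).hexdigest()[:8]
            name = f"{stem}.{digest}{ext}"
        keys[path] = f"{prefix}/{name}"
    return keys


paths = [
    "output/students/part.0.parquet",
    "output/grades/part.0.parquet",
    "output/grades/part.1.parquet",
]
for local_path, key in build_unique_s3_keys(paths, "earthbeam/run").items():
    print(local_path, "->", key)
```

A deterministic suffix (rather than a timestamp or random ID) keeps reruns idempotent, and leaving non-colliding names untouched avoids changing S3 paths for the common case.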

jayckaiser commented 3 weeks ago

I've created branch hotfix/include_env_var_in_s3_filepaths as a first pass at this change.