Closed by Nelly-Barret 4 weeks ago
In the main/preprocessing step:
metadata_filepath = "working-dir/metadata/metadata-"+HOSPITAL_NAME+".csv"
In the Extract step:
if hospital_name not in [BUZZI_UC1, IMGGE, HSJD]:
    # take the column associated with the current hospital
    # and remove the columns for the other hospitals
else:
    pass
I also did it per dataset, meaning that I keep only the columns for the current hospital and the rows for the current dataset. This will be merged into main.
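To make the filtering concrete, here is a minimal sketch of the idea, not the actual code from the repository. It assumes a metadata layout with one column per hospital plus a `dataset` column; the function name `filter_metadata`, the column names, and the sample values are all hypothetical.

```python
import csv
import io

# Hypothetical list of hospitals sharing one metadata sheet (illustration only).
ALL_HOSPITALS = ["TERRASSA", "UKK", "BUZZI"]


def filter_metadata(rows, hospital_name, dataset_name, all_hospitals=ALL_HOSPITALS):
    """Keep the current hospital's column and the current dataset's rows."""
    # Columns that belong to the other hospitals and must be dropped.
    other_hospitals = {h for h in all_hospitals if h != hospital_name}
    filtered = []
    for row in rows:
        # Skip rows describing another dataset (assumed "dataset" column).
        if row["dataset"] != dataset_name:
            continue
        filtered.append({k: v for k, v in row.items() if k not in other_hospitals})
    return filtered


# Made-up sample standing in for a shared metadata file.
SAMPLE_CSV = """name,dataset,TERRASSA,UKK,BUZZI
age,patients.csv,age_years,alter,eta
sex,patients.csv,sex,geschlecht,sesso
glucose,labs.csv,glu,glukose,glicemia
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
result = filter_metadata(rows, "UKK", "patients.csv")
for row in result:
    print(row)
```

The filtered rows could then be written to the per-hospital file used in the preprocessing step, so the ETL itself only ever sees metadata for one hospital and one dataset.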
For instance, the UC3 metadata (https://docs.google.com/spreadsheets/d/15H_ly3ZSFX18gcGktzONthcKlY-O1EG8/edit#gid=1361386009) contains the metadata for three hospitals: TERRASSA, UKK and BUZZI.
This should be handled beforehand: the metadata should be pre-processed before the ETL is called, so that the ETL only receives the pre-processed metadata.