Nelly-Barret / BETTER-fairificator

The fairification tools for BETTER project.
https://www.better-health-project.eu/

Split metadata file into several if it contains metadata about several hospitals #9

Closed Nelly-Barret closed 4 weeks ago

Nelly-Barret commented 4 weeks ago

For instance, the UC3 metadata (https://docs.google.com/spreadsheets/d/15H_ly3ZSFX18gcGktzONthcKlY-O1EG8/edit#gid=1361386009) contains the metadata for several hospitals: TERRASSA, UKK and BUZZI.

The metadata should therefore be split beforehand, i.e., before calling the ETL, so that the ETL runs on the pre-processed, per-hospital metadata.

Nelly-Barret commented 4 weeks ago

In the main/preprocessing step:

metadata_filepath = "working-dir/metadata/metadata-"+HOSPITAL_NAME+".csv"

In the Extract step:

if hospital_name not in [BUZZI_UC1, IMGGE, HSJD]:
    # keep the column associated with the current hospital
    # and remove the columns for the other hospitals
    ...
else:
    # nothing to do for these hospitals
    pass
Nelly-Barret commented 4 weeks ago

I also did it per dataset, meaning that I keep only the columns for the current hospital and the rows for the current dataset. This will be merged into main.
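
To illustrate the per-hospital and per-dataset filtering described above, here is a minimal sketch using only the Python standard library. It is not the code from the repository: the column layout (one column per hospital plus a `dataset` column), the `filter_metadata` helper, and the sample rows are all assumptions made up for the example. Only the hospital names TERRASSA, UKK and BUZZI come from this issue.

```python
import csv
import io

# Assumed layout: shared descriptive columns, one column per hospital,
# and a "dataset" column naming the dataset each row belongs to.
HOSPITAL_COLUMNS = ["TERRASSA", "UKK", "BUZZI"]  # hospital names from the issue
DATASET_COLUMN = "dataset"  # hypothetical column name


def filter_metadata(rows, hospital_name, dataset_name=None):
    """Keep the current hospital's column and, optionally, the current dataset's rows.

    Hypothetical helper for illustration only.
    """
    other_hospitals = {h for h in HOSPITAL_COLUMNS if h != hospital_name}
    filtered = []
    for row in rows:
        # Drop rows that belong to another dataset, if a dataset was requested.
        if dataset_name is not None and row.get(DATASET_COLUMN) != dataset_name:
            continue
        # Drop the columns that belong to the other hospitals.
        filtered.append({k: v for k, v in row.items() if k not in other_hospitals})
    return filtered


# Made-up sample metadata, only to make the sketch self-contained.
SAMPLE = """name,dataset,TERRASSA,UKK,BUZZI
age,patients.csv,Age,Alter,Eta
sex,patients.csv,Sex,Geschlecht,Sesso
glucose,labs.csv,Glucose,Glukose,Glucosio
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
ukk_patients = filter_metadata(rows, "UKK", "patients.csv")
for row in ukk_patients:
    print(row)
```

With this layout, calling `filter_metadata(rows, "BUZZI")` without a dataset name keeps every row but only the BUZZI column, which corresponds to the per-hospital split alone.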