Nelly-Barret / BETTER-fairificator

The fairification tools for BETTER project.
https://www.better-health-project.eu/

Separate transform from load in the ETL #11

Closed by Nelly-Barret 4 weeks ago

Nelly-Barret commented 4 weeks ago

For now, the Transform and the Load steps are really mixed because:

This does not allow a good separation between the data transformation and the loading in the database.

Instead, I was thinking of the following:

MongoDB supports data loading from a JSON file: https://www.mongodb.com/resources/languages/json-to-mongodb
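To make the idea concrete, here is a minimal sketch of a Transform step that only writes JSON files and a Load step that only reads them. It is not the project's actual code: the function names, the `Hospital.json` file name, and the `InMemoryCollection` stand-in are all hypothetical. The Load function accepts any object exposing `insert_many`, which is the method a pymongo `Collection` provides.

```python
import json
import tempfile
from pathlib import Path


def transform_to_json(records, out_path):
    """Transform step: write the transformed records to a JSON array file."""
    out_path = Path(out_path)
    out_path.write_text(json.dumps(records, indent=2), encoding="utf-8")
    return out_path


def load_from_json(json_path, collection):
    """Load step: read the JSON file and insert its documents.

    `collection` is any object with an `insert_many` method
    (assumption: in practice this would be a pymongo Collection).
    """
    documents = json.loads(Path(json_path).read_text(encoding="utf-8"))
    if documents:  # avoid calling insert_many with an empty list
        collection.insert_many(documents)
    return len(documents)


class InMemoryCollection:
    """Hypothetical stand-in for a MongoDB collection, for the demo only."""

    def __init__(self):
        self.documents = []

    def insert_many(self, documents):
        self.documents.extend(documents)


def demo():
    # Made-up records, only to exercise the two steps.
    records = [{"name": "Hospital A"}, {"name": "Hospital B"}]
    with tempfile.TemporaryDirectory() as tmp:
        path = transform_to_json(records, Path(tmp) / "Hospital.json")
        collection = InMemoryCollection()
        inserted = load_from_json(path, collection)
    return inserted, collection.documents


if __name__ == "__main__":
    print(demo())
```

With this split, the Transform step never touches the database, and the JSON files it produces could also be loaded with the `mongoimport` command-line tool (using `--jsonArray` for array-shaped files) instead of Python code.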

Nelly-Barret commented 4 weeks ago

Recall that, during the Transform step, creating ExaminationRecord and DiseaseRecord instances requires references to Examination, Hospital and Disease instances. This is currently done by inserting those instances into the database and then retrieving them into memory to keep track of the mappings, e.g., column name <-> examination id, disease name <-> disease id, etc.
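As an illustration of the kind of mapping involved, the sketch below keeps a column name <-> examination id table purely in memory and uses it to build records that reference examinations by id. Everything in it is an assumption made for the example: the `IdRegistry` class, the record field names, and the choice to generate ids locally rather than read them back from the database.

```python
from itertools import count


class IdRegistry:
    """Hypothetical in-memory mapping from a key (e.g. a column name) to an id."""

    def __init__(self):
        self._next_id = count(1)  # assumption: ids are simple local integers
        self._ids = {}

    def id_for(self, key):
        # Assign a new id the first time a key is seen, then reuse it.
        if key not in self._ids:
            self._ids[key] = next(self._next_id)
        return self._ids[key]


def build_examination_records(rows, examination_ids):
    """Build one record per (row, column), referencing the examination by id."""
    records = []
    for row in rows:
        for column, value in row.items():
            records.append(
                {"examination_id": examination_ids.id_for(column), "value": value}
            )
    return records


if __name__ == "__main__":
    registry = IdRegistry()
    # Made-up rows, only to show that repeated columns share one id.
    rows = [{"glucose": 5.1, "weight": 70}, {"glucose": 4.8}]
    for record in build_examination_records(rows, registry):
        print(record)
```

The point of the sketch is only that the references needed by the records can be resolved from an in-memory table during the Transform step.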

Therefore, I can:

Nelly-Barret commented 4 weeks ago

Done at https://github.com/Nelly-Barret/BETTER-fairificator/pull/14