Closed Nelly-Barret closed 4 weeks ago
Recall that, during the Transform step, when we create ExaminationRecord and DiseaseRecord instances, we need references to Examination, Hospital and Disease instances. This is currently done by inserting them into the database and then retrieving them into memory to keep track of the mappings, e.g., column name <-> examination id, disease name <-> disease id, etc.
Therefore, I can:
- use the Load class within the Transform step to insert only Examination, Hospital and Disease instances
- use Load after Transform to load the rest (Patient, Sample, ExaminationRecord and DiseaseRecord instances) into the database
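To make the two-phase idea concrete, here is a minimal sketch, not the project's actual code. The `InMemoryCollection` stand-in, the method names `load_reference_data` and `load_records`, and the row layout are all hypothetical; only the class names Load and Transform and the entity names come from the description above.

```python
from typing import Any


class InMemoryCollection:
    """Hypothetical stand-in for a database collection (illustration only)."""

    def __init__(self) -> None:
        self.docs: list[dict[str, Any]] = []

    def insert_many(self, documents: list[dict[str, Any]]) -> None:
        # Assign a simple incremental identifier, as a database would.
        for doc in documents:
            self.docs.append({"_id": len(self.docs) + 1, **doc})

    def find(self) -> list[dict[str, Any]]:
        return list(self.docs)


class Load:
    """Owns every write to the database (hypothetical method names)."""

    def __init__(self, database: dict[str, InMemoryCollection]) -> None:
        self.database = database

    def load_reference_data(self, examinations: list[dict[str, Any]]) -> None:
        # Phase 1: called from within Transform, reference entities only.
        self.database["Examination"].insert_many(examinations)

    def load_records(self, records: list[dict[str, Any]]) -> None:
        # Phase 2: called after Transform, everything else.
        self.database["ExaminationRecord"].insert_many(records)


class Transform:
    def __init__(self, load: Load) -> None:
        self.load = load

    def run(self, rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
        # Every column except "patient" is assumed to be an examination.
        columns = sorted({c for row in rows for c in row if c != "patient"})
        self.load.load_reference_data([{"name": c} for c in columns])
        # Retrieve the inserted instances to build column name <-> examination id.
        mapping = {d["name"]: d["_id"] for d in self.load.database["Examination"].find()}
        return [
            {"patient": row["patient"], "examination_id": mapping[col], "value": val}
            for row in rows
            for col, val in row.items()
            if col != "patient"
        ]
```

With this split, Transform still depends on Load for the reference entities, but all record-level writes happen in a single call to `load_records` once the transformation is finished.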
For now, the Transform and the Load steps are really mixed because the Transform step itself inserts instances into the database. This does not allow a good separation between the data transformation and the loading into the database.
Instead, I was thinking of the following:
MongoDB supports data loading from a JSON file: https://www.mongodb.com/resources/languages/json-to-mongodb
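One way this could look, as a rough sketch: Transform writes its output to a JSON file, and Load later reads that file and inserts the documents. The helper names `write_json` and `load_json_into` are hypothetical. The `collection` argument is assumed to be any object exposing `insert_many`, which is the case for a pymongo `Collection`.

```python
import json
from pathlib import Path
from typing import Any


def write_json(documents: list[dict[str, Any]], path: Path) -> None:
    """Transform side: serialize the transformed documents as a JSON array."""
    path.write_text(json.dumps(documents), encoding="utf-8")


def load_json_into(collection: Any, path: Path) -> int:
    """Load side: read a JSON array from disk and insert it in one call.

    `collection` is assumed to expose insert_many(documents).
    Returns the number of documents read from the file.
    """
    documents = json.loads(path.read_text(encoding="utf-8"))
    if documents:  # skip the call entirely when there is nothing to insert
        collection.insert_many(documents)
    return len(documents)
```

The same JSON array file could also be imported from the command line with `mongoimport --jsonArray`, which keeps the Load step out of the Python code altogether.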