Closed: pipliggins closed this 2 weeks ago
The resources are independent, so the conversion can be parallelized. Most of the time is probably spent unpacking and repacking from fhir.resources, which has yet to be rewritten to support Pydantic v2 (which is much faster). Will take a look with a profiler.
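A minimal sketch of how the profiling step might look, using the standard-library `cProfile` and `pstats` modules. `convert_rows` is a hypothetical stand-in for the real conversion code, and `profile_call` is a helper name made up for this example; neither is part of fhirflat.

```python
import cProfile
import io
import pstats


def convert_rows(rows):
    # Hypothetical stand-in for the real row-to-resource conversion.
    return [{"id": str(r), "value": r * 2} for r in rows]


def profile_call(func, *args, top=10):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)

    # Collect the report in memory instead of printing it directly.
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time so the most expensive call chains come first.
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


if __name__ == "__main__":
    _, report = profile_call(convert_rows, range(1000))
    print(report)
```

If the unpacking/repacking guess is right, the object construction calls should dominate the cumulative-time column of the report.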
An alternative could be to parse and validate in two steps. Parsing and conversion should be relatively fast, as they would only need the mapping file, without any reference to fhirflat.resources. Validation would still need to construct the resource objects, but that could be parallelised across rows/resources.
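A rough sketch of the two-step idea, under stated assumptions: `MAPPING`, `parse_row`, `validate_row` and `convert` are all hypothetical names, and the validation here is a toy check rather than real resource construction. A thread pool is used only to keep the example self-contained; for CPU-bound validation a `ProcessPoolExecutor` (which has the same `map` interface) would likely be needed to see a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical mapping-file content: source column -> target field.
MAPPING = {"subjid": "subject", "temp": "value"}


def parse_row(row, mapping):
    """Step 1: cheap conversion that needs only the mapping."""
    return {target: row[source] for source, target in mapping.items() if source in row}


def validate_row(flat):
    """Step 2: toy stand-in for constructing and validating a resource."""
    errors = []
    if not flat.get("subject"):
        errors.append("missing subject")
    try:
        float(flat.get("value"))
    except (TypeError, ValueError):
        errors.append("value is not numeric")
    return flat, errors


def convert(rows, mapping=MAPPING, workers=4):
    # Parse everything first; no resource objects are built here.
    flat_rows = [parse_row(r, mapping) for r in rows]
    # Validate rows independently; pool.map preserves input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(validate_row, flat_rows))
    valid = [flat for flat, errs in results if not errs]
    invalid = [(flat, errs) for flat, errs in results if errs]
    return valid, invalid
```

Keeping the failed rows together with their error messages makes it straightforward to write them to an errors file afterwards.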
Using pandarallel for the ingest_to_flat() conversions to/from object format provides a significant speedup, even without parallelising across different resources:
Patient took 1.18 seconds to convert 67 rows.
Encounter took 0.97 seconds to convert 67 rows.
2 resources not created due to validation errors. Errors saved to encounter_errors.csv
Observation took 2.19 seconds to convert 2194 rows.
Condition took 1.33 seconds to convert 461 rows.
67 subjects for the Dengue subset, converting ~30 observations/conditions: