Medical-Event-Data-Standard / meds_etl

A collection of ETLs from common data formats to Medical Event Data Standard
Apache License 2.0

Feature/notes support #13

Closed: scottfleming closed this 3 months ago

scottfleming commented 4 months ago

There was a bug when processing the notes table from OMOP:

Line 112 in meds_etl/src/meds_etl/omop.py defined a source_concept_id, which for the notes table resolves to note_class_source_concept_id. The key problem is that note_class_source_concept_id is not a column in the notes table. It is referenced in the concept_id column specification on line 118, and again in the specification of code starting on line 126, which tries to map the concept_id column (itself partly derived from the nonexistent note_class_source_concept_id) through the concept_id_map. Because the query is built on a lazy polars DataFrame, nothing fails until the frame is finally collected on line 221, at which point polars raises a polars.exceptions.ColumnNotFoundError.

This PR fixes the error by first checking whether the source_concept_id column exists and, if it does not, falling back to the concept_id column.
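A minimal sketch of that fallback, assuming the table's column names are known before the query is built (with polars they can be read from the lazy frame's schema, e.g. via LazyFrame.collect_schema().names() in recent versions, without collecting any data). The helper resolve_concept_column is hypothetical and not taken from omop.py, and the actual change may combine the source and standard concept columns rather than simply choosing one:

```python
from collections.abc import Collection


def resolve_concept_column(
    available_columns: Collection[str],
    source_concept_id: str,
    concept_id: str,
) -> str:
    """Pick which column to map through the concept_id_map.

    Hypothetical helper: prefer the source concept column when the table
    actually has it, otherwise fall back to the standard concept column,
    so the lazy query never references a column that does not exist.
    """
    if source_concept_id in available_columns:
        return source_concept_id
    return concept_id


# The OMOP note table has no note_class_source_concept_id column,
# so the helper falls back to note_class_concept_id.
note_columns = ["note_id", "person_id", "note_date", "note_class_concept_id", "note_text"]
print(resolve_concept_column(note_columns, "note_class_source_concept_id", "note_class_concept_id"))
# → note_class_concept_id

# A table that does carry a source concept column keeps using it.
drug_columns = ["drug_exposure_id", "person_id", "drug_concept_id", "drug_source_concept_id"]
print(resolve_concept_column(drug_columns, "drug_source_concept_id", "drug_concept_id"))
# → drug_source_concept_id
```

Resolving the column against the schema up front turns a ColumnNotFoundError that only surfaces at collect time into an explicit decision made while the query is being built.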

Along the way, it also switches progress monitoring to tqdm, which helps gauge how long the ETL has been running and, more importantly, estimate how long it will take to finish.
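A rough sketch of tqdm-style progress reporting over a batch of work items. The function name process_with_progress, its parameters, and the idea of iterating over OMOP table names are illustrative assumptions, not the PR's code; the sketch also falls back to a plain iterator if tqdm is not installed:

```python
from collections.abc import Callable, Iterable
from typing import TypeVar

# tqdm is treated as optional here; without it the loop still runs,
# just without a progress bar.
try:
    from tqdm import tqdm
except ImportError:
    def tqdm(iterable, **kwargs):
        return iterable

T = TypeVar("T")
R = TypeVar("R")


def process_with_progress(
    items: Iterable[T],
    process_one: Callable[[T], R],
    desc: str = "Processing",
    unit: str = "item",
) -> list[R]:
    """Apply process_one to each item while tqdm reports elapsed time and ETA."""
    results: list[R] = []
    # tqdm uses len(items) when it is available to estimate the remaining
    # time; for generators, pass total= explicitly to get an ETA.
    for item in tqdm(items, desc=desc, unit=unit):
        results.append(process_one(item))
    return results


tables = ["person", "visit_occurrence", "note"]
print(process_with_progress(tables, str.upper, desc="OMOP tables", unit="table"))
# → ['PERSON', 'VISIT_OCCURRENCE', 'NOTE']
```

The progress bar is written to stderr, so it does not interfere with anything the ETL prints to stdout.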

Miking98 commented 3 months ago

This looks like a great pull request to me. Would love to see it merged!

Thanks for fixing the note issue and for the refactoring, @scottfleming!