There was a bug processing the `notes` table from OMOP, as follows:
Line 112 in `meds_etl/src/meds_etl/omop.py` defined a `source_concept_id`, which for the `notes` table would be `note_class_source_concept_id`. Key problem: `note_class_source_concept_id` is not a column in the `notes` table.
This column is referenced in the `concept_id` column specification on line 118. It is referenced again in the `code` specification starting on line 126, which tries to map the `concept_id` column through `concept_id_map`. That `concept_id` column is itself derived in part from the nonexistent `note_class_source_concept_id`.
When the lazy Polars dataframe is finally collected on line 221, it raises a `polars.exceptions.ColumnNotFoundError`.
This PR fixes the error by first checking whether `source_concept_id` exists in the table and otherwise falling back to the `concept_id` column.
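The fallback can be sketched as a small pure-Python helper that decides which column name to read before any Polars expression is built. The function name `resolve_concept_column` and the column lists below are hypothetical and are not the code in `omop.py`. The `notes` columns assume the OMOP CDM note table has `note_class_concept_id` but no `note_class_source_concept_id`.

```python
def resolve_concept_column(
    table_columns: list[str],
    source_concept_id: str,
    fallback: str,
) -> str:
    """Pick which column to read concept ids from (hypothetical helper).

    Prefer the table-specific source column when the table actually has it;
    otherwise fall back to the plain concept id column.
    """
    if source_concept_id in table_columns:
        return source_concept_id
    if fallback in table_columns:
        return fallback
    # Neither column exists: fail early with a clear message instead of
    # letting a lazy query fail much later at collect time.
    raise KeyError(f"table has neither {source_concept_id!r} nor {fallback!r}")


# Assumed column subsets, for illustration only.
notes_columns = ["note_id", "person_id", "note_class_concept_id", "note_text"]
condition_columns = [
    "condition_occurrence_id",
    "person_id",
    "condition_concept_id",
    "condition_source_concept_id",
]

print(resolve_concept_column(notes_columns, "note_class_source_concept_id", "note_class_concept_id"))
# note_class_concept_id
print(resolve_concept_column(condition_columns, "condition_source_concept_id", "condition_concept_id"))
# condition_source_concept_id
```

In this sketch, the returned name would then be passed to `pl.col(...)` when building the lazy frame, so a missing column is detected while the query is being assembled rather than when it is collected.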
Along the way, it also switches progress monitoring to `tqdm`, which helps gauge how long the ETL has been running and, more importantly, estimate how long it will take to finish.
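A minimal sketch of `tqdm` progress reporting over a list of work items. Because the number of items is known up front, `tqdm` can display elapsed time, throughput, and an estimated time remaining. The function `process_shards`, the shard names, and the `process_one` callback are hypothetical; the loop this PR actually wraps in `omop.py` may be structured differently.

```python
from typing import Callable, List, TypeVar

from tqdm import tqdm

T = TypeVar("T")
R = TypeVar("R")


def process_shards(shards: List[T], process_one: Callable[[T], R]) -> List[R]:
    """Apply process_one to every shard while showing a progress bar (hypothetical)."""
    results: List[R] = []
    # Passing total explicitly (tqdm would also infer it from len(shards))
    # lets the bar show elapsed time and an ETA for the remaining shards.
    for shard in tqdm(shards, total=len(shards), desc="OMOP -> MEDS", unit="shard"):
        results.append(process_one(shard))
    return results


if __name__ == "__main__":
    # Hypothetical shard names; real work would read and convert each file.
    print(process_shards(["note_0.parquet", "note_1.parquet"], str.upper))
```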