Medical-Event-Data-Standard / meds_etl

A collection of ETLs from common data formats to Medical Event Data Standard
Apache License 2.0

Error on OMOP to MEDS conversion #37

Open ealonso-vicomtech opened 2 weeks ago

ealonso-vicomtech commented 2 weeks ago

Hi!

I am trying to run the meds_etl_omop command to convert my OMOP dataset to MEDS, but I'm getting the following error:

Generating metadata from OMOP `concept` table
1it [00:06,  6.49s/it]
Decompressing OMOP tables, mapping to MEDS Flat format, writing to disk...
 80%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████                                              | 4/5 [00:34<00:08,  8.63s/it]
Traceback (most recent call last):
  File "/data/venvs/lucia/bin/meds_etl_omop", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/data/venvs/lucia/lib/python3.11/site-packages/meds_etl/omop.py", line 747, in main
    process_table_csv(task)
  File "/data/venvs/lucia/lib/python3.11/site-packages/meds_etl/omop.py", line 379, in process_table_csv
    write_event_data(
  File "/data/venvs/lucia/lib/python3.11/site-packages/meds_etl/omop.py", line 315, in write_event_data
    event_data.sink_parquet(fname, compression="zstd", compression_level=1, maintain_order=False)
  File "/data/venvs/lucia/lib/python3.11/site-packages/polars/utils/unstable.py", line 59, in wrapper
    return function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/venvs/lucia/lib/python3.11/site-packages/polars/lazyframe/frame.py", line 2193, in sink_parquet
    return lf.sink_parquet(
           ^^^^^^^^^^^^^^^^
polars.exceptions.ComputeError: failed to determine supertype of list[extension] and datetime[μs]

Error originated just after this operation:
FILTER String(SNOMED/184099003).strict_cast(String).is_not_null() FROM
RENAME
  DF ["person_id", "gender_concept_id", "year_of_birth", "month_of_birth"]; PROJECT */18 COLUMNS; SELECTION: "None"

How can I deal with it? The error looks related to the person table, but I cannot see which columns have the unsupported type.

I'm using the following versions:

meds==0.1.3
meds_etl==0.1.3

When I upgrade meds to the latest version, it gives me this error:

Generating metadata from OMOP `concept` table
1it [00:07,  7.23s/it]
Decompressing OMOP tables, mapping to MEDS Unsorted format, writing to disk...
 40%|████████████████████████████████████████████████████████████████████████████████████████████                                                                                                                                          | 2/5 [00:12<00:19,  6.34s/it]condition
incomplete mapping specified for `replace_strict`

Hint: Pass a `default` value to set unmapped values.
STREAMING:
   SELECT [col("person_id").strict_cast(Int64).alias("subject_id"), col("condition_start_datetime").str.strptime([String(raise)]).coalesce([col("condition_start_datetime").str.strptime([String(raise)]).dt.offset_by([String(1d)]).dt.offset_by([String(-1s)])]).coalesce([col("condition_start_date").str.strptime([String(raise)]).coalesce([col("condition_start_date").str.strptime([String(raise)]).dt.offset_by([String(1d)]).dt.offset_by([String(-1s)])])]).alias("time"), when([(col("__POLARS_CSER_0xddde4da9d1b6e86d")) != (0)]).then(col("__POLARS_CSER_0xddde4da9d1b6e86d")).otherwise(when([(col("__POLARS_CSER_0x37fad8cbfee072e")) != (0)]).then(col("__POLARS_CSER_0x37fad8cbfee072e")).otherwise(null.strict_cast(Int64))).replace_strict([Series, Series]).alias("code"), null.strict_cast(String).str.strip_chars([null]).cast(Float32).alias("numeric_value"), when(null.strict_cast(String).str.strip_chars([null]).cast(Float32).is_null()).then(null.strict_cast(String).str.strip_chars([null])).otherwise(null.strict_cast(String)).alias("text_value"), String(condition).alias("table"), col("visit_occurrence_id").alias("visit_id"), col("condition_end_datetime").str.strptime([String(raise)]).coalesce([col("condition_end_datetime").str.strptime([String(raise)]).dt.offset_by([String(1d)]).dt.offset_by([String(-1s)])]).alias("end")] FROM
     WITH_COLUMNS:
     [col("condition_source_concept_id").strict_cast(Int64).alias("__POLARS_CSER_0xddde4da9d1b6e86d"), col("condition_concept_id").strict_cast(Int64).alias("__POLARS_CSER_0x37fad8cbfee072e")] 
      FILTER when([(col("condition_source_concept_id").strict_cast(Int64)) != (0)]).then(col("condition_source_concept_id").strict_cast(Int64)).otherwise(when([(col("condition_concept_id").strict_cast(Int64)) != (0)]).then(col("condition_concept_id").strict_cast(Int64)).otherwise(null.strict_cast(Int64))).replace_strict([Series, Series]).is_not_null() FROM
        RENAME
          DF ["condition_occurrence_id", "person_id", "condition_concept_id", "condition_start_date"]; PROJECT 7/16 COLUMNS; SELECTION: None
 40%|████████████████████████████████████████████████████████████████████████████████████████████                                                                                                                                          | 2/5 [00:17<00:26,  8.68s/it]
Traceback (most recent call last):
  File "/data/venvs/lucia/bin/meds_etl_omop", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/data/venvs/lucia/lib/python3.11/site-packages/meds_etl/omop.py", line 757, in main
    process_table_csv(task)
  File "/data/venvs/lucia/lib/python3.11/site-packages/meds_etl/omop.py", line 381, in process_table_csv
    write_event_data(
  File "/data/venvs/lucia/lib/python3.11/site-packages/meds_etl/omop.py", line 322, in write_event_data
    raise e
  File "/data/venvs/lucia/lib/python3.11/site-packages/meds_etl/omop.py", line 317, in write_event_data
    event_data.sink_parquet(fname, compression="zstd", compression_level=1, maintain_order=False)
  File "/data/venvs/lucia/lib/python3.11/site-packages/polars/_utils/unstable.py", line 58, in wrapper
    return function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/venvs/lucia/lib/python3.11/site-packages/polars/lazyframe/frame.py", line 2385, in sink_parquet
    return lf.sink_parquet(
           ^^^^^^^^^^^^^^^^
polars.exceptions.InvalidOperationError: incomplete mapping specified for `replace_strict`

Hint: Pass a `default` value to set unmapped values.

EthanSteinberg commented 2 weeks ago

@ealonso-vicomtech That error seems to indicate that your condition_occurrence table contains concept ids that have no entry in your concept table.

In other words, you may have OMOP validity issues.

Can you verify that every concept id in your condition_occurrence table has a corresponding concept entry?
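
A minimal sketch of one way to run that check, using only the Python standard library. It assumes both tables are available as single uncompressed CSV files with a header row. The function names and file layout are hypothetical and not part of meds_etl. The selection order (prefer a non-zero `condition_source_concept_id`, otherwise a non-zero `condition_concept_id`) mirrors the query plan printed in the error output above. The check only tests presence in the `concept` table, so it may not reproduce every filter the ETL applies when it builds its mapping.

```python
import csv


def read_concept_ids(concept_csv, delimiter=","):
    """Collect every concept_id present in the OMOP concept table."""
    with open(concept_csv, newline="", encoding="utf-8") as f:
        return {int(row["concept_id"]) for row in csv.DictReader(f, delimiter=delimiter)}


def to_int(value):
    """Parse an id cell; empty or missing cells become None."""
    value = (value or "").strip()
    return int(value) if value else None


def find_unmapped_codes(concept_csv, condition_csv, delimiter=","):
    """Return {concept_id: row_count} for ids used by condition_occurrence
    that have no row in the concept table (hypothetical helper)."""
    known = read_concept_ids(concept_csv, delimiter)
    missing = {}
    with open(condition_csv, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f, delimiter=delimiter):
            source = to_int(row.get("condition_source_concept_id"))
            standard = to_int(row.get("condition_concept_id"))
            # Same preference order as the plan in the traceback:
            # non-zero source concept first, then non-zero standard concept.
            if source is not None and source != 0:
                chosen = source
            elif standard is not None and standard != 0:
                chosen = standard
            else:
                continue  # no usable id on this row; skipped in this sketch
            if chosen not in known:
                missing[chosen] = missing.get(chosen, 0) + 1
    return missing
```

Calling `find_unmapped_codes` with the paths to your own `concept` and `condition_occurrence` files returns an empty dict when every selected id is present. Pass `delimiter="\t"` if your export is tab-separated.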

ealonso-vicomtech commented 1 week ago

Hi,

Every concept_id in the condition_occurrence table has a corresponding entry in the concept table.

I am using OMOP CDM v5.4 and all the condition_source_ids are standard (SNOMED vocab).