Medical-Event-Data-Standard / meds_etl

A collection of ETLs from common data formats to Medical Event Data Standard
Apache License 2.0

Specifying static values in flat format #31

Open · rvandewater opened this issue 4 weeks ago

rvandewater commented 4 weeks ago

According to the standard specification (https://github.com/Medical-Event-Data-Standard/meds), the comment on the time value reads "# Static events will have a null timestamp here". However, if I initialize the time column as None when converting flat data to MEDS with convert_flat_to_meds, I get the following error:


Traceback (most recent call last):
  File "/mnt/c/Users/Robin/Documents/Git/YAIB-cohorts/Python/utils.py", line 130, in <module>
    meds_etl_test(Path("/mnt/c/Users/Robin/Documents/Git/YAIB-cohorts/data/mortality24/mimic_demo/"), Path("/mnt/c/Users/Robin/Documents/Git/YAIB-cohorts/data/mortality24/mimic_demo/flat_data"))
  File "/mnt/c/Users/Robin/Documents/Git/YAIB-cohorts/Python/utils.py", line 78, in meds_etl_test
    convert_flat_to_meds(source_flat_path=save_dir, target_meds_path=save_dir, num_shards=1)
  File "/root/miniconda3/envs/YAIB-cohorts/lib/python3.10/site-packages/meds_etl/flat.py", line 412, in convert_flat_to_meds
    parquet_processor(task)
  File "/root/miniconda3/envs/YAIB-cohorts/lib/python3.10/site-packages/meds_etl/flat.py", line 267, in process_parquet_file
    create_and_write_shards_from_table(table, temp_dir, num_shards, time_formats, metadata_columns, fname)
  File "/root/miniconda3/envs/YAIB-cohorts/lib/python3.10/site-packages/meds_etl/flat.py", line 251, in create_and_write_shards_from_table
    verify_shard(shard, filename)
  File "/root/miniconda3/envs/YAIB-cohorts/lib/python3.10/site-packages/meds_etl/flat.py", line 155, in verify_shard
    raise ValueError(error_message)
ValueError: Have rows with invalid time in sta.parquet
ERROR conda.cli.main_run:execute(47): `conda run python /mnt/c/Users/Robin/Documents/Git/YAIB-cohorts/Python/utils.py` failed. (See above for error)

Should we change this check so that static measurements (rows with a null time) are accepted when converting meds_flat to meds?
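A minimal sketch of what a relaxed check could look like, assuming rows are plain dicts with a string-or-None "time" field and assuming two example time formats. The function name verify_shard_allowing_static and its parameters are hypothetical; this does not reproduce meds_etl's actual verify_shard, which operates on its own table representation. The idea is simply: a null time is counted as a static event and accepted, while a non-null time that matches none of the formats still raises the same error.

```python
import datetime
from typing import Iterable, Mapping, Optional, Sequence

# Hypothetical stand-in for meds_etl's verify_shard (NOT the library's code):
# it only illustrates letting static (null-time) rows pass the time check.


def parse_time(value: str, time_formats: Sequence[str]) -> Optional[datetime.datetime]:
    """Return the parsed datetime, or None if no format matches."""
    for fmt in time_formats:
        try:
            return datetime.datetime.strptime(value, fmt)
        except ValueError:
            continue
    return None


def verify_shard_allowing_static(
    rows: Iterable[Mapping[str, Optional[str]]],
    filename: str,
    # Assumed example formats; the real converter takes its own time_formats.
    time_formats: Sequence[str] = ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d"),
) -> int:
    """Validate the "time" field of each row and return the number of static rows.

    A null time marks a static event (per the MEDS spec) and is accepted.
    A non-null time that matches none of time_formats is still rejected.
    """
    num_static = 0
    for row in rows:
        raw_time = row.get("time")
        if raw_time is None:
            num_static += 1  # static event: a null timestamp is allowed
            continue
        if parse_time(raw_time, time_formats) is None:
            raise ValueError(f"Have rows with invalid time in {filename}")
    return num_static


# Usage: one static row (null time) and one timed row.
rows = [
    {"patient_id": "1", "time": None, "code": "Gender/F"},
    {"patient_id": "1", "time": "2150-01-01 08:00:00", "code": "Lab/HR"},
]
print(verify_shard_allowing_static(rows, "sta.parquet"))  # prints 1
```

Keeping the error for non-null but unparseable times preserves the original purpose of the check (catching malformed timestamps) while only carving out the null case that the specification explicitly allows.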

EthanSteinberg commented 4 weeks ago

Good catch! I'll assign this to myself and fix it next week, but if anyone wants to fix it sooner, feel free to assign it to yourself.