Medical-Event-Data-Standard / meds_etl

A collection of ETLs from common data formats to Medical Event Data Standard
Apache License 2.0

Default number of shards with `meds_etl_mimic` raises error on MIMIC-IV (v2.2) demo #8

Closed (tompollard closed 4 months ago)

tompollard commented 5 months ago

Running `meds_etl_mimic ./source/ ./output/meds` with the MIMIC-IV demo raises a `ValueError` ("cannot concat empty list").

The default number of shards is currently 100 (`--num_shards=100`). We could consider:

Processing tables into ./output/meds/temp
Processing icu/icustays
Processing icu/caregiver
Processing icu/d_items
Processing icu/datetimeevents
Processing icu/inputevents
Processing icu/chartevents
Processing icu/procedureevents
Processing icu/ingredientevents
Processing icu/outputevents
Processing hosp/admissions
Processing hosp/d_hcpcs
Processing hosp/d_icd_diagnoses
Processing hosp/d_icd_procedures
Processing hosp/d_labitems
Processing hosp/diagnoses_icd
Processing hosp/drgcodes
Processing hosp/emar
Processing hosp/emar_detail
Processing hosp/hcpcsevents
Processing hosp/labevents
Processing hosp/microbiologyevents
Processing hosp/omr
Processing hosp/patients
Processing hosp/pharmacy
Processing hosp/poe
Processing hosp/poe_detail
Processing hosp/prescriptions
Processing hosp/procedures_icd
Processing hosp/provider
Processing hosp/services
Processing hosp/transfers
Processing each shard
Processing shard  0
Processing shard  1
Processing shard  2
Processing shard  3
Processing shard  4
Traceback (most recent call last):
  File "/Users/tompollard/projects/meds_etl/env/bin/meds_etl_mimic", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/Users/tompollard/projects/meds_etl/src/meds_etl/mimic/__init__.py", line 444, in main
    all_events = pl.concat(events, how="diagonal_relaxed")
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/tompollard/projects/meds_etl/env/lib/python3.11/site-packages/polars/functions/eager.py", line 133, in concat
    raise ValueError("cannot concat empty list")
ValueError: cannot concat empty list

EthanSteinberg commented 4 months ago

Sorry for missing this earlier. Yes, this is a bug and needs to be fixed. I think the solution is to rewrite the MIMIC ETL so that it uses the flat ETL, which has more of these bug fixes built in. I'll do that.
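
For anyone hitting this before the rewrite lands: one plausible reading of the traceback is that, with 100 shards and a dataset as small as the demo, some shards receive no events, so the list passed to `pl.concat` is empty. The sketch below illustrates that situation and a simple guard in plain Python. It is not the code in `meds_etl`; `assign_shard`, `group_events_by_shard`, `process_shards`, the modulo-based shard assignment, and the example events are all hypothetical and only there for illustration.

```python
from collections import defaultdict


def assign_shard(patient_id: int, num_shards: int) -> int:
    """Hypothetical shard assignment: modulo on the patient ID (an assumption)."""
    return patient_id % num_shards


def group_events_by_shard(events, num_shards):
    """Group (patient_id, payload) tuples by their shard index."""
    shards = defaultdict(list)
    for patient_id, payload in events:
        shards[assign_shard(patient_id, num_shards)].append((patient_id, payload))
    return shards


def process_shards(events, num_shards):
    """Process every shard index, skipping the ones that received no events."""
    shards = group_events_by_shard(events, num_shards)
    processed = {}
    for shard_index in range(num_shards):
        shard_events = shards.get(shard_index, [])
        if not shard_events:
            # Nothing landed in this shard; skip it rather than
            # trying to concatenate an empty list.
            continue
        processed[shard_index] = sorted(shard_events)
    return processed


# Made-up events for three patients spread over 100 shards.
events = [(10, "admission"), (10, "lab"), (13, "lab"), (27, "discharge")]
result = process_shards(events, num_shards=100)
print(sorted(result))  # prints [10, 13, 27]; the other 97 shards are empty
```

In the real ETL, the equivalent check would sit just before the `pl.concat(events, how="diagonal_relaxed")` call shown in the traceback. Passing a smaller value to `--num_shards` may also avoid empty shards on the demo, though that has not been verified here.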