Closed mirmuss closed 1 month ago
Thanks for the report!
I can reproduce this bug, and it looks like something I accidentally broke that ETL. Sorry about that! (We test on mimic-iv-demo, but this bug only triggers on the full mimic-iv unfortunately).
I have just fixed this and did a full mimic ETL to verify that it works.
If you git pull you should be able to rerun.
I'll leave this issue open in case you run into further bugs.
Thanks for the fast response!
I face another error when using the duckdb backend:
Processing each shard
Converting from MEDS Flat to MEDS using DuckDB with sharding...
Traceback (most recent call last):
File "/nfs/home/mirmus/envs/femr/bin/meds_etl_mimic", line 8, in <module>
sys.exit(main())
File "/nfs/home/mirmus/projects/femr/meds_etl/src/meds_etl/mimic/__init__.py", line 423, in main
meds_etl.flat.convert_flat_to_meds(
File "/nfs/home/mirmus/projects/femr/meds_etl/src/meds_etl/flat.py", line 803, in convert_flat_to_meds
convert_flat_to_meds_duckdb(
File "/nfs/home/mirmus/projects/femr/meds_etl/src/meds_etl/flat.py", line 695, in convert_flat_to_meds_duckdb
conn.sql(f"CREATE VIEW v{i} as SELECT {full_query} FROM '{task}'")
duckdb.duckdb.ParserException: Parser Error: syntax error at or near "table"
However the cpp backend raised another error:
Processing each shard
/nfs/home/mirmus/projects/femr/meds_etl/src/meds_etl/mimic/__init__.py:472: UserWarning: Polars found a filename. Ensure you pass a path to the file instead of a python file object when possible for best performance.
table = pl.read_csv(f, infer_schema_length=0)
which it seems to be fixed by adding .read()
to the meds_etl/src/meds_etl/mimic/__init__.py:472
:
table = pl.read_csv(f.read(), infer_schema_length=0)
this way I get 100 parquet files in the data directory.
I will be running femr repo using these outputs to make sure if everything is fine
I face another error when using the duckdb backend:
@scottfleming It looks like your duckdb backend is malfunctioning? I'm going to remove the duckdb backend from the README then.
However the cpp backend raised another error:
That's not an error, that's a warning that our code isn't 100% efficient. It would be good to make this more efficient though, which I guess I can do.
@mirmuss Did everything work for you? I think I'm going to close this issue in the meantime, but feel free to reopen if you run into further issues.
Hi, I wanted to try speed gains from the cpp backend when I realized the following issue:
I face this very same error regardless of the backend, when I install from main branch instead of pip Am I missing something?