SQLite import for mimic3 gives mixed column type warning

armando-fandango commented 2 years ago

Prerequisites

[X ] Put an X between the brackets on this line if you have done all of the following:
- Checked the online documentation: https://mimic.mit.edu/
- Checked that your issue isn't already addressed: https://github.com/MIT-LCP/mimic-code/issues?utf8=%E2%9C%93&q=

Description

While trying to import mimic3 into SQLite with import.py, I get the following warnings:

Starting processing DATETIMEEVENTS.csv.gz
mimic-code/mimic-iii/buildmimic/sqlite/import.py:25: DtypeWarning: Columns (13) have mixed types. Specify dtype option on import or set low_memory=False.
  for chunk in pd.read_csv(f, index_col="ROW_ID", chunksize=CHUNKSIZE):
...
Starting processing INPUTEVENTS_CV.csv.gz
/home/armando/projects/mimic-code/mimic-iii/buildmimic/sqlite/import.py:25: DtypeWarning: Columns (20,21) have mixed types. Specify dtype option on import or set low_memory=False.
  for chunk in pd.read_csv(f, index_col="ROW_ID", chunksize=CHUNKSIZE):
...
Starting processing NOTEEVENTS.csv.gz
/home/armando/projects/mimic-code/mimic-iii/buildmimic/sqlite/import.py:25: DtypeWarning: Columns (4,5) have mixed types. Specify dtype option on import or set low_memory=False.
  for chunk in pd.read_csv(f, index_col="ROW_ID", chunksize=CHUNKSIZE):
...
Starting processing CHARTEVENTS.csv.gz
/home/armando/projects/mimic-code/mimic-iii/buildmimic/sqlite/import.py:25: DtypeWarning: Columns (13) have mixed types. Specify dtype option on import or set low_memory=False.
  for chunk in pd.read_csv(f, index_col="ROW_ID", chunksize=CHUNKSIZE):
...

pshuwei commented 1 year ago

Hi, I am also running the import.py code and ran into the same problem...

Did you manage to figure it out or find an alternative solution?

alistairewj commented 1 year ago

It's not strictly an error but it may result in an inconsistent data load (I haven't checked). Essentially the load uses pandas as a convenience. pandas tries a low memory load, fails, and reverts to a high memory load. It can be fixed by specifying the known data types for each table in the read_csv call.
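A minimal sketch of what specifying the types could look like, assuming a small made-up table rather than the real MIMIC-III files. The column names, the `DTYPES` mapping, and the `load_table` helper are illustrative assumptions, not the contents of import.py; the real types would have to come from the MIMIC-III schema. As far as I understand, the warning typically appears when a column is empty in one internal block and text in another, so declaring such a column as `str` up front should keep it consistent across chunks.

```python
import io
import sqlite3

import pandas as pd

# Hypothetical stand-in for one table. STOPPED is mostly empty with an
# occasional string, which is the kind of column that tends to be
# inferred inconsistently when no dtype is given.
SAMPLE_CSV = """ROW_ID,SUBJECT_ID,VALUE,STOPPED
1,100,36.6,
2,100,37.1,NotStopd
3,101,35.9,
"""

# Assumed types for the illustrative columns above (not the real schema).
DTYPES = {"SUBJECT_ID": "int64", "VALUE": "float64", "STOPPED": str}


def load_table(csv_file, table_name, conn, dtypes, chunksize=2):
    """Read a CSV in chunks with explicit dtypes and append each chunk to SQLite."""
    rows = 0
    for chunk in pd.read_csv(
        csv_file, index_col="ROW_ID", dtype=dtypes, chunksize=chunksize
    ):
        # Every chunk now has the same column types, so the appended rows
        # are stored consistently.
        chunk.to_sql(table_name, conn, if_exists="append")
        rows += len(chunk)
    return rows


conn = sqlite3.connect(":memory:")
n_rows = load_table(io.StringIO(SAMPLE_CSV), "datetimeevents", conn, DTYPES)
stopped = conn.execute(
    "SELECT STOPPED FROM datetimeevents ORDER BY ROW_ID"
).fetchall()
```

In a real import, one mapping per table would be needed, and passing the gzip file handle in place of the `io.StringIO` object should work the same way.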

armando-fandango commented 1 year ago

Since the column types are known in advance and will not change (it's a frozen snapshot dataset), would it be good to add the column types to the import script? I can send a pull request if this solution is acceptable.

alistairewj commented 1 year ago

Yes it would for sure, and yes we would love a PR!

MIT-LCP / mimic-code

SQLite import for mimic3 gives mixed column type warning #1237
