MIT-LCP / mimic-code

MIMIC Code Repository: Code shared by the research community for the MIMIC family of databases
https://mimic.mit.edu
MIT License
2.41k stars 1.5k forks source link

Column names are not set correctly (MIMIC III) #1738

Open 1990PACO opened 2 months ago

1990PACO commented 2 months ago

Prerequisites

Description

I had implement MIMIC III as Postgres SQL localy. When fetching the column names in Python with pyodbc they partially dont match to the .csv files in compairsion.

def query(table, sql): """ table: input tablename to query \n sql: input SQL query (select * from mimiciii.table) \n return --> Pandas Dataframe """ cnstring = f'DRIVER={{PostgreSQL ODBC Driver(UNICODE)}};SERVER={SERVER};DATABASE={DATABASE};UID={USERNAME};PWD={PASSWORD}' cnxn = pyodbc.connect(cnstring) cursor = cnxn.cursor() colnames = cursor.execute(f"SELECT column_name FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = '{table}';").fetchall() rows = cursor.execute(sql).fetchall() df = pd.DataFrame.from_records(data=rows, columns=[colname[0] for colname in colnames]) df.columns = [str(i).upper() for i in df.columns] cursor.close() return df

Example: Wrong -- > ['ROW_ID', 'SUBJECT_ID', 'HADM_ID', 'ADMITTIME', 'DISCHTIME', 'DEATHTIME', 'EDREGTIME', 'EDOUTTIME', 'HOSPITAL_EXPIRE_FLAG', 'HAS_CHARTEVENTS_DATA', 'LANGUAGE', 'RELIGION', 'MARITAL_STATUS', 'ETHNICITY', 'DIAGNOSIS', 'ADMISSION_TYPE', 'ADMISSION_LOCATION', 'DISCHARGE_LOCATION', 'INSURANCE']

Correct --> ['ROW_ID', 'SUBJECT_ID', 'HADM_ID', 'ADMITTIME', 'DISCHTIME', 'DEATHTIME', 'EDREGTIME', 'EDOUTTIME', 'DISCHARGE_LOCATION', 'INSURANCE', 'LANGUAGE', 'RELIGION', 'MARITAL_STATUS', 'ETHNICITY', 'DIAGNOSIS', 'ADMISSION_TYPE', 'ADMISSION_LOCATION', 'HOSPITAL_EXPIRE_FLAG', 'HAS_CHARTEVENTS_DATA']

alistairewj commented 2 months ago

The query:

colnames = cursor.execute(f"SELECT column_name FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = '{table}';").fetchall()

Returns column names in a non-deterministic order, so you wouldn't expect the order to match exactly. The two groups look the same, just in a different order. I would verify that the sets of columns are equal, in which case this is expected and you just need to re-order your columns as necessary.