alistairewj / mimic-iv-aline-study

Replication of the arterial line study in MIMIC-IV

Not able to replicate the study #1

Open AnoopRKulkarni opened 2 years ago

AnoopRKulkarni commented 2 years ago

Hello,

I have installed the MIMIC-IV 1.0 dataset on my local machine and followed the instructions in the mimic-code repository to create all the tables and load the data.

After that, I created the "aline" schema using the scripts in this repository. For the most part, my analysis closely follows the notebook in the repository, but towards the end I get a p-value of 0.019! Clearly I am NOT able to replicate the study anywhere close.

Is anyone aware of what the issue might be, and what I need to do to fix it? Am I missing something?

Thanks and regards ~anoop

More details

Cohort size:
23390 - exclusion_readmission
17564 - exclusion_shortstay
24127 - exclusion_vasopressors
26733 - exclusion_septic
13137 - exclusion_aline_before_admission
56645 - exclusion_not_ventilated_first24hr
20545 - exclusion_service_surgical
Will remove 74367 of 76540 patients.
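The seven per-criterion counts above add up to 182145... more precisely 182141, far more than the 76540 patients in total, because one patient can meet several exclusion criteria; the "Will remove" line counts each patient only once. A minimal sketch of that counting logic on a hypothetical four-patient toy cohort (the helper name and the data are made up; only the flag names come from the output above):

```python
# Hypothetical toy cohort: one dict per patient with 0/1 exclusion flags.
# Flag names mirror the columns printed above; the values are invented.
EXCLUSIONS = ["exclusion_readmission", "exclusion_shortstay", "exclusion_vasopressors"]

cohort = [
    {"exclusion_readmission": 1, "exclusion_shortstay": 1, "exclusion_vasopressors": 0},
    {"exclusion_readmission": 0, "exclusion_shortstay": 1, "exclusion_vasopressors": 1},
    {"exclusion_readmission": 0, "exclusion_shortstay": 0, "exclusion_vasopressors": 0},
    {"exclusion_readmission": 1, "exclusion_shortstay": 0, "exclusion_vasopressors": 0},
]


def summarize_exclusions(rows, flags):
    """Return per-flag counts and the number of rows hit by at least one flag."""
    per_flag = {f: sum(r[f] for r in rows) for f in flags}
    removed = sum(1 for r in rows if any(r[f] for f in flags))
    return per_flag, removed


per_flag, removed = summarize_exclusions(cohort, EXCLUSIONS)
print(per_flag)                # individual counts per criterion
print(sum(per_flag.values()))  # 5: overlapping criteria are double-counted
print(f"Will remove {removed} of {len(cohort)} patients.")  # 3 of 4
```

Assuming the real cohort is a pandas DataFrame `df` with 0/1 flag columns `cols`, the same union is `df[cols].any(axis=1).sum()`.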

Replicating the flow of the flowchart from Chest paper.
76540 - removing 35770 (46.73%) patients - short stay // readmission.
40770 - removing 27589 (67.67%) patients - not ventilated in first 24 hours.
13181
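Each percentage in the flowchart is relative to the patients remaining at that step, not to the original 76540. A small sketch that replays the two steps and reproduces the printed lines; the `flowchart` helper is hypothetical, not a function from the repository:

```python
def flowchart(start, removals):
    """Apply sequential exclusion steps, formatting each like the output above.

    start: number of patients before any exclusion.
    removals: list of (count_removed, reason) in the order they are applied.
    """
    n = start
    lines = []
    for count, reason in removals:
        pct = 100.0 * count / n  # percentage of the patients still remaining
        lines.append(f"{n} - removing {count} ({pct:.2f}%) patients - {reason}.")
        n -= count
    lines.append(str(n))
    return n, lines


remaining, lines = flowchart(
    76540,
    [(35770, "short stay // readmission"),
     (27589, "not ventilated in first 24 hours")],
)
print("\n".join(lines))
```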

Accuracy of 66.91 from PyMatch with unbalanced matching.

And then this!

Result of propensity score followed by matching: p = 0.019. Odds ratio: 0.72 [0.54 - 0.94].
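For 1:1 matched pairs, one standard odds ratio is the conditional estimator b/c, where b and c count the two kinds of discordant pairs, with a Wald 95% CI on the log scale using standard error sqrt(1/b + 1/c). Whether the notebook computes its 0.72 [0.54 - 0.94] this way is not shown here, so treat this as a reference point only; the function name and the counts are hypothetical:

```python
import math


def matched_pair_odds_ratio(b, c, z=1.96):
    """Conditional odds ratio for 1:1 matched pairs with a Wald confidence interval.

    b: discordant pairs where only the treated (arterial line) patient died.
    c: discordant pairs where only the matched control died.
    """
    if b == 0 or c == 0:
        raise ValueError("need at least one discordant pair of each kind")
    log_or = math.log(b / c)
    se = math.sqrt(1 / b + 1 / c)  # standard error of log(b / c)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical discordant-pair counts, not taken from the study.
or_, lo, hi = matched_pair_odds_ratio(b=40, c=50)
print(f"Odds ratio: {or_:.2f} [{lo:.2f} - {hi:.2f}]")
```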

AnoopRKulkarni commented 2 years ago

After a few excursions into the mimic-code directory, I decided to use aline_propensity_score.Rmd (from the mimic-iii directory) to compute the p-value with the original R Matching package.

What I did is the following:

1) Created the aline schema and its tables from the MIMIC-IV dataset using the SQL scripts in this repo.
2) Saved the dataframe as aline_data.csv.
3) Used glm and the R Matching package to compute the p-value with McNemar's chi-squared test with continuity correction, and it came out to be 0.501!!
4) BUT, if I use the R "vcd" package workflow and compute the p-value with mantelhaen.test() (the Cochran-Mantel-Haenszel test, which lives in base R's stats package) on the R-matched data, I still get a p-value of 0.019.
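The McNemar and CMH p-values above come from closely related tests: when each 1:1 matched pair is treated as its own 2x2 stratum, the CMH statistic reduces algebraically to McNemar's statistic, and R's default continuity corrections (1 for `mcnemar.test`, 0.5 for `mantelhaen.test`) also line up. A minimal pure-Python sketch, assuming binary death outcomes coded 0/1 for (treated, control) pairs; the function names and toy pairs are hypothetical and do not reproduce the study's analysis:

```python
import math


def chi2_sf_1df(x):
    """Upper-tail p-value for a chi-squared statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


def mcnemar(pairs, correct=True):
    """McNemar's test on (treated_died, control_died) pairs coded 0/1."""
    b = sum(1 for tr, ct in pairs if tr == 1 and ct == 0)  # only treated died
    c = sum(1 for tr, ct in pairs if tr == 0 and ct == 1)  # only control died
    if b + c == 0:
        return 0.0, 1.0  # no discordant pairs: no evidence either way
    yates = 1 if correct and b != c else 0  # same rule as R's mcnemar.test
    stat = (abs(b - c) - yates) ** 2 / (b + c)
    return stat, chi2_sf_1df(stat)


def cmh_by_pair(pairs, correct=True):
    """CMH test treating every matched pair as its own 2x2 stratum."""
    delta = 0.0  # sum over strata of (observed - expected) treated deaths
    var = 0.0    # sum over strata of the hypergeometric variance
    for tr, ct in pairs:
        # Stratum table: rows (treated, control), columns (died, survived); n = 2.
        a, b, c, d = tr, 1 - tr, ct, 1 - ct
        n = a + b + c + d
        delta += a - (a + b) * (a + c) / n
        var += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
    if var == 0:
        return 0.0, 1.0  # only concordant pairs carry no information
    yates = 0.5 if correct and abs(delta) >= 0.5 else 0.0  # as in R's mantelhaen.test
    stat = (abs(delta) - yates) ** 2 / var
    return stat, chi2_sf_1df(stat)


# Hypothetical toy data: 10 pairs where only the treated patient died,
# 20 where only the control died, 5 where both died, 15 where neither did.
pairs = [(1, 0)] * 10 + [(0, 1)] * 20 + [(1, 1)] * 5 + [(0, 0)] * 15

for correct in (True, False):
    m_stat, m_p = mcnemar(pairs, correct)
    h_stat, h_p = cmh_by_pair(pairs, correct)
    print(f"correct={correct}: McNemar {m_stat:.3f} (p={m_p:.4f}), "
          f"CMH {h_stat:.3f} (p={h_p:.4f})")
```

Under these assumptions both tests give the same statistic, so a gap as large as 0.501 versus 0.019 on one matched set may be worth tracing to how the CMH strata were defined.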

So it seems like there is an issue in the aline database. Any suggestions from those who created and/or know the dataset well, or any clues on where the issue could be, would be much appreciated.

Thanks in advance,

~anoop