OHDSI / PatientLevelPrediction

An R package for performing patient level prediction in an observational database in the OMOP Common Data Model.
https://ohdsi.github.io/PatientLevelPrediction

timeSplitter not working with ATLAS prediction packages? #170

Closed lhjohn closed 4 years ago

lhjohn commented 4 years ago

Describe the bug
My prediction models do not fit when using prediction packages and selecting the "time" split in ATLAS. I could reproduce this on two machines. Could this be my data? Are there some prerequisites for a time split that I forgot about?

Set up (please run in R "sessionInfo()" and copy the output here):
R version 3.5.3 (2019-03-11)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

To Reproduce
https://epi.jnj.com/atlas/#/prediction/175

PLP Log File
2020-05-28 03:45:06 [Main thread] INFO PatientLevelPrediction Patient-Level Prediction Package version 3.0.16
2020-05-28 03:45:06 [Main thread] INFO PatientLevelPrediction AnalysisID: Analysis_1
2020-05-28 03:45:06 [Main thread] INFO PatientLevelPrediction CohortID: 16316
2020-05-28 03:45:06 [Main thread] INFO PatientLevelPrediction OutcomeID: 7414
2020-05-28 03:45:06 [Main thread] INFO PatientLevelPrediction Cohort size: 2346252
2020-05-28 03:45:06 [Main thread] INFO PatientLevelPrediction Covariates: 10
2020-05-28 03:45:06 [Main thread] INFO PatientLevelPrediction Population size: 1579893
2020-05-28 03:45:06 [Main thread] INFO PatientLevelPrediction Cases: 34700
2020-05-28 03:45:06 [Main thread] DEBUG PatientLevelPrediction testSplit: time
2020-05-28 03:45:06 [Main thread] DEBUG PatientLevelPrediction outcomeCount: 34700
2020-05-28 03:45:06 [Main thread] DEBUG PatientLevelPrediction plpData class: plpData
2020-05-28 03:45:06 [Main thread] DEBUG PatientLevelPrediction testfraction: 0.2
2020-05-28 03:45:06 [Main thread] DEBUG PatientLevelPrediction nfold class: integer
2020-05-28 03:45:06 [Main thread] DEBUG PatientLevelPrediction nfold: 3
2020-05-28 03:45:08 [Main thread] INFO PatientLevelPrediction Patient-Level Prediction Package version 3.0.16
2020-05-28 03:45:08 [Main thread] INFO PatientLevelPrediction AnalysisID: Analysis_6
2020-05-28 03:45:08 [Main thread] INFO PatientLevelPrediction CohortID: 16317
2020-05-28 03:45:08 [Main thread] INFO PatientLevelPrediction OutcomeID: 7414
2020-05-28 03:45:08 [Main thread] INFO PatientLevelPrediction Cohort size: 2346252
2020-05-28 03:45:08 [Main thread] INFO PatientLevelPrediction Covariates: 10
2020-05-28 03:45:08 [Main thread] INFO PatientLevelPrediction Population size: 1579893
2020-05-28 03:45:08 [Main thread] INFO PatientLevelPrediction Cases: 34700
2020-05-28 03:45:08 [Main thread] DEBUG PatientLevelPrediction testSplit: time
2020-05-28 03:45:08 [Main thread] DEBUG PatientLevelPrediction outcomeCount: 34700
2020-05-28 03:45:08 [Main thread] DEBUG PatientLevelPrediction plpData class: plpData
2020-05-28 03:45:08 [Main thread] DEBUG PatientLevelPrediction testfraction: 0.2
2020-05-28 03:45:08 [Main thread] DEBUG PatientLevelPrediction nfold class: integer
2020-05-28 03:45:08 [Main thread] DEBUG PatientLevelPrediction nfold: 3

This continues like this for all analyses...

Additional context
When using simulated data the time splitter seems to work as intended.

jreps commented 4 years ago

Hi Henrik,

Thanks for the info - I debugged the package and the issue was that ATLAS creates nfold as an integer while timeSplitter expects a double. I've fixed it in the latest Andromeda version here (we will be moving to this soon): https://github.com/OHDSI/PatientLevelPrediction/tree/sqlLite/R
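For readers hitting the same symptom, the integer-versus-double distinction can be sketched in base R. This is only an illustration, not the actual timeSplitter code: strictCheck and tolerantCheck are hypothetical helpers, and the values simply mirror the "nfold class: integer" and "nfold: 3" lines in the log above.

```r
# Hypothetical helpers for illustration; this is not PatientLevelPrediction code.
nfoldInteger <- 3L  # integer, matching "nfold class: integer" in the log
nfoldDouble <- 3    # double, as typed at the R console

# A strict class comparison accepts doubles only.
strictCheck <- function(nfold) {
  identical(class(nfold), "numeric")
}

# is.numeric() is TRUE for both integer and double vectors.
tolerantCheck <- function(nfold) {
  is.numeric(nfold) && length(nfold) == 1 && nfold >= 1
}

strictCheck(nfoldInteger)    # FALSE
strictCheck(nfoldDouble)     # TRUE
tolerantCheck(nfoldInteger)  # TRUE

# Coercing before the call is another workaround.
nfoldCoerced <- as.numeric(nfoldInteger)
strictCheck(nfoldCoerced)    # TRUE
```

Until an updated package version is in use, coercing the setting with as.numeric() before it reaches the splitter may be enough, assuming the type check is the only obstacle.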

I noticed a lot of people being dropped due to requiring full time at risk - we are submitting a paper soon that shows this can cause bias - so you might want to remove the time at risk restriction (will give you more data as well)

Best wishes, Jenna

lhjohn commented 4 years ago

Thanks for looking into this Jenna,

I also changed the cohort to include only patients with an index date before 01-01-2014 instead of 01-01-2015. This should ensure that almost every patient can have the full 5 years of TAR and is not dropped, since the database may not yet contain data up to 31-12-2019.

Is this a good fix for the problem, or would you suggest removing the TAR restriction for people who do not experience the outcome?
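The date arithmetic behind the index-date change can be sketched in base R. The sketch assumes that TAR starts on the index date and runs for exactly 5 calendar years, and assumedDataEnd is a hypothetical cutoff; the real TAR settings and database end date may differ.

```r
# Assumed values for illustration only.
tarYears <- 5
assumedDataEnd <- as.Date("2019-12-31")  # hypothetical last date covered by the database

# Date on which a full TAR window starting at indexDate would end.
tarEnd <- function(indexDate, years) {
  seq(indexDate, by = paste(years, "years"), length.out = 2)[2]
}

latestIndexOld <- as.Date("2014-12-31")  # last index date under the 01-01-2015 cutoff
latestIndexNew <- as.Date("2013-12-31")  # last index date under the 01-01-2014 cutoff

tarEndOld <- tarEnd(latestIndexOld, tarYears)  # 2019-12-31
tarEndNew <- tarEnd(latestIndexNew, tarYears)  # 2018-12-31

# Days of slack between the latest possible TAR end and the assumed data end.
marginOld <- as.numeric(assumedDataEnd - tarEndOld)  # 0
marginNew <- as.numeric(assumedDataEnd - tarEndNew)  # 365
```

Under these assumptions the earlier cutoff leaves about a year of slack for the latest-indexed patients, whereas the original cutoff leaves none.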