OHDSI / PatientLevelPrediction

An R package for performing patient level prediction in an observational database in the OMOP Common Data Model.
https://ohdsi.github.io/PatientLevelPrediction
187 stars 88 forks source link

recalibratePlpRefit returns a dataframe instead of an object of class runPlp #470

Open Volpym opened 1 month ago

Volpym commented 1 month ago

Describe the bug Running recalibratePlpRefit doesn't return an object of class runPlp that is recalibrated on the new data object, as stated in the documentation but an object of type data.frame. Am I missing something?

Set up (please run in R "sessionInfo()" and copy the output here):

R version 4.2.1 (2022-06-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.5 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3

locale:
 [1] LC_CTYPE=en_US.UTF-8          LC_NUMERIC=C                  LC_TIME=en_US.UTF-8           LC_COLLATE=en_US.UTF-8        LC_MONETARY=en_US.UTF-8      
 [6] LC_MESSAGES=en_US.UTF-8       LC_PAPER=en_US.UTF-8          LC_NAME=en_US.UTF-8           LC_ADDRESS=en_US.UTF-8        LC_TELEPHONE=en_US.UTF-8     
[11] LC_MEASUREMENT=en_US.UTF-8    LC_IDENTIFICATION=en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] readr_2.1.5                  PatientLevelPrediction_5.0.5 FeatureExtraction_3.5.2      Andromeda_0.6.6              dplyr_1.1.4                 
[6] DatabaseConnector_6.3.2      memuse_4.2-3                

loaded via a namespace (and not attached):
 [1] tidyselect_1.2.1     remotes_2.4.2        purrr_1.0.2          splines_4.2.1        rJava_1.0-11         lattice_0.20-45      vctrs_0.6.5         
 [8] generics_0.1.3       utf8_1.2.4           blob_1.2.4           survival_3.3-1       rlang_1.1.4          pkgbuild_1.4.4       pillar_1.9.0        
[15] glue_1.7.0           withr_3.0.0          DBI_1.2.3            ParallelLogger_3.3.0 Cyclops_3.4.1        bit64_4.0.5          dbplyr_2.5.0        
[22] plyr_1.8.9           lifecycle_1.0.4      memoise_2.0.1        tzdb_0.4.0           callr_3.7.6          fastmap_1.2.0        ps_1.7.7            
[29] parallel_4.2.1       curl_5.2.1           fansi_1.0.6          urltools_1.7.3       triebeard_0.4.1      Rcpp_1.0.12          backports_1.5.0     
[36] checkmate_2.3.1      cachem_1.1.0         desc_1.4.2           vroom_1.6.5          jsonlite_1.8.8       bit_4.0.5            hms_1.1.3           
[43] processx_3.8.4       SqlRender_1.18.0     rprojroot_2.0.4      grid_4.2.1           cli_3.6.3            tools_4.2.1          magrittr_2.0.3      
[50] tibble_3.2.1         RSQLite_2.3.7        crayon_1.5.3         pkgconfig_2.0.3      Matrix_1.5-0         pROC_1.18.5          rstudioapi_0.16.0   
[57] R6_2.5.1             compiler_4.2.1   

To Reproduce

cdmDatabaseSchema <- 'demo_cdm'
cdmDatabaseName <- 'postgres'
dbms <- "postgresql"
user <- 'postgres'
pw <- 'postgres'
server <- '<ip>/postgres'
port <- '5438'

connectionDetails <- DatabaseConnector::createConnectionDetails(dbms = dbms,
                                                                server = server,
                                                                user = user,
                                                                password = pw,
                                                                port = port)
databaseDetails <- PatientLevelPrediction::createDatabaseDetails(  connectionDetails,
                                           cdmDatabaseSchema,
                                           cdmDatabaseName,
                                           tempEmulationSchema = cdmDatabaseSchema,
                                           cohortDatabaseSchema = cdmDatabaseSchema,
                                           cohortTable = "sinusitiscohort",
                                           outcomeDatabaseSchema = cdmDatabaseSchema,
                                           outcomeTable = "sinusitiscohort",
                                           outcomeIds = 5,
                                           cdmVersion = 5.3,
                                           cohortId = 3)

covariateSettings <- FeatureExtraction::createCovariateSettings(
  useDemographicsGender = T, 
  useDemographicsAgeGroup = T, 
  useConditionGroupEraLongTerm = T
), 

restrictPlpDataSettings <- PatientLevelPrediction::createRestrictPlpDataSettings(
  studyStartDate = "",
  studyEndDate = "",
  firstExposureOnly = F,
  washoutPeriod = 0,
  sampleSize = NULL
)

plpData <- PatientLevelPrediction::getPlpData(databaseDetails, covariateSettings, restrictPlpDataSettings)
outcomeId <- 5
pop1 <- PatientLevelPrediction::createStudyPopulationSettings(riskWindowStart = 1, 
                                      riskWindowEnd = 365,
                                      requireTimeAtRisk = T, 
                                      minTimeAtRisk = 364, 
                                      includeAllOutcomes = T)

newPopulation <- PatientLevelPrediction::createStudyPopulation(plpData, outcomeId,pop1, NULL)
model<-loadPlpModel('<path_to_rds>')
recalibratedModel <- recalibratePlpRefit(model, newPopulation, plpData)

Context I have divided the Eunomia dataset into three chunks. I trained a PLP model on the first chunk and now I am recalibrating it using the data from the second chunk. Next, I plan to recalibrate this model again using the third chunk of data. Finally, I will compare this final recalibrated model with the PLP model trained on the entire Eunomia dataset.

jreps commented 1 month ago

The documentation is definitely wrong. Thanks for reporting this. It appears to return the prediction for the new population with the original model and recalibrated model. It also adds attributes for how the model needed to be adjusted, but these seem hidden. It does refit the model, so this could be edited to return a runPlp, but I need to see whether it is used inside other functions before editing.

Volpym commented 1 month ago

Could you provide some information on how to generate the runPlp object myself?