Closed. egillax closed this pull request 2 months ago.
Attention: Patch coverage is 0%, with 69 lines in your changes missing coverage. Please review.
Project coverage is 88.77%. Comparing base (d9be57a) to head (dc9dd31). Report is 5 commits behind head on develop.
:exclamation: Current head dc9dd31 differs from pull request most recent head d5ca7ee. Consider uploading reports for the commit d5ca7ee to get more accurate results.
| Files | Patch % | Lines |
|---|---|---|
| R/ExternalValidatePlp.R | 0.00% | 69 Missing :warning: |
This looks like a good solution. We can adjust it later for additional predictionProblem parameters. We should make explicit in the documentation that the format of predictionProblem is targetId, outcomeId, populationSettings, which I think is currently only evident from the code.
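A minimal sketch of what one predictionProblem entry could look like under that format; the field names come from the comment above, while the example ids and population settings are illustrative placeholders, not values from this PR:

```r
# Hypothetical illustration of the predictionProblem format described above:
# one entry per target/outcome pair, together with its population settings.
library(PatientLevelPrediction)

predictionProblem <- list(
  targetId = 1,    # example target cohort id
  outcomeId = 2,   # example outcome cohort id
  populationSettings = createStudyPopulationSettings(
    riskWindowStart = 1,
    riskWindowEnd = 365
  )
)
```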
I believe a next step would be to automatically determine the predictionProblem dataframe from the Strategus output files (not necessary for our study). Maybe ModelTransferModule could do that and save it in an additional file.
I'm guessing you are using predictionProblem as the input, rather than T, O and populationSettings as individual inputs, so that you can pass a list of predictionProblems if you want to test on different tasks? I wonder whether we want to have the model in the list as well: T/O/Pop settings/model, so you can validate lots of models across different tasks (this would be the most flexible)?
Also, we could have something smart that finds all the models developed for the same T and takes the union of their covariateSettings, so that a single plpData object is extracted per T?
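One way that could be sketched is below; this is illustrative only and assumes a hypothetical list of model designs that each carry a targetId and the covariateSettings used at development, which is not part of the package API described here:

```r
# Illustrative only: group models by target cohort and pool the distinct
# covariateSettings across them, so data could be extracted once per target.
poolCovariateSettings <- function(modelDesigns) {
  targetIds <- vapply(modelDesigns, function(x) x$targetId, numeric(1))
  lapply(split(modelDesigns, targetIds), function(designs) {
    allSettings <- lapply(designs, function(x) x$covariateSettings)
    unique(allSettings)  # one list of distinct settings per target
  })
}
```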
For the output structure - I like that the data are saved and there is a validation log. We need to check the database upload functions, as I think they look for a certain structure (i.e., the validation being inside the runPlp output location). We will want to make sure the save structure is compatible with the current db upload code, or that the db code can be easily modified for the new structure. I was thinking of modifying runPlp by adding a settings.csv with columns like 'analysisId', 'developmentDatabase', 'validationDatabase', 'modelDesignJson', 'plpDataLocation', 'modelLocation', 'checksum'. That file could be used as a look-up to determine details about the result folder 'analysis_X' (and for incremental mode, to skip models whose settings are unchanged, like cohort generator does), with all the results in the same result folder rather than a result structure that encodes the info. If you like this idea, we probably want to do the same for external validation?
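A rough sketch of what one row of such a look-up file could contain, using the column names from the comment above; the paths and checksum value are placeholders, not an agreed design:

```r
# Illustrative only: one look-up row per analysis, appended to settings.csv.
settingsRow <- data.frame(
  analysisId = "Analysis_1",
  developmentDatabase = "databaseA",
  validationDatabase = "databaseB",
  modelDesignJson = "Analysis_1/modelDesign.json",
  plpDataLocation = "Analysis_1/plpData",
  modelLocation = "Analysis_1/plpModel",
  checksum = "placeholder-checksum",
  stringsAsFactors = FALSE
)
write.table(settingsRow, file = "settings.csv", sep = ",",
            row.names = FALSE,
            col.names = !file.exists("settings.csv"),
            append = file.exists("settings.csv"))
```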
Changes in the most recent revision:
Instead of using the name "problem" we now use validationDesign, created using createValidationDesign.
The function signature is:
createValidationDesign(
targetId,
outcomeId,
populationSettings,
restrictPlpDataSettings,
plpModelList,
recalibrate = NULL,
runCovariateSummary = TRUE
)
plpModelList can include either objects loaded with loadPlpModel or paths to such objects. At the moment this only supports a single Target-Outcome pair per validationDesign, but it should ideally also support multiple Outcomes for one Target.
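For reference, a minimal usage sketch based on the signature above; the cohort ids, population settings, and model path are illustrative placeholders, not values from this PR:

```r
# Illustrative sketch only: build a single validation design from the signature above.
library(PatientLevelPrediction)

design <- createValidationDesign(
  targetId = 1,                            # example target cohort id
  outcomeId = 2,                           # example outcome cohort id
  populationSettings = createStudyPopulationSettings(
    riskWindowStart = 1,
    riskWindowEnd = 365
  ),
  restrictPlpDataSettings = createRestrictPlpDataSettings(),
  plpModelList = list("path/to/plpModel"), # paths or objects from loadPlpModel()
  recalibrate = NULL,
  runCovariateSummary = TRUE
)
```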
Then you validate with:
validateExternal(
validationDesignList,
databaseDetails,
logSettings,
outputFolder
)
Ideally I think this function should replace validateMultiplePlp and/or externalValidateDbPlp. I'm open to better names for the new one; we could also just deprecate the old ones in case someone is relying on them for a study.
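Putting the pieces together, a hedged end-to-end sketch; the database details and schemas are placeholders and only illustrate the call pattern implied by the signature above:

```r
# Illustrative sketch only: validate a list of designs against one external database.
databaseDetails <- createDatabaseDetails(
  connectionDetails = DatabaseConnector::createConnectionDetails(dbms = "postgresql"),
  cdmDatabaseSchema = "cdm",              # placeholder schemas and table names
  cohortDatabaseSchema = "results",
  cohortTable = "cohort",
  outcomeDatabaseSchema = "results",
  outcomeTable = "cohort"
)

validateExternal(
  validationDesignList = list(design),    # the design sketched above
  databaseDetails = databaseDetails,
  logSettings = createLogSettings(),
  outputFolder = "./validation"
)
```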
I've tested this with seven models validated on two databases, or actually validated on my dev database twice (it pretended to be a different one the second time). I've also tested the shinyApp and all seems to work as it should. The output structure now is (shown for a single model, two cohorts in two databases):
outputFolder
├── validationDatabase
│   ├── Analysis_1
│   ├── Analysis_2
│   ├── targetId_11931_L1
│   ├── targetId_11932_L1
│   └── validationLog.txt
├── validationDatabaseTwo
│   ├── Analysis_1
│   ├── Analysis_2
│   ├── targetId_11931_L1
│   ├── targetId_11932_L1
│   └── validationLog.txt
└── sqlite
    └── databaseFile.sqlite
The analysis numbers are sequential and do not correspond to the ones used during model development. A current limitation is that all models need to have the same covariateSettings; this is enforced, and the function will error before extracting the data if it is not the case. I'm thinking about how to lift that limitation in the most efficient way.
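For illustration, such a check could look roughly like the sketch below; this is not the actual implementation, and it assumes each loaded model keeps the covariateSettings used at development under modelDesign$covariateSettings:

```r
# Sketch only: error before data extraction if the models' covariateSettings differ.
checkSameCovariateSettings <- function(plpModelList) {
  settings <- lapply(plpModelList, function(model) model$modelDesign$covariateSettings)
  allEqual <- all(vapply(settings[-1], identical, logical(1), y = settings[[1]]))
  if (!allEqual) {
    stop("All models in a validationDesign must use the same covariateSettings")
  }
  invisible(TRUE)
}
```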
@jreps this is ready for second review.
For discussion @jreps @lhjohn
I made a new function externalValidate to simplify external validation for our network study, and more generally for Strategus. With one call, this function externally validates multiple models on multiple prediction problems. It can either test all models on all prediction problems or match each model to the target cohort it was developed on.
The output file structure will look something like this (for two prediction problems, with models from two databases validated on a third):
outputFolder
└── validationDatabaseName
    ├── developmentDatabaseName1
    │   ├── Analysis_1
    │   └── Analysis_2
    ├── developmentDatabaseName2
    │   ├── Analysis_1
    │   └── Analysis_2
    ├── plpData_Target_11931_Outcome_6243
    ├── plpData_Target_11932_Outcome_298
    └── validationLog.txt
Currently it's limited in that all models have to have used the same covariateSettings, since the data is extracted only once per prediction problem.