OHDSI / PatientLevelPrediction

An R package for performing patient level prediction in an observational database in the OMOP Common Data Model.
https://ohdsi.github.io/PatientLevelPrediction

No covariateSummary given for features generated through feature engineering #284

Open solis9753 opened 2 years ago

solis9753 commented 2 years ago

Is your feature request related to a problem? Please describe.
The covariateSummary() function takes the original plpData$covariateData object as input, so the resulting covariate summary covers only the features that were provided at the start. Features generated within featureEngineering are therefore not included in the covariate summary when runPlp() ends. See this part of the code from lines 449 to 458.

Describe the solution you'd like
Include a covariate summary for features created within featureEngineer().

Describe alternatives you've considered
Either merge data$Train$covariateData and data$Test$covariateData prior to calling covariateSummary(), or add a separate code path for feature engineering settings, with a flag for returning a covariate summary for those features. I see that covariateSummary() already has a flag for feature engineering, but it is not used. I am sure there must be better ways.
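To make the first alternative concrete, here is a minimal sketch in base R. It does not use the PatientLevelPrediction API: the two data frames are made-up stand-ins for the engineered train and test covariates in a long (rowId, covariateId, covariateValue) layout, and summariseEngineeredCovariates() is a hypothetical helper. It also assumes that a covariate missing for a row counts as zero when computing the mean.

```r
# Made-up stand-ins for the engineered covariates of the train and test splits.
trainCovariates <- data.frame(
  rowId = c(1, 1, 2, 3),
  covariateId = c(1001, 2001, 1001, 2001),
  covariateValue = c(1, 0.5, 1, 1.5)
)
testCovariates <- data.frame(
  rowId = c(4, 5),
  covariateId = c(1001, 2001),
  covariateValue = c(1, 2)
)

# Hypothetical helper, not part of PatientLevelPrediction.
summariseEngineeredCovariates <- function(train, test, populationSize) {
  # Stack both splits so every engineered covariate is summarised once.
  combined <- rbind(train, test)
  # Both aggregate() calls return rows sorted by covariateId, so they align.
  counts <- aggregate(covariateValue ~ covariateId, data = combined, FUN = length)
  sums <- aggregate(covariateValue ~ covariateId, data = combined, FUN = sum)
  data.frame(
    covariateId = counts$covariateId,
    covariateCount = counts$covariateValue,
    # Assumption: rows without the covariate contribute a value of zero.
    covariateMean = sums$covariateValue / populationSize
  )
}

engineeredSummary <- summariseEngineeredCovariates(
  trainCovariates,
  testCovariates,
  populationSize = 5
)
print(engineeredSummary)
```

The real covariate data is not an in-memory data frame, so this only illustrates the order of operations (combine the splits, then summarise), not how it would be implemented in the package.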

Additional context
I also know that covariateSummary() can take a long time to complete when there are a lot of features, so I am wondering what will happen in my case, where I create hundreds of thousands of covariates. I would still like to have a summary for those covariates. Maybe a summary only for the features that are selected in the final model?
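As a rough illustration of that last idea, the sketch below filters the covariates down to those with a non-zero coefficient before summarising them. Everything in it is an assumption for illustration: the data frames are made up, the column names are hypothetical, and treating "selected" as "non-zero coefficient" only fits models that report coefficients, such as a regularised regression.

```r
# Made-up engineered covariates in a long layout.
covariates <- data.frame(
  rowId = c(1, 2, 2, 3),
  covariateId = c(1001, 1001, 2001, 3001),
  covariateValue = c(1, 1, 1, 1)
)

# Made-up coefficients of a fitted model; 2001 was shrunk to zero.
modelCoefficients <- data.frame(
  covariateId = c(1001, 2001, 3001),
  coefficient = c(0.8, 0, -0.3)
)

# Keep only the covariates that the final model actually uses.
selectedIds <- modelCoefficients$covariateId[modelCoefficients$coefficient != 0]
selectedCovariates <- covariates[covariates$covariateId %in% selectedIds, ]

# Summarise the reduced set: number of rows in which each covariate occurs.
selectedSummary <- aggregate(
  covariateValue ~ covariateId,
  data = selectedCovariates,
  FUN = length
)
names(selectedSummary)[2] <- "covariateCount"
print(selectedSummary)
```

Filtering first means the cost of the summary scales with the size of the final model rather than with the number of engineered covariates.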

jreps commented 1 year ago

Yeah - that requires applying the feature engineering during the covariate summary. I skipped it because of the time it takes to run, but thanks to Egill's edits to use arrow that should be a lot faster now, so let's add the ability to run the feature engineering at the start of covariateSummary(). I'll mark this as a bug of moderate complexity.
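A minimal sketch of that ordering, in base R, might look like the following. It is not the package's implementation: addSquaredCovariate() is a made-up engineering step, the covariate id 2001002 is invented, and summariseWithFeatureEngineering() is a hypothetical wrapper that simply applies each step in order before counting covariates.

```r
# Made-up feature engineering step: adds the square of covariate 2001
# under a new, invented covariate id.
addSquaredCovariate <- function(covariates) {
  squared <- covariates[covariates$covariateId == 2001, ]
  squared$covariateId <- 2001002
  squared$covariateValue <- squared$covariateValue^2
  rbind(covariates, squared)
}

# Hypothetical wrapper: run every engineering step first, then summarise.
summariseWithFeatureEngineering <- function(covariates, engineeringSteps) {
  # Reduce() feeds the output of each step into the next one.
  engineered <- Reduce(
    function(data, step) step(data),
    engineeringSteps,
    covariates
  )
  result <- aggregate(covariateValue ~ covariateId, data = engineered, FUN = length)
  names(result)[2] <- "covariateCount"
  result
}

rawCovariates <- data.frame(
  rowId = c(1, 2, 3),
  covariateId = c(1001, 2001, 2001),
  covariateValue = c(1, 2, 3)
)

fullSummary <- summariseWithFeatureEngineering(
  rawCovariates,
  list(addSquaredCovariate)
)
print(fullSummary)
```

Because the engineering runs inside the summary step, the engineered covariate shows up in the result alongside the original ones.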