OHDSI / PatientLevelPrediction

An R package for performing patient level prediction in an observational database in the OMOP Common Data Model.
https://ohdsi.github.io/PatientLevelPrediction

Error in Random Forest Training #303

Closed nbehzad closed 2 years ago

nbehzad commented 2 years ago

Describe the bug

I trained a Random Forest model with the PLP develop branch. According to the log file, runPlp completed, but logged an error along the way:

Error in py_call_impl(callable, dots$args, dots$keywords): AttributeError: 'RandomForestClassifier' object has no attribute 'min_impurity_split'

When I loaded the plpResult, the prediction and performanceEvaluation objects were NULL, and the model object was an empty list. As a result, none of the results could be plotted with the plotPlp function. Its output was as follows:

Issue with plotSparseRoc
Issue with plotPredictedPDF
Issue with plotPreferencePDF
Issue with plotPrecisionRecall
Issue with plotF1Measure
Issue with plotDemographicSummary
Issue with plotSparseCalibration
Issue with plotSparseCalibration2
Issue with plotPredictionDistribution

I also tested it on other models, including Logistic Regression, Gradient Boosting, Naive Bayes, and AdaBoost; the plpResult object was complete and error-free, with all results plotted.

Set up (Session Info):
R version 4.1.3 (2022-03-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.2.1

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] FeatureExtraction_3.2.0 Andromeda_0.6.1 dplyr_1.0.9
[4] PatientLevelPrediction_5.4.1 yaml_2.3.5 SqlRender_1.9.2
[7] DatabaseConnector_5.0.4 BQJdbcConnectionStringR_0.1 log4r_0.4.2

loaded via a namespace (and not attached):
[1] reticulate_1.25 tidyselect_1.1.2 memuse_4.2-1 purrr_0.3.4
[5] rJava_1.0-6 lattice_0.20-45 colorspace_2.0-3 vctrs_0.4.1
[9] generics_0.1.3 utf8_1.2.2 blob_1.2.3 rlang_1.0.4
[13] pillar_1.8.0 glue_1.6.2 DBI_1.1.3 ParallelLogger_3.0.1
[17] xgboost_1.6.0.1 bit64_4.0.5 dbplyr_2.2.1 lifecycle_1.0.1
[21] plyr_1.8.7 munsell_0.5.0 gtable_0.3.0 zip_2.2.0
[25] memoise_2.0.1 labeling_0.4.2 fastmap_1.1.0 fansi_1.0.3
[29] Rcpp_1.0.9 scales_1.2.0 cachem_1.0.6 jsonlite_1.8.0
[33] farver_2.1.1 bit_4.0.4 gridExtra_2.3 digest_0.6.29
[37] ggplot2_3.3.6 hms_1.1.1 png_0.1-7 grid_4.1.3
[41] rprojroot_2.0.3 here_1.0.1 cli_3.3.0 tools_4.1.3
[45] magrittr_2.0.3 tibble_3.1.8 RSQLite_2.2.15 crayon_1.5.1
[49] tidyr_1.2.0 pkgconfig_2.0.3 ellipsis_0.3.2 Matrix_1.4-0
[53] data.table_1.14.2 pROC_1.18.0 assertthat_0.2.1 rstudioapi_0.13
[57] R6_2.5.1

To Reproduce

library(PatientLevelPrediction)

# Random Forest settings used in the failing run (one value per hyperparameter)
modelSettings <- setRandomForest(
  ntrees = list(500),
  criterion = list("gini"),
  maxDepth = list(17),
  minSamplesSplit = list(2),
  minSamplesLeaf = list(1),
  minWeightFractionLeaf = list(0),
  mtries = list("auto"),
  maxLeafNodes = list(NULL),
  minImpurityDecrease = list(0),
  bootstrap = list(TRUE),
  maxSamples = list(NULL),
  oobScore = list(FALSE),
  nJobs = list(NULL),
  classWeight = list(NULL),
  seed = 13
)

PLP Log File

2022-08-13 12:08:09 [Main thread] DEBUG PatientLevelPrediction checkInputs binary : TRUE, includeAllOutcomes : TRUE, firstExposureOnly : FALSE, washoutPeriod : 0, removeSubjectsWithPriorOutcome : FALSE, priorOutcomeLookback : 9999, requireTimeAtRisk : TRUE, minTimeAtRisk : 90, riskWindowStart : 90, startAnchor : cohort start, riskWindowEnd : 180, endAnchor : cohort start, restrictTarToCohortEnd : FALSE
2022-08-13 12:08:09 [Main thread] DEBUG PatientLevelPrediction checkInputs test : 0.2, train : 0.8, seed : 13, nfold : 5
2022-08-13 12:08:09 [Main thread] DEBUG PatientLevelPrediction checkInputs numberOutcomestoNonOutcomes : 1, sampleSeed : 1
2022-08-13 12:08:09 [Main thread] DEBUG PatientLevelPrediction checkInputs :
2022-08-13 12:08:09 [Main thread] DEBUG PatientLevelPrediction checkInputs minFraction : 0.001, normalize : TRUE, removeRedundancy : TRUE
2022-08-13 12:08:09 [Main thread] DEBUG PatientLevelPrediction checkInputs fitFunction : fitSklearn, param : 500-gini-17-2-1-0-auto-0-TRUE-FALSE-13
2022-08-13 12:08:09 [Main thread] DEBUG PatientLevelPrediction checkInputs runSplitData : TRUE, runSampleData : TRUE, runfeatureEngineering : TRUE, runPreprocessData : TRUE, runModelDevelopment : TRUE, runCovariateSummary : TRUE
2022-08-13 12:08:09 [Main thread] INFO PatientLevelPrediction printHeader Patient-Level Prediction Package version 5.4.1
2022-08-13 12:08:09 [Main thread] INFO PatientLevelPrediction printHeader Study started at: 2022-08-13 12:08:09
2022-08-13 12:08:09 [Main thread] INFO PatientLevelPrediction printHeader AnalysisID: RF
2022-08-13 12:08:09 [Main thread] INFO PatientLevelPrediction printHeader AnalysisName: PORPOISE
2022-08-13 12:08:09 [Main thread] INFO PatientLevelPrediction printHeader TargetID: 3
2022-08-13 12:08:09 [Main thread] INFO PatientLevelPrediction printHeader OutcomeID: 4
2022-08-13 12:08:09 [Main thread] INFO PatientLevelPrediction printHeader Cohort size: 35253
2022-08-13 12:08:09 [Main thread] INFO PatientLevelPrediction printHeader Covariates: 49175
2022-08-13 12:08:09 [Main thread] DEBUG PatientLevelPrediction outcomeId: 4
2022-08-13 12:08:09 [Main thread] DEBUG PatientLevelPrediction binary: TRUE
2022-08-13 12:08:09 [Main thread] DEBUG PatientLevelPrediction includeAllOutcomes: TRUE
2022-08-13 12:08:09 [Main thread] DEBUG PatientLevelPrediction firstExposureOnly: FALSE
2022-08-13 12:08:09 [Main thread] DEBUG PatientLevelPrediction washoutPeriod: 0
2022-08-13 12:08:09 [Main thread] DEBUG PatientLevelPrediction removeSubjectsWithPriorOutcome: FALSE
2022-08-13 12:08:09 [Main thread] DEBUG PatientLevelPrediction priorOutcomeLookback: 9999
2022-08-13 12:08:09 [Main thread] DEBUG PatientLevelPrediction requireTimeAtRisk: TRUE
2022-08-13 12:08:09 [Main thread] DEBUG PatientLevelPrediction minTimeAtRisk: 90
2022-08-13 12:08:09 [Main thread] DEBUG PatientLevelPrediction restrictTarToCohortEnd: FALSE
2022-08-13 12:08:09 [Main thread] DEBUG PatientLevelPrediction riskWindowStart: 90
2022-08-13 12:08:09 [Main thread] DEBUG PatientLevelPrediction startAnchor: cohort start
2022-08-13 12:08:09 [Main thread] DEBUG PatientLevelPrediction riskWindowEnd: 180
2022-08-13 12:08:09 [Main thread] DEBUG PatientLevelPrediction endAnchor: cohort start
2022-08-13 12:08:09 [Main thread] DEBUG PatientLevelPrediction restrictTarToCohortEnd: FALSE
2022-08-13 12:08:10 [Main thread] INFO PatientLevelPrediction Outcome is 0 or 1
2022-08-13 12:08:10 [Main thread] DEBUG PatientLevelPrediction checkInputsSplit test: 0.2
2022-08-13 12:08:10 [Main thread] DEBUG PatientLevelPrediction checkInputsSplit train: 0.8
2022-08-13 12:08:10 [Main thread] DEBUG PatientLevelPrediction checkInputsSplit nfold: 5
2022-08-13 12:08:10 [Main thread] INFO PatientLevelPrediction checkInputsSplit seed: 13
2022-08-13 12:08:10 [Main thread] INFO PatientLevelPrediction Creating a 20% test and 80% train (into 5 folds) random stratified split by class
2022-08-13 12:08:10 [Main thread] INFO PatientLevelPrediction Data split into 7050 test cases and 28203 train cases (5642, 5641, 5640, 5640, 5640)
2022-08-13 12:09:24 [Main thread] INFO PatientLevelPrediction dataSummary Train Set:
2022-08-13 12:09:24 [Main thread] INFO PatientLevelPrediction dataSummary Fold 1 5642 patients with 826 outcomes - Fold 2 5641 patients with 825 outcomes - Fold 3 5640 patients with 825 outcomes - Fold 4 5640 patients with 825 outcomes - Fold 5 5640 patients with 825 outcomes
2022-08-13 12:09:25 [Main thread] INFO PatientLevelPrediction dataSummary 47013 covariates in train data
2022-08-13 12:09:25 [Main thread] INFO PatientLevelPrediction dataSummary Test Set:
2022-08-13 12:09:25 [Main thread] INFO PatientLevelPrediction dataSummary 7050 patients with 1031 outcomes
2022-08-13 12:09:25 [Main thread] INFO PatientLevelPrediction sampleData Starting data sampling
2022-08-13 12:09:25 [Main thread] INFO PatientLevelPrediction sampleData Applying sameData
2022-08-13 12:09:25 [Main thread] INFO PatientLevelPrediction No sampling - returning same data
2022-08-13 12:09:25 [Main thread] INFO PatientLevelPrediction sampleData Finished data sampling
2022-08-13 12:09:25 [Main thread] INFO PatientLevelPrediction dataSummary Train Set:
2022-08-13 12:09:25 [Main thread] INFO PatientLevelPrediction dataSummary Fold 1 5642 patients with 826 outcomes - Fold 2 5641 patients with 825 outcomes - Fold 3 5640 patients with 825 outcomes - Fold 4 5640 patients with 825 outcomes - Fold 5 5640 patients with 825 outcomes
2022-08-13 12:09:26 [Main thread] INFO PatientLevelPrediction dataSummary 47013 covariates in train data
2022-08-13 12:09:26 [Main thread] INFO PatientLevelPrediction dataSummary Test Set:
2022-08-13 12:09:26 [Main thread] INFO PatientLevelPrediction dataSummary 7050 patients with 1031 outcomes
2022-08-13 12:09:26 [Main thread] INFO PatientLevelPrediction featureEngineer Starting Feature Engineering
2022-08-13 12:09:26 [Main thread] INFO PatientLevelPrediction featureEngineer Applying sameData
2022-08-13 12:09:26 [Main thread] INFO PatientLevelPrediction No sampling - returning same data
2022-08-13 12:09:26 [Main thread] INFO PatientLevelPrediction featureEngineer Done Feature Engineering
2022-08-13 12:09:26 [Main thread] INFO PatientLevelPrediction dataSummary Train Set:
2022-08-13 12:09:27 [Main thread] INFO PatientLevelPrediction dataSummary Fold 1 5642 patients with 826 outcomes - Fold 2 5641 patients with 825 outcomes - Fold 3 5640 patients with 825 outcomes - Fold 4 5640 patients with 825 outcomes - Fold 5 5640 patients with 825 outcomes
2022-08-13 12:09:28 [Main thread] INFO PatientLevelPrediction dataSummary 47013 covariates in train data
2022-08-13 12:09:28 [Main thread] INFO PatientLevelPrediction dataSummary Test Set:
2022-08-13 12:09:28 [Main thread] INFO PatientLevelPrediction dataSummary 7050 patients with 1031 outcomes
2022-08-13 12:09:28 [Main thread] DEBUG PatientLevelPrediction preprocessData minFraction: 0.001
2022-08-13 12:09:28 [Main thread] DEBUG PatientLevelPrediction preprocessData normalize: TRUE
2022-08-13 12:09:28 [Main thread] DEBUG PatientLevelPrediction preprocessData removeRedundancy: TRUE
2022-08-13 12:10:05 [Main thread] INFO FeatureExtraction Removing 1 redundant covariates
2022-08-13 12:10:05 [Main thread] INFO FeatureExtraction Removing 33103 infrequent covariates
2022-08-13 12:10:05 [Main thread] INFO FeatureExtraction Normalizing covariates
2022-08-13 12:11:04 [Main thread] INFO FeatureExtraction Tidying covariates took 1.6 mins
2022-08-13 12:11:04 [Main thread] INFO PatientLevelPrediction dataSummary Train Set:
2022-08-13 12:11:04 [Main thread] INFO PatientLevelPrediction dataSummary Fold 1 5642 patients with 826 outcomes - Fold 2 5641 patients with 825 outcomes - Fold 3 5640 patients with 825 outcomes - Fold 4 5640 patients with 825 outcomes - Fold 5 5640 patients with 825 outcomes
2022-08-13 12:11:12 [Main thread] INFO PatientLevelPrediction dataSummary 13909 covariates in train data
2022-08-13 12:11:12 [Main thread] INFO PatientLevelPrediction dataSummary Test Set:
2022-08-13 12:11:12 [Main thread] INFO PatientLevelPrediction dataSummary 7050 patients with 1031 outcomes
2022-08-13 12:11:12 [Main thread] INFO PatientLevelPrediction runPlp
2022-08-13 12:11:12 [Main thread] DEBUG PatientLevelPrediction checkPySettings classifier seed: 13
2022-08-13 12:11:12 [Main thread] DEBUG PatientLevelPrediction checkPySettings requiresDenseMatrix: FALSE
2022-08-13 12:11:12 [Main thread] DEBUG PatientLevelPrediction checkPySettings name: Random forest
2022-08-13 12:11:12 [Main thread] DEBUG PatientLevelPrediction checkPySettings pythonImport: sklearn
2022-08-13 12:11:12 [Main thread] DEBUG PatientLevelPrediction checkPySettings pythonImportSecond: ensemble
2022-08-13 12:11:12 [Main thread] DEBUG PatientLevelPrediction checkPySettings pythonClassifier: RandomForestClassifier
2022-08-13 12:11:12 [Main thread] INFO PatientLevelPrediction toSparseM starting toSparseM
2022-08-13 12:11:13 [Main thread] DEBUG PatientLevelPrediction toSparseM Max covariateId in original covariates: 46287576303
2022-08-13 12:11:13 [Main thread] INFO PatientLevelPrediction MapIds starting to map the columns and rows
2022-08-13 12:12:24 [Main thread] INFO PatientLevelPrediction MapIds finished MapCovariates
2022-08-13 12:12:24 [Main thread] DEBUG PatientLevelPrediction toSparseM # covariates in mapped covariateRef: 13909
2022-08-13 12:12:24 [Main thread] DEBUG PatientLevelPrediction toSparseM Max newCovariateId in mapping: 13909
2022-08-13 12:12:24 [Main thread] DEBUG PatientLevelPrediction toSparseM Max rowId in new : 28203
2022-08-13 12:12:24 [Main thread] INFO PatientLevelPrediction toSparseM toSparseM non temporal used
2022-08-13 12:12:24 [Main thread] INFO PatientLevelPrediction checkRam plpData size estimated to use 4.53% of available RAM (0.4GBs)
2022-08-13 12:13:00 [Main thread] DEBUG PatientLevelPrediction toSparseM Sparse matrix with dimensionality: 28203,13909
2022-08-13 12:13:00 [Main thread] INFO PatientLevelPrediction toSparseM finishing toSparseM
2022-08-13 12:13:01 [Main thread] INFO PatientLevelPrediction Rnning CV for Random forest model
2022-08-13 12:13:05 [Main thread] INFO PatientLevelPrediction Max fold: 5
2022-08-13 12:13:05 [Main thread] INFO PatientLevelPrediction Fold 1
2022-08-13 12:13:06 [Main thread] INFO PatientLevelPrediction fitPythonModel data X dim: 22561x13909
2022-08-13 12:13:06 [Main thread] INFO PatientLevelPrediction fitPythonModel data Y length: 22561 with 3300 outcomes
2022-08-13 12:13:06 [Main thread] INFO PatientLevelPrediction fitPythonModel ntrees:500 criterion:gini maxDepth:17 minSamplesSplit:2 minSamplesLeaf:1 minWeightFractionLeaf:0 mtries:auto maxLeafNodes:null minImpurityDecrease:0 bootstrap:TRUE oobScore:FALSE nJobs:null seed:13 classWeight:null maxSamples:null
2022-08-13 12:14:03 [Main thread] INFO PatientLevelPrediction fitPythonModel Training model took (mins): 0.938164699077606
2022-08-13 12:14:03 [Main thread] INFO PatientLevelPrediction Calculating predictions on left out fold set...
2022-08-13 12:14:06 [Main thread] INFO PatientLevelPrediction Fold 2
2022-08-13 12:14:06 [Main thread] INFO PatientLevelPrediction fitPythonModel data X dim: 22562x13909
2022-08-13 12:14:06 [Main thread] INFO PatientLevelPrediction fitPythonModel data Y length: 22562 with 3301 outcomes
2022-08-13 12:14:06 [Main thread] INFO PatientLevelPrediction fitPythonModel ntrees:500 criterion:gini maxDepth:17 minSamplesSplit:2 minSamplesLeaf:1 minWeightFractionLeaf:0 mtries:auto maxLeafNodes:null minImpurityDecrease:0 bootstrap:TRUE oobScore:FALSE nJobs:null seed:13 classWeight:null maxSamples:null
2022-08-13 12:15:01 [Main thread] INFO PatientLevelPrediction fitPythonModel Training model took (mins): 0.920043448607127
2022-08-13 12:15:01 [Main thread] INFO PatientLevelPrediction Calculating predictions on left out fold set...
2022-08-13 12:15:05 [Main thread] INFO PatientLevelPrediction Fold 3
2022-08-13 12:15:06 [Main thread] INFO PatientLevelPrediction fitPythonModel data X dim: 22563x13909
2022-08-13 12:15:06 [Main thread] INFO PatientLevelPrediction fitPythonModel data Y length: 22563 with 3301 outcomes
2022-08-13 12:15:06 [Main thread] INFO PatientLevelPrediction fitPythonModel ntrees:500 criterion:gini maxDepth:17 minSamplesSplit:2 minSamplesLeaf:1 minWeightFractionLeaf:0 mtries:auto maxLeafNodes:null minImpurityDecrease:0 bootstrap:TRUE oobScore:FALSE nJobs:null seed:13 classWeight:null maxSamples:null
2022-08-13 12:16:01 [Main thread] INFO PatientLevelPrediction fitPythonModel Training model took (mins): 0.925801265239716
2022-08-13 12:16:01 [Main thread] INFO PatientLevelPrediction Calculating predictions on left out fold set...
2022-08-13 12:16:05 [Main thread] INFO PatientLevelPrediction Fold 4
2022-08-13 12:16:05 [Main thread] INFO PatientLevelPrediction fitPythonModel data X dim: 22563x13909
2022-08-13 12:16:05 [Main thread] INFO PatientLevelPrediction fitPythonModel data Y length: 22563 with 3301 outcomes
2022-08-13 12:16:05 [Main thread] INFO PatientLevelPrediction fitPythonModel ntrees:500 criterion:gini maxDepth:17 minSamplesSplit:2 minSamplesLeaf:1 minWeightFractionLeaf:0 mtries:auto maxLeafNodes:null minImpurityDecrease:0 bootstrap:TRUE oobScore:FALSE nJobs:null seed:13 classWeight:null maxSamples:null
2022-08-13 12:17:01 [Main thread] INFO PatientLevelPrediction fitPythonModel Training model took (mins): 0.933382781346639
2022-08-13 12:17:01 [Main thread] INFO PatientLevelPrediction Calculating predictions on left out fold set...
2022-08-13 12:17:04 [Main thread] INFO PatientLevelPrediction Fold 5
2022-08-13 12:17:05 [Main thread] INFO PatientLevelPrediction fitPythonModel data X dim: 22563x13909
2022-08-13 12:17:05 [Main thread] INFO PatientLevelPrediction fitPythonModel data Y length: 22563 with 3301 outcomes
2022-08-13 12:17:05 [Main thread] INFO PatientLevelPrediction fitPythonModel ntrees:500 criterion:gini maxDepth:17 minSamplesSplit:2 minSamplesLeaf:1 minWeightFractionLeaf:0 mtries:auto maxLeafNodes:null minImpurityDecrease:0 bootstrap:TRUE oobScore:FALSE nJobs:null seed:13 classWeight:null maxSamples:null
2022-08-13 12:18:01 [Main thread] INFO PatientLevelPrediction fitPythonModel Training model took (mins): 0.935614132881165
2022-08-13 12:18:01 [Main thread] INFO PatientLevelPrediction Calculating predictions on left out fold set...
2022-08-13 12:18:05 [Main thread] INFO pROC roc.default Setting levels: control = 0, case = 1
2022-08-13 12:18:05 [Main thread] INFO pROC roc.default Setting levels: control = 0, case = 1
2022-08-13 12:18:05 [Main thread] INFO pROC roc.default Setting levels: control = 0, case = 1
2022-08-13 12:18:05 [Main thread] INFO pROC roc.default Setting levels: control = 0, case = 1
2022-08-13 12:18:05 [Main thread] INFO pROC roc.default Setting levels: control = 0, case = 1
2022-08-13 12:18:05 [Main thread] INFO pROC roc.default Setting levels: control = 0, case = 1
2022-08-13 12:18:05 [Main thread] INFO PatientLevelPrediction Training final model using optimal parameters
2022-08-13 12:18:05 [Main thread] INFO PatientLevelPrediction fitPythonModel data X dim: 28203x13909
2022-08-13 12:18:05 [Main thread] INFO PatientLevelPrediction fitPythonModel data Y length: 28203 with 4126 outcomes
2022-08-13 12:18:05 [Main thread] INFO PatientLevelPrediction fitPythonModel ntrees:500 criterion:gini maxDepth:17 minSamplesSplit:2 minSamplesLeaf:1 minWeightFractionLeaf:0 mtries:auto maxLeafNodes:null minImpurityDecrease:0 bootstrap:TRUE oobScore:FALSE nJobs:null seed:13 classWeight:null maxSamples:null
2022-08-13 12:21:06 [Main thread] INFO PatientLevelPrediction fitPythonModel Training model took (mins): 3.02057627042135
2022-08-13 12:21:06 [Main thread] INFO PatientLevelPrediction Calculating predictions on all train data...
2022-08-13 12:21:22 [Main thread] ERROR PatientLevelPrediction 3 Error in py_call_impl(callable, dots$args, dots$keywords): AttributeError: 'RandomForestClassifier' object has no attribute 'min_impurity_split'
2022-08-13 12:21:22 [Main thread] INFO PatientLevelPrediction Calculating covariate summary @ 2022-08-13 12:21:22
2022-08-13 12:21:22 [Main thread] INFO PatientLevelPrediction This can take a while...
2022-08-13 12:21:22 [Main thread] INFO PatientLevelPrediction createCovariateSubsets Creating binary labels
2022-08-13 12:21:22 [Main thread] INFO PatientLevelPrediction createCovariateSubsets Joining with strata
2022-08-13 12:21:22 [Main thread] INFO PatientLevelPrediction createCovariateSubsets calculating subset of strata 1
2022-08-13 12:21:22 [Main thread] INFO PatientLevelPrediction createCovariateSubsets calculating subset of strata 2
2022-08-13 12:21:22 [Main thread] INFO PatientLevelPrediction createCovariateSubsets calculating subset of strata 3
2022-08-13 12:21:22 [Main thread] INFO PatientLevelPrediction createCovariateSubsets calculating subset of strata 4
2022-08-13 12:21:22 [Main thread] INFO PatientLevelPrediction Restricting to subgroup
2022-08-13 12:21:22 [Main thread] INFO PatientLevelPrediction Calculating summary for subgroup TrainWithNoOutcome
2022-08-13 12:21:53 [Main thread] INFO PatientLevelPrediction Restricting to subgroup
2022-08-13 12:21:53 [Main thread] INFO PatientLevelPrediction Calculating summary for subgroup TestWithNoOutcome
2022-08-13 12:22:19 [Main thread] INFO PatientLevelPrediction Restricting to subgroup
2022-08-13 12:22:19 [Main thread] INFO PatientLevelPrediction Calculating summary for subgroup TrainWithOutcome
2022-08-13 12:22:44 [Main thread] INFO PatientLevelPrediction Restricting to subgroup
2022-08-13 12:22:44 [Main thread] INFO PatientLevelPrediction Calculating summary for subgroup TestWithOutcome
2022-08-13 12:23:07 [Main thread] INFO PatientLevelPrediction aggregateCovariateSummaries Aggregating with labels and strata
2022-08-13 12:23:23 [Main thread] INFO PatientLevelPrediction Finished covariate summary @ 2022-08-13 12:23:23
2022-08-13 12:23:23 [Main thread] INFO PatientLevelPrediction runPlp Run finished successfully.
2022-08-13 12:23:23 [Main thread] INFO PatientLevelPrediction runPlp Saving PlpResult
2022-08-13 12:23:23 [Main thread] INFO PatientLevelPrediction savePlpModel Creating directory to save model
2022-08-13 12:23:24 [Main thread] INFO PatientLevelPrediction runPlp plpResult saved to ..\plpSingleOutput/RF\plpResult

jreps commented 2 years ago

I think this is an issue with the JSON saving again. It looks like sklearn updated a bunch of models, but the Python package that saves and loads them as JSON hasn't been updated to match. Looking at that package, https://github.com/mlrequest/sklearn-json, it hasn't been updated since 2019. It looks like we either have to write our own code to parse the Python models into JSON (possibly only a small edit to the sklearn-json package is required) or go back to saving them as Python pickles. I'm going to turn off the JSON saving for randomForest.
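
A minimal sketch of the failure mode described above, assuming the JSON serializer reads a fixed list of attribute names off the fitted model. Everything here is a hypothetical stand-in written for illustration: `FakeForest`, `EXPECTED_ATTRS`, `serialize_strict` and `serialize_tolerant` are not sklearn-json or PLP code, and the sketch uses only the Python standard library.

```python
import json


class FakeForest:
    """Hypothetical stand-in for a fitted estimator from a newer library
    release in which the `min_impurity_split` attribute no longer exists."""

    def __init__(self):
        self.n_estimators = 500
        self.max_depth = 17
        self.min_impurity_decrease = 0.0


# Attribute list frozen when the serializer was written (an assumption
# made for this sketch, not the real list used by any package).
EXPECTED_ATTRS = [
    "n_estimators",
    "max_depth",
    "min_impurity_decrease",
    "min_impurity_split",
]


def serialize_strict(model):
    # Reads every expected attribute; raises AttributeError if one is gone.
    return json.dumps({name: getattr(model, name) for name in EXPECTED_ATTRS})


def serialize_tolerant(model):
    # Skips attributes that the model no longer has.
    return json.dumps(
        {name: getattr(model, name) for name in EXPECTED_ATTRS if hasattr(model, name)}
    )


model = FakeForest()

try:
    serialize_strict(model)
    strict_error = None
except AttributeError as err:
    strict_error = str(err)

tolerant_json = serialize_tolerant(model)
print(strict_error)
print(tolerant_json)
```

The strict variant fails with the same kind of message as in the log, while the tolerant variant simply omits the missing field. Whether silently dropping a field is safe depends on how the model is later reloaded, so this is only an illustration of why a stale serializer breaks, not a suggested patch.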

jreps commented 2 years ago

I turned off the json saving for now, so this should be fixed in the latest develop branch.

nbehzad commented 2 years ago

I just tested it with the most recent develop branch, but the problem remains.

ChungsooKim commented 2 years ago

Hi, I also get the same problem with the current develop branch.

Error in py_call_impl(callable, dots$args, dots$keywords): AttributeError: 'RandomForestClassifier' object has no attribute 'min_impurity_split'

ChungsooKim commented 2 years ago

Sorry for my previous post; it works well now. The error was due to settings in my JSON file. Settings from the JSON file take priority over the default values in the PLP package. Maybe this helps, @nbehzad?
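
A minimal sketch of that precedence, assuming settings are merged key by key so that anything present in the JSON file overrides the package default. The thread does not say which setting was involved; the keys below (including `saveToJson`) and the function `resolve_settings` are made up for illustration and are not PLP code.

```python
import json

# Hypothetical package defaults (illustrative keys, not PLP's real settings).
DEFAULTS = {"ntrees": 500, "maxDepth": 17, "saveToJson": False}


def resolve_settings(json_text, defaults=DEFAULTS):
    """Return a copy of the defaults overlaid with values from the JSON text."""
    overrides = json.loads(json_text)
    merged = dict(defaults)   # copy, so the defaults themselves stay untouched
    merged.update(overrides)  # values from the file win on any shared key
    return merged


# A settings file written before the default changed still carries the old value.
stale_file = '{"saveToJson": true}'
settings = resolve_settings(stale_file)
print(settings)
```

Under this assumption, updating the package changes `DEFAULTS` but not the outcome, because the older value stored in the file keeps overriding it until the file is regenerated or edited.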

nbehzad commented 2 years ago

It works for me as well. Thanks!