Error in Random Forest Training

nbehzad commented 2 years ago

Describe the bug

I trained a Random Forest with the PLP develop branch. According to the log file, runPlp completed with an error:

Error in py_call_impl(callable, dots$args, dots$keywords): AttributeError: 'RandomForestClassifier' object has no attribute 'min_impurity_split'

When I loaded the plpResult, the prediction and performanceEvaluation objects were NULL, and the model object was an empty list. As a result, none of the results could be plotted with the plotPlp function. Its output was as follows:

Issue with plotSparseRoc
Issue with plotPredictedPDF
Issue with plotPreferencePDF
Issue with plotPrecisionRecall
Issue with plotF1Measure
Issue with plotDemographicSummary
Issue with plotSparseCalibration
Issue with plotSparseCalibration2
Issue with plotPredictionDistribution

I also tested it on other models, including Logistic Regression, Gradient Boosting, Naive Bayes, and AdaBoost; the plpResult object was complete and error-free, with all results plotted.

Set up (Session Info)

R version 4.1.3 (2022-03-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.2.1

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] FeatureExtraction_3.2.0 Andromeda_0.6.1 dplyr_1.0.9
[4] PatientLevelPrediction_5.4.1 yaml_2.3.5 SqlRender_1.9.2
[7] DatabaseConnector_5.0.4 BQJdbcConnectionStringR_0.1 log4r_0.4.2

loaded via a namespace (and not attached):
[1] reticulate_1.25 tidyselect_1.1.2 memuse_4.2-1 purrr_0.3.4
[5] rJava_1.0-6 lattice_0.20-45 colorspace_2.0-3 vctrs_0.4.1
[9] generics_0.1.3 utf8_1.2.2 blob_1.2.3 rlang_1.0.4
[13] pillar_1.8.0 glue_1.6.2 DBI_1.1.3 ParallelLogger_3.0.1
[17] xgboost_1.6.0.1 bit64_4.0.5 dbplyr_2.2.1 lifecycle_1.0.1
[21] plyr_1.8.7 munsell_0.5.0 gtable_0.3.0 zip_2.2.0
[25] memoise_2.0.1 labeling_0.4.2 fastmap_1.1.0 fansi_1.0.3
[29] Rcpp_1.0.9 scales_1.2.0 cachem_1.0.6 jsonlite_1.8.0
[33] farver_2.1.1 bit_4.0.4 gridExtra_2.3 digest_0.6.29
[37] ggplot2_3.3.6 hms_1.1.1 png_0.1-7 grid_4.1.3
[41] rprojroot_2.0.3 here_1.0.1 cli_3.3.0 tools_4.1.3
[45] magrittr_2.0.3 tibble_3.1.8 RSQLite_2.2.15 crayon_1.5.1
[49] tidyr_1.2.0 pkgconfig_2.0.3 ellipsis_0.3.2 Matrix_1.4-0
[53] data.table_1.14.2 pROC_1.18.0 assertthat_0.2.1 rstudioapi_0.13
[57] R6_2.5.1
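
The session info above covers the R side only, while the AttributeError is raised inside Python. A minimal sketch for checking which scikit-learn version is installed follows; it assumes it is run in the same Python environment that reticulate uses, and the helper name `installed_version` is made up for illustration. As far as I know, `min_impurity_split` was deprecated in favour of `min_impurity_decrease` and removed in scikit-learn 1.0, so a version at or above 1.0 would be consistent with the error.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent.

    Hypothetical helper for illustration; not part of PatientLevelPrediction.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Assumption: this interpreter is the one reticulate is configured to use.
    print("scikit-learn:", installed_version("scikit-learn"))
```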

To Reproduce

library(PatientLevelPrediction)

# Random forest settings used in the failing run
setRandomForest(
  ntrees = list(500),
  criterion = list("gini"),
  maxDepth = list(17),
  minSamplesSplit = list(2),
  minSamplesLeaf = list(1),
  minWeightFractionLeaf = list(0),
  mtries = list("auto"),
  maxLeafNodes = list(NULL),
  minImpurityDecrease = list(0),
  bootstrap = list(TRUE),
  maxSamples = list(NULL),
  oobScore = list(FALSE),
  nJobs = list(NULL),
  classWeight = list(NULL),
  seed = 13
)

PLP Log File (2022-08-13, 12:08:09 to 12:23:24, [Main thread])

DEBUG PatientLevelPrediction checkInputs binary : TRUE, includeAllOutcomes : TRUE, firstExposureOnly : FALSE, washoutPeriod : 0, removeSubjectsWithPriorOutcome : FALSE, priorOutcomeLookback : 9999, requireTimeAtRisk : TRUE, minTimeAtRisk : 90, riskWindowStart : 90, startAnchor : cohort start, riskWindowEnd : 180, endAnchor : cohort start, restrictTarToCohortEnd : FALSE
DEBUG PatientLevelPrediction checkInputs test : 0.2, train : 0.8, seed : 13, nfold : 5
DEBUG PatientLevelPrediction checkInputs numberOutcomestoNonOutcomes : 1, sampleSeed : 1
DEBUG PatientLevelPrediction checkInputs :
DEBUG PatientLevelPrediction checkInputs minFraction : 0.001, normalize : TRUE, removeRedundancy : TRUE
DEBUG PatientLevelPrediction checkInputs fitFunction : fitSklearn, param : 500-gini-17-2-1-0-auto-0-TRUE-FALSE-13
DEBUG PatientLevelPrediction checkInputs runSplitData : TRUE, runSampleData : TRUE, runfeatureEngineering : TRUE, runPreprocessData : TRUE, runModelDevelopment : TRUE, runCovariateSummary : TRUE
PatientLevelPrediction printHeader Patient-Level Prediction Package version 5.4.1
PatientLevelPrediction printHeader Study started at: 2022-08-13 12:08:09
PatientLevelPrediction printHeader AnalysisID: RF
PatientLevelPrediction printHeader AnalysisName: PORPOISE
PatientLevelPrediction printHeader TargetID: 3
PatientLevelPrediction printHeader OutcomeID: 4
PatientLevelPrediction printHeader Cohort size: 35253
PatientLevelPrediction printHeader Covariates: 49175
DEBUG PatientLevelPrediction outcomeId: 4
DEBUG PatientLevelPrediction binary: TRUE
DEBUG PatientLevelPrediction includeAllOutcomes: TRUE
DEBUG PatientLevelPrediction firstExposureOnly: FALSE
DEBUG PatientLevelPrediction washoutPeriod: 0
DEBUG PatientLevelPrediction removeSubjectsWithPriorOutcome: FALSE
DEBUG PatientLevelPrediction priorOutcomeLookback: 9999
DEBUG PatientLevelPrediction requireTimeAtRisk: TRUE
DEBUG PatientLevelPrediction minTimeAtRisk: 90
DEBUG PatientLevelPrediction restrictTarToCohortEnd: FALSE
DEBUG PatientLevelPrediction riskWindowStart: 90
DEBUG PatientLevelPrediction startAnchor: cohort start
DEBUG PatientLevelPrediction riskWindowEnd: 180
DEBUG PatientLevelPrediction endAnchor: cohort start
DEBUG PatientLevelPrediction restrictTarToCohortEnd: FALSE
PatientLevelPrediction Outcome is 0 or 1
DEBUG PatientLevelPrediction checkInputsSplit test: 0.2
DEBUG PatientLevelPrediction checkInputsSplit train: 0.8
DEBUG PatientLevelPrediction checkInputsSplit nfold: 5
PatientLevelPrediction checkInputsSplit seed: 13
PatientLevelPrediction Creating a 20% test and 80% train (into 5 folds) random stratified split by class
PatientLevelPrediction Data split into 7050 test cases and 28203 train cases (5642, 5641, 5640, 5640, 5640)
PatientLevelPrediction dataSummary Train Set:
PatientLevelPrediction dataSummary Fold 1 5642 patients with 826 outcomes - Fold 2 5641 patients with 825 outcomes - Fold 3 5640 patients with 825 outcomes - Fold 4 5640 patients with 825 outcomes - Fold 5 5640 patients with 825 outcomes
PatientLevelPrediction dataSummary 47013 covariates in train data
PatientLevelPrediction dataSummary Test Set:
PatientLevelPrediction dataSummary 7050 patients with 1031 outcomes
PatientLevelPrediction sampleData Starting data sampling
PatientLevelPrediction sampleData Applying sameData
PatientLevelPrediction No sampling - returning same data
PatientLevelPrediction sampleData Finished data sampling
PatientLevelPrediction dataSummary Train Set:
PatientLevelPrediction dataSummary Fold 1 5642 patients with 826 outcomes - Fold 2 5641 patients with 825 outcomes - Fold 3 5640 patients with 825 outcomes - Fold 4 5640 patients with 825 outcomes - Fold 5 5640 patients with 825 outcomes
PatientLevelPrediction dataSummary 47013 covariates in train data
PatientLevelPrediction dataSummary Test Set:
PatientLevelPrediction dataSummary 7050 patients with 1031 outcomes
PatientLevelPrediction featureEngineer Starting Feature Engineering
PatientLevelPrediction featureEngineer Applying sameData
PatientLevelPrediction No sampling - returning same data
PatientLevelPrediction featureEngineer Done Feature Engineering
PatientLevelPrediction dataSummary Train Set:
PatientLevelPrediction dataSummary Fold 1 5642 patients with 826 outcomes - Fold 2 5641 patients with 825 outcomes - Fold 3 5640 patients with 825 outcomes - Fold 4 5640 patients with 825 outcomes - Fold 5 5640 patients with 825 outcomes
PatientLevelPrediction dataSummary 47013 covariates in train data
PatientLevelPrediction dataSummary Test Set:
PatientLevelPrediction dataSummary 7050 patients with 1031 outcomes
DEBUG PatientLevelPrediction preprocessData minFraction: 0.001
DEBUG PatientLevelPrediction preprocessData normalize: TRUE
DEBUG PatientLevelPrediction preprocessData removeRedundancy: TRUE
FeatureExtraction Removing 1 redundant covariates
FeatureExtraction Removing 33103 infrequent covariates
FeatureExtraction Normalizing covariates
FeatureExtraction Tidying covariates took 1.6 mins
PatientLevelPrediction dataSummary Train Set:
PatientLevelPrediction dataSummary Fold 1 5642 patients with 826 outcomes - Fold 2 5641 patients with 825 outcomes - Fold 3 5640 patients with 825 outcomes - Fold 4 5640 patients with 825 outcomes - Fold 5 5640 patients with 825 outcomes
PatientLevelPrediction dataSummary 13909 covariates in train data
PatientLevelPrediction dataSummary Test Set:
PatientLevelPrediction dataSummary 7050 patients with 1031 outcomes
PatientLevelPrediction runPlp
DEBUG PatientLevelPrediction checkPySettings classifier seed: 13
DEBUG PatientLevelPrediction checkPySettings requiresDenseMatrix: FALSE
DEBUG PatientLevelPrediction checkPySettings name: Random forest
DEBUG PatientLevelPrediction checkPySettings pythonImport: sklearn
DEBUG PatientLevelPrediction checkPySettings pythonImportSecond: ensemble
DEBUG PatientLevelPrediction checkPySettings pythonClassifier: RandomForestClassifier
PatientLevelPrediction toSparseM starting toSparseM
DEBUG PatientLevelPrediction toSparseM Max covariateId in original covariates: 46287576303
PatientLevelPrediction MapIds starting to map the columns and rows
PatientLevelPrediction MapIds finished MapCovariates
DEBUG PatientLevelPrediction toSparseM # covariates in mapped covariateRef: 13909
DEBUG PatientLevelPrediction toSparseM Max newCovariateId in mapping: 13909
DEBUG PatientLevelPrediction toSparseM Max rowId in new : 28203
PatientLevelPrediction toSparseM toSparseM non temporal used
PatientLevelPrediction checkRam plpData size estimated to use 4.53% of available RAM (0.4GBs)
DEBUG PatientLevelPrediction toSparseM Sparse matrix with dimensionality: 28203,13909
PatientLevelPrediction toSparseM finishing toSparseM
PatientLevelPrediction Rnning CV for Random forest model
PatientLevelPrediction Max fold: 5
PatientLevelPrediction Fold 1
PatientLevelPrediction fitPythonModel data X dim: 22561x13909
PatientLevelPrediction fitPythonModel data Y length: 22561 with 3300 outcomes
PatientLevelPrediction fitPythonModel ntrees:500 criterion:gini maxDepth:17 minSamplesSplit:2 minSamplesLeaf:1 minWeightFractionLeaf:0 mtries:auto maxLeafNodes:null minImpurityDecrease:0 bootstrap:TRUE oobScore:FALSE nJobs:null seed:13 classWeight:null maxSamples:null
PatientLevelPrediction fitPythonModel Training model took (mins): 0.938164699077606
PatientLevelPrediction Calculating predictions on left out fold set...
PatientLevelPrediction Fold 2
PatientLevelPrediction fitPythonModel data X dim: 22562x13909
PatientLevelPrediction fitPythonModel data Y length: 22562 with 3301 outcomes
PatientLevelPrediction fitPythonModel ntrees:500 criterion:gini maxDepth:17 minSamplesSplit:2 minSamplesLeaf:1 minWeightFractionLeaf:0 mtries:auto maxLeafNodes:null minImpurityDecrease:0 bootstrap:TRUE oobScore:FALSE nJobs:null seed:13 classWeight:null maxSamples:null
PatientLevelPrediction fitPythonModel Training model took (mins): 0.920043448607127
PatientLevelPrediction Calculating predictions on left out fold set...
PatientLevelPrediction Fold 3
PatientLevelPrediction fitPythonModel data X dim: 22563x13909
PatientLevelPrediction fitPythonModel data Y length: 22563 with 3301 outcomes
PatientLevelPrediction fitPythonModel ntrees:500 criterion:gini maxDepth:17 minSamplesSplit:2 minSamplesLeaf:1 minWeightFractionLeaf:0 mtries:auto maxLeafNodes:null minImpurityDecrease:0 bootstrap:TRUE oobScore:FALSE nJobs:null seed:13 classWeight:null maxSamples:null
PatientLevelPrediction fitPythonModel Training model took (mins): 0.925801265239716
PatientLevelPrediction Calculating predictions on left out fold set...
PatientLevelPrediction Fold 4
PatientLevelPrediction fitPythonModel data X dim: 22563x13909
PatientLevelPrediction fitPythonModel data Y length: 22563 with 3301 outcomes
PatientLevelPrediction fitPythonModel ntrees:500 criterion:gini maxDepth:17 minSamplesSplit:2 minSamplesLeaf:1 minWeightFractionLeaf:0 mtries:auto maxLeafNodes:null minImpurityDecrease:0 bootstrap:TRUE oobScore:FALSE nJobs:null seed:13 classWeight:null maxSamples:null
PatientLevelPrediction fitPythonModel Training model took (mins): 0.933382781346639
PatientLevelPrediction Calculating predictions on left out fold set...
PatientLevelPrediction Fold 5
PatientLevelPrediction fitPythonModel data X dim: 22563x13909
PatientLevelPrediction fitPythonModel data Y length: 22563 with 3301 outcomes
PatientLevelPrediction fitPythonModel ntrees:500 criterion:gini maxDepth:17 minSamplesSplit:2 minSamplesLeaf:1 minWeightFractionLeaf:0 mtries:auto maxLeafNodes:null minImpurityDecrease:0 bootstrap:TRUE oobScore:FALSE nJobs:null seed:13 classWeight:null maxSamples:null
PatientLevelPrediction fitPythonModel Training model took (mins): 0.935614132881165
PatientLevelPrediction Calculating predictions on left out fold set...
pROC roc.default Setting levels: control = 0, case = 1
pROC roc.default Setting levels: control = 0, case = 1
pROC roc.default Setting levels: control = 0, case = 1
pROC roc.default Setting levels: control = 0, case = 1
pROC roc.default Setting levels: control = 0, case = 1
pROC roc.default Setting levels: control = 0, case = 1
PatientLevelPrediction Training final model using optimal parameters
PatientLevelPrediction fitPythonModel data X dim: 28203x13909
PatientLevelPrediction fitPythonModel data Y length: 28203 with 4126 outcomes
PatientLevelPrediction fitPythonModel ntrees:500 criterion:gini maxDepth:17 minSamplesSplit:2 minSamplesLeaf:1 minWeightFractionLeaf:0 mtries:auto maxLeafNodes:null minImpurityDecrease:0 bootstrap:TRUE oobScore:FALSE nJobs:null seed:13 classWeight:null maxSamples:null
PatientLevelPrediction fitPythonModel Training model took (mins): 3.02057627042135
PatientLevelPrediction Calculating predictions on all train data...
ERROR PatientLevelPrediction 3 Error in py_call_impl(callable, dots$args, dots$keywords): AttributeError: 'RandomForestClassifier' object has no attribute 'min_impurity_split'
PatientLevelPrediction Calculating covariate summary @ 2022-08-13 12:21:22
PatientLevelPrediction This can take a while...
PatientLevelPrediction createCovariateSubsets Creating binary labels
PatientLevelPrediction createCovariateSubsets Joining with strata
PatientLevelPrediction createCovariateSubsets calculating subset of strata 1
PatientLevelPrediction createCovariateSubsets calculating subset of strata 2
PatientLevelPrediction createCovariateSubsets calculating subset of strata 3
PatientLevelPrediction createCovariateSubsets calculating subset of strata 4
PatientLevelPrediction Restricting to subgroup
PatientLevelPrediction Calculating summary for subgroup TrainWithNoOutcome
PatientLevelPrediction Restricting to subgroup
PatientLevelPrediction Calculating summary for subgroup TestWithNoOutcome
PatientLevelPrediction Restricting to subgroup
PatientLevelPrediction Calculating summary for subgroup TrainWithOutcome
PatientLevelPrediction Restricting to subgroup
PatientLevelPrediction Calculating summary for subgroup TestWithOutcome
PatientLevelPrediction aggregateCovariateSummaries Aggregating with labels and strata
PatientLevelPrediction Finished covariate summary @ 2022-08-13 12:23:23
PatientLevelPrediction runPlp Run finished successfully.
PatientLevelPrediction runPlp Saving PlpResult
PatientLevelPrediction savePlpModel Creating directory to save model
PatientLevelPrediction runPlp plpResult saved to ..\plpSingleOutput/RF\plpResult

jreps commented 2 years ago

I think this is an issue with the JSON saving again. It looks like sklearn updated a bunch of models, but the Python package that saves/loads them to JSON hasn't been updated. Looking at that package, https://github.com/mlrequest/sklearn-json, it hasn't been updated since 2019. It looks like we either have to write our own code to parse the Python models into JSON (this may need only a small edit to the sklearn-json package) or go back to saving as Python pickles. I'm going to turn off the JSON saving for randomForest.
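
To illustrate the failure mode described above, here is a rough sketch using only the standard library. It is not the sklearn-json code: `FakeForest`, `EXPECTED_ATTRIBUTES`, `strict_dump`, and `tolerant_dump` are hypothetical names standing in for an estimator that no longer carries `min_impurity_split` and for a serializer written against an older release.

```python
import json


class FakeForest:
    """Hypothetical stand-in for a fitted estimator from a newer release."""

    def __init__(self):
        self.n_estimators = 500
        self.max_depth = 17
        self.min_impurity_decrease = 0.0
        # Deliberately no `min_impurity_split` attribute.


# Attribute names a serializer written for an older release might expect (assumed).
EXPECTED_ATTRIBUTES = [
    "n_estimators",
    "max_depth",
    "min_impurity_decrease",
    "min_impurity_split",
]


def strict_dump(model):
    # Reads every expected attribute; raises AttributeError if one is missing.
    return json.dumps({name: getattr(model, name) for name in EXPECTED_ATTRIBUTES})


def tolerant_dump(model):
    # Serializes only the attributes the model actually has.
    return json.dumps(
        {name: getattr(model, name) for name in EXPECTED_ATTRIBUTES if hasattr(model, name)}
    )


if __name__ == "__main__":
    model = FakeForest()
    try:
        strict_dump(model)
    except AttributeError as err:
        print("strict serializer failed:", err)
    print("tolerant serializer:", tolerant_dump(model))
```

The tolerant variant is roughly what a "small edit" could look like. Loading such JSON back into a real estimator would still need its own handling, which this sketch does not cover.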

jreps commented 2 years ago

I turned off the json saving for now, so this should be fixed in the latest develop branch.

nbehzad commented 2 years ago

I just tested it with the most recent develop branch, but the problem still remains.

ChungsooKim commented 2 years ago

Hi, I also get the same problem with the current develop branch.

Error in py_call_impl(callable, dots$args, dots$keywords): AttributeError: 'RandomForestClassifier' object has no attribute 'min_impurity_split'

ChungsooKim commented 2 years ago

Sorry for my previous post; it works well now. The error was due to settings in my JSON file. Settings from the JSON file take priority over the default values in the PLP package. Does that help, @nbehzad?
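
A small sketch of how this kind of override can bite, under stated assumptions: the key names (`saveToJson`, `ntrees`) and the helper `resolve_settings` are hypothetical, since the thread does not say which setting was involved or how PLP merges them.

```python
import json

# Hypothetical package defaults; the real PLP defaults are not shown in this thread.
PACKAGE_DEFAULTS = {"saveToJson": False, "ntrees": 500}


def resolve_settings(defaults, settings_json):
    """Overlay values from a JSON settings string on top of the defaults."""
    overrides = json.loads(settings_json)
    # Later keys win, so anything present in the file replaces the default.
    return {**defaults, **overrides}


# Hypothetical leftover setting in a user's JSON file.
stale_file = '{"saveToJson": true}'
resolved = resolve_settings(PACKAGE_DEFAULTS, stale_file)
print(resolved)
```

Even after the package default changes, a value left over in the settings file would keep the old behaviour switched on, which would match what was seen here.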

nbehzad commented 2 years ago

It works for me as well. Thanks!

OHDSI / PatientLevelPrediction

Error in Random Forest Training #303