ModelOriented / randomForestExplainer

A set of tools to understand what is happening inside a Random Forest
https://ModelOriented.github.io/randomForestExplainer/

Function explain_forest throws error #5

Closed oli666 closed 5 years ago

oli666 commented 6 years ago

Running explain_forest on a model trained on a popular educational data set (German Credit Data) throws the following error:

Quitting from lines 81-82 (Explain_forest_template.Rmd)
Error in `[.data.frame`(rankings, , measures) : undefined columns selected

Code to reproduce:

library(tidyverse)
library(randomForest)

> Warning: package 'randomForest' was built under R version 3.4.4

> randomForest 4.6-14

> Type rfNews() to see new features/changes/bug fixes.

>

> Attaching package: 'randomForest'

> The following object is masked from 'package:dplyr':

>

> combine

> The following object is masked from 'package:ggplot2':

>

> margin

library(randomForestExplainer)
set.seed(123)
credit <- read_csv('http://invidio.drl.pl/files/german_credit.csv')

> Parsed with column specification:

> cols(

> .default = col_character(),

> default = col_integer(),

> duration_in_month = col_integer(),

> credit_amount = col_integer(),

> installment_as_income_perc = col_integer(),

> present_res_since = col_integer(),

> age = col_integer(),

> credits_this_bank = col_integer(),

> people_under_maintenance = col_integer()

> )

> See spec(...) for full column specifications.

credit <- credit %>% mutate_if(is.character, as.factor) %>% mutate(default = as.factor(default))

> Warning: package 'bindrcpp' was built under R version 3.4.4

credit_shuffled <- sample_frac(credit, 1)
n <- nrow(credit_shuffled)
n_train <- round(0.8 * n)
train_indices <- sample(1:n, n_train)
credit_train <- credit_shuffled[train_indices,]
credit_test <- credit_shuffled[-train_indices,]

glimpse(credit_train)

> Observations: 800

> Variables: 21

> $ default 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0,...

> $ account_check_status 0 <= ... < 200 DM, no checking acco...

> $ duration_in_month 12, 21, 24, 24, 12, 18, 36, 48, 18,...

> $ credit_history critical account/ other credits exi...

> $ purpose car (new), business, radio/televisi...

> $ credit_amount 2366, 1572, 3777, 2197, 1412, 866, ...

> $ savings 500 <= ... < 1000 DM, .. >= 1000 DM...

> $ present_emp_since 4 <= ... < 7 years, .. >= 7 years, ...

> $ installment_as_income_perc 3, 4, 4, 4, 4, 4, 1, 4, 4, 4, 4, 1,...

> $ personal_status_sex male : divorced/separated, female :...

> $ other_debtors none, none, none, none, guarantor, ...

> $ present_res_since 3, 4, 4, 4, 2, 2, 3, 2, 1, 4, 1, 3,...

> $ property if not A121/A122 : car or other, no...

> $ age 36, 36, 50, 43, 29, 25, 31, 38, 43,...

> $ other_installment_plans none, bank, none, none, none, none,...

> $ housing own, own, own, own, own, own, own, ...

> $ credits_this_bank 1, 1, 1, 2, 2, 1, 2, 1, 1, 2, 1, 1,...

> $ job management/ self-employed/ highly q...

> $ people_under_maintenance 1, 1, 1, 2, 1, 1, 2, 1, 2, 1, 1, 1,...

> $ telephone yes, registered under the customers...

> $ foreign_worker yes, yes, yes, yes, yes, yes, yes, ...

credit_model <- randomForest( default ~ ., data = credit_train )

class(credit_model)

> [1] "randomForest.formula" "randomForest"

explain_forest(credit_model)

> processing file: Explain_forest_template.Rmd

> [1] accuracy_decrease and gini_decrease

> Quitting from lines 81-82 (Explain_forest_template.Rmd)

> Error in `[.data.frame`(rankings, , measures): undefined columns selected

Created on 2018-07-12 by the reprex package (v0.2.0).

alexsanjoseph commented 5 years ago

Same issue here

pcofoche commented 5 years ago

Please note: localImp must be set to TRUE when building the forest; otherwise explain_forest quits with an error. I hit the same error you describe, and passing localImp = TRUE to randomForest resolved it.
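A minimal sketch of that fix, using the built-in iris data as a stand-in for the credit data (an assumption made only so the example runs without the CSV download). The names forest_default and forest_local are illustrative. It shows what localImp = TRUE changes on the fitted object: the casewise importance matrix is filled in and the importance table gains a MeanDecreaseAccuracy column, which is presumably what the report template looks for.

```r
library(randomForest)

set.seed(123)

# Default call: casewise (local) importance is not computed.
forest_default <- randomForest(Species ~ ., data = iris)

# Same call with localImp = TRUE, as suggested above.
forest_local <- randomForest(Species ~ ., data = iris, localImp = TRUE)

# NULL for the default forest, a matrix for the localImp forest.
is.null(forest_default$localImportance)
is.null(forest_local$localImportance)

# Only the localImp forest carries the accuracy-based measure.
colnames(forest_default$importance)
colnames(forest_local$importance)

# With randomForestExplainer installed, the report can then be rendered:
# randomForestExplainer::explain_forest(forest_local)
```

For the reprex above, the equivalent change would be credit_model <- randomForest(default ~ ., data = credit_train, localImp = TRUE).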