ModelOriented / DALEX

moDel Agnostic Language for Exploration and eXplanation
https://dalex.drwhy.ai
GNU General Public License v3.0

plot.model_performance_explainer outliers' labels depend on the order of model input #49

Open 12tafran opened 5 years ago

12tafran commented 5 years ago

Hi,

Following the example at https://pbiecek.github.io/DALEX/reference/plot.model_performance_explainer.html , if you change the order of the arguments from plot(mp_rf, mp_glm, mp_lm, geom = "boxplot", show_outliers = 1) to plot(mp_glm, mp_lm, mp_rf, geom = "boxplot", show_outliers = 1), you get a plot in which the outlier labels don't match the models.

It seems the models have to be passed in order from best to worst, in terms of the root mean square of residuals, for the outlier labels to match the models.

pbiecek commented 4 years ago

Closing due to lack of human resources.

AngelFelizR commented 3 months ago

We still have the same problem in R:

library("DALEX")
#> Welcome to DALEX (version: 2.4.3).
#> Find examples and detailed introduction at: http://ema.drwhy.ai/
library("randomForest")
#> randomForest 4.7-1.1
#> Type rfNews() to see new features/changes/bug fixes.

model_apart_lm <- archivist::aread("pbiecek/models/55f19")
explain_apart_lm <- DALEX::explain(model = model_apart_lm, 
                                   data    = apartments_test[,-1], 
                                   y       = apartments_test$m2.price, 
                                   label   = "Linear Regression")
#> Preparation of a new explainer is initiated
#>   -> model label       :  Linear Regression 
#>   -> data              :  9000  rows  5  cols 
#>   -> target variable   :  9000  values 
#>   -> predict function  :  yhat.lm  will be used (  default  )
#>   -> predicted values  :  No value for predict function target column. (  default  )
#>   -> model_info        :  package stats , ver. 4.2.3 , task regression (  default  ) 
#>   -> predicted values  :  numerical, min =  1792.597 , mean =  3506.836 , max =  6241.447  
#>   -> residual function :  difference between y and yhat (  default  )
#>   -> residuals         :  numerical, min =  -257.2555 , mean =  4.687686 , max =  472.356  
#>   A new explainer has been created!

model_apart_rf <- archivist::aread("pbiecek/models/fe7a5")
explain_apart_rf <- DALEX::explain(model = model_apart_rf, 
                                   data    = apartments_test[,-1], 
                                   y       = apartments_test$m2.price, 
                                   label   = "Random Forest")
#> Preparation of a new explainer is initiated
#>   -> model label       :  Random Forest 
#>   -> data              :  9000  rows  5  cols 
#>   -> target variable   :  9000  values 
#>   -> predict function  :  yhat.randomForest  will be used (  default  )
#>   -> predicted values  :  No value for predict function target column. (  default  )
#>   -> model_info        :  package randomForest , ver. 4.7.1.1 , task regression (  default  ) 
#>   -> predicted values  :  numerical, min =  1985.837 , mean =  3506.107 , max =  5788.052  
#>   -> residual function :  difference between y and yhat (  default  )
#>   -> residuals         :  numerical, min =  -762.3422 , mean =  5.416971 , max =  1318.093  
#>   A new explainer has been created!

mr_lm <- DALEX::model_performance(explain_apart_lm)
mr_rf <- DALEX::model_performance(explain_apart_rf)

# Works as expected: outlier labels match the models
plot(mr_rf, mr_lm, 
     geom = "boxplot",
     show_outliers = 1)

# Doesn't assign the outliers correctly
plot(mr_lm, mr_rf, 
     geom = "boxplot",
     show_outliers = 1)

Created on 2024-06-08 with reprex v2.0.2
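
To check the earlier guess that the labels only line up for a particular ordering by root mean square of residuals, it may help to print that value for each object next to the order used in plot(). The snippet below is only a sketch that continues the session above: rmse() is a hypothetical helper, not a DALEX function, and it assumes each model_performance object stores its residuals in $residuals$diff, which should be verified with str(mr_lm) on your DALEX version.

```r
# Hypothetical helper (not part of DALEX): root mean square of a residual vector.
rmse <- function(res) sqrt(mean(res^2))

# Assumption: residuals are stored in `$residuals$diff` of each
# model_performance object; confirm with str(mr_lm) before relying on this.
perf <- list("Linear Regression" = mr_lm, "Random Forest" = mr_rf)
scores <- vapply(perf, function(mp) rmse(mp$residuals$diff), numeric(1))

# Models sorted from lowest to highest root mean square of residuals,
# to compare against the argument order of each plot() call above.
sort(scores)
```

Comparing this ordering with the two plot() calls shows whether the working call really corresponds to a best-to-worst (or worst-to-best) argument order, or whether something else decides which labels are drawn.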