ModelOriented / DALEX

moDel Agnostic Language for Exploration and eXplanation
https://dalex.drwhy.ai
GNU General Public License v3.0

Differences in performance metrics between a ranger model and its explainer #561

ManuelSpinola closed this issue 8 months ago

ManuelSpinola commented 8 months ago

I see differences in the performance metrics between a ranger model and its DALEX explainer. Why is that?

```r
library(DALEX)  # provides apartments, explain() and model_performance()
aps_ranger <- ranger::ranger(m2.price ~ ., data = apartments, num.trees = 50)
aps_ranger
#> Ranger result
#> Call: ranger::ranger(m2.price ~ ., data = apartments, num.trees = 50)
#> Type: Regression
#> Number of trees: 50
#> Sample size: 1000
#> Number of independent variables: 5
#> Mtry: 2
#> Target node size: 5
#> Variable importance mode: none
#> Splitrule: variance
#> OOB prediction error (MSE): 93997.56
#> R squared (OOB): 0.8856602

exp_ranger <- explain(aps_ranger, data = apartments, y = apartments$m2.price)
mod_per <- model_performance(exp_ranger)
mod_per
#> Measures for:  regression
#> mse        : 21442.99
#> rmse       : 146.4342
#> r2         : 0.9738904
#> mad        : 84.57676
#> Residuals:
#>          0%         10%         20%         30%         40%         50%
#> -491.535000 -160.125376 -109.455093  -75.266200  -43.538133  -20.939611
#>         60%         70%         80%         90%        100%
#>    8.850819   47.124000   97.417832  197.996033  759.546667
```

mayer79 commented 8 months ago

The metrics reported by ranger are calculated from out-of-bag (OOB) predictions; those reported by DALEX are calculated on the data passed to explain().
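A minimal R sketch of this point, assuming the ranger and DALEX packages are installed. The seed is arbitrary and only there for reproducibility, so the numbers will not match the ones in the question exactly. It recomputes ranger's error from the stored out-of-bag predictions and DALEX's error from ordinary in-sample predictions:

```r
library(ranger)
library(DALEX)  # provides the apartments data, explain() and model_performance()

set.seed(42)  # arbitrary seed, only for reproducibility
aps_ranger <- ranger(m2.price ~ ., data = apartments, num.trees = 50)

# 1) ranger's numbers: $predictions holds one out-of-bag (OOB) prediction per
#    training row, averaged only over trees whose bootstrap sample left it out.
oob_mse <- mean((apartments$m2.price - aps_ranger$predictions)^2, na.rm = TRUE)
c(oob_mse = oob_mse, ranger_reported = aps_ranger$prediction.error)

# 2) DALEX's numbers: the explainer predicts on the data it is given with *all*
#    trees, including those that were trained on each row (in-sample).
in_sample <- predict(aps_ranger, data = apartments)$predictions
in_sample_mse <- mean((apartments$m2.price - in_sample)^2)

x_cols <- setdiff(names(apartments), "m2.price")  # predictors only
exp_ranger <- explain(aps_ranger, data = apartments[, x_cols],
                      y = apartments$m2.price, verbose = FALSE)
c(in_sample_mse = in_sample_mse,
  dalex_reported = model_performance(exp_ranger)$measures$mse)
```

Each pair of numbers should agree, and the in-sample MSE should come out far below the OOB MSE, mirroring the gap between 21442.99 and 93997.56 in the question.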

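Note that it basically never makes sense to study random forest performance (non-OOB) on the training data, except to demonstrate its massive overfit: in-sample predictions average over all trees, and with ranger's default bootstrap sampling each training row was used to grow roughly two thirds of them.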
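If the goal is an honest performance estimate from DALEX, one option (a sketch, not the only approach) is to build the explainer on held-out data. DALEX ships an apartments_test data set with the same columns as apartments, which can serve as the test set here; the seed and the label are arbitrary choices:

```r
library(ranger)
library(DALEX)  # apartments for training, apartments_test for evaluation

set.seed(42)  # arbitrary seed, only for reproducibility
aps_ranger <- ranger(m2.price ~ ., data = apartments, num.trees = 50)

# Explainer on rows the forest has never seen, so model_performance()
# reports an out-of-sample error rather than an in-sample one.
x_cols <- setdiff(names(apartments_test), "m2.price")  # predictors only
exp_test <- explain(aps_ranger,
                    data = apartments_test[, x_cols],
                    y = apartments_test$m2.price,
                    label = "ranger (test data)", verbose = FALSE)
perf_test <- model_performance(exp_test)

# Compare the held-out MSE with ranger's own OOB estimate
c(test_mse = perf_test$measures$mse, oob_mse = aps_ranger$prediction.error)
```

The held-out MSE estimates the same kind of quantity as ranger's OOB MSE (error on rows a tree did not train on), so it is the fairer thing to compare with, rather than the in-sample figure from the explainer in the question.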

ManuelSpinola commented 8 months ago

Thank you very much, Michael.

Manuel


-- Manuel Spínola, Ph.D.
Instituto Internacional en Conservación y Manejo de Vida Silvestre, Universidad Nacional, Apartado 1350-3000, Heredia, COSTA RICA
Phone: (506) 8706-4662
Institutional website (ICOMVIS): http://www.icomvis.una.ac.cr/index.php/manuel
Personal website: https://mspinola-sitioweb.netlify.app
Data science blog: https://mspinola-ciencia-de-datos.netlify.app