ManuelSpinola closed this issue 8 months ago
ranger calculates its metrics from out-of-bag (OOB) predictions, whereas DALEX calculates them on the data you provide, which here is the training data.
Note that it basically never makes sense to study (non-OOB) random forest performance on the training data, except to demonstrate how massively the model overfits.
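To make the distinction concrete, here is a minimal sketch that computes the three error estimates by hand. It assumes the `ranger` and `DALEX` packages are installed (DALEX is loaded only for its `apartments` and `apartments_test` data sets); the seed and the variable names are illustrative choices, not part of the original report.

```r
library(ranger)
library(DALEX)  # used here only for the apartments / apartments_test data

# Assumption: a fixed seed, purely so the sketch is reproducible
fit <- ranger(m2.price ~ ., data = apartments, num.trees = 50, seed = 1)

# Out-of-bag: each row is predicted only by the trees that did not see it
oob_mse <- mean((apartments$m2.price - fit$predictions)^2, na.rm = TRUE)

# In-sample: every tree predicts every training row, including rows it was grown on
train_pred <- predict(fit, data = apartments)$predictions
train_mse <- mean((apartments$m2.price - train_pred)^2)

# Hold-out: rows that were never used for fitting
test_pred <- predict(fit, data = apartments_test)$predictions
test_mse <- mean((apartments_test$m2.price - test_pred)^2)

c(oob = oob_mse, train = train_mse, test = test_mse)
```

The OOB value should match `fit$prediction.error`, the figure ranger prints, and the in-sample value should come out much smaller than it. The hold-out value is the one to compare against the OOB estimate.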
Thank you very much Michael.
Manuel
On Tue, 5 Mar 2024 at 0:21, Michael Mayer wrote:
> The metrics of ranger are calculated from out-of-bag predictions, those of DALEX on the data provided.
> Note that it basically never makes sense to study random forest performance (non-oob) on the training data except to demonstrate its massive overfit.
I see differences in the performance metrics between a ranger model and its DALEX explainer. Why is that?
library(DALEX)  # provides explain(), model_performance() and the apartments data

aps_ranger <- ranger::ranger(m2.price ~ ., data = apartments, num.trees = 50)
aps_ranger

Ranger result

Call:
 ranger::ranger(m2.price ~ ., data = apartments, num.trees = 50)

Type:                             Regression
Number of trees:                  50
Sample size:                      1000
Number of independent variables:  5
Mtry:                             2
Target node size:                 5
Variable importance mode:         none
Splitrule:                        variance
OOB prediction error (MSE):       93997.56
R squared (OOB):                  0.8856602
exp_ranger <- explain(aps_ranger, data = apartments, y = apartments$m2.price)
mod_per <- model_performance(exp_ranger)
mod_per

Measures for:  regression
mse        : 21442.99
rmse       : 146.4342
r2         : 0.9738904
mad        : 84.57676

Residuals:
         0%         10%         20%         30%
-491.535000 -160.125376 -109.455093  -75.266200
        40%         50%         60%         70%
 -43.538133  -20.939611    8.850819   47.124000
        80%         90%        100%
  97.417832  197.996033  759.546667
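The gap between the two outputs above can be checked directly. The sketch below builds one explainer on the training data, as in the question, and a second one on hold-out data. It assumes the `apartments_test` data set shipped with DALEX; the names `exp_train`, `exp_test`, `perf_train` and `perf_test` and the seed are illustrative.

```r
library(ranger)
library(DALEX)

# Assumption: a fixed seed, purely so the sketch is reproducible
aps_ranger <- ranger(m2.price ~ ., data = apartments, num.trees = 50, seed = 1)

# Explainer on the training data: the setup from the question
exp_train <- explain(aps_ranger, data = apartments,
                     y = apartments$m2.price, verbose = FALSE)

# Explainer on data the forest has never seen
exp_test <- explain(aps_ranger, data = apartments_test,
                    y = apartments_test$m2.price, verbose = FALSE)

perf_train <- model_performance(exp_train)
perf_test <- model_performance(exp_test)

# Side by side: DALEX on training data, DALEX on hold-out data, ranger's OOB estimate
c(dalex_train = perf_train$measures$mse,
  dalex_test  = perf_test$measures$mse,
  ranger_oob  = aps_ranger$prediction.error)
```

The training-data figure should be far lower than the other two, since it is the in-sample MSE of the forest. Passing hold-out data to `explain()` is one way to get DALEX metrics that are comparable to ranger's OOB estimate.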