USEPA / spmodel

spmodel: Spatial Statistical Modeling and Prediction in R
https://usepa.github.io/spmodel/
GNU General Public License v3.0

variable importance and partial dependent plot splmRF #16

Closed ManuelSpinola closed 6 months ago

ManuelSpinola commented 6 months ago

How can I obtain variable importance and a partial dependence plot using splmRF?

michaeldumelle commented 6 months ago

Thanks @ManuelSpinola for the inquiry and for your interest in the software! splmRF(formula, data, ...) works by fitting a random forest using the ranger R package (http://imbs-hl.github.io/ranger/) via ranger(formula, data, ...) and then fitting a spatial model to the residuals from the random forest fit. Combining the random forest fit with the spatial model of its residuals yields predictions that incorporate both random forest and spatial components. Thus, splmRF() is most useful when combined with predict() to make predictions at unobserved locations. However, you can still access the random forest object fit via ranger and inspect it.
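To make the two-stage idea concrete, here is a rough hand-rolled sketch of it. It is not the code inside splmRF(). It assumes the residuals are computed from ranger's out-of-bag predictions (splmRF() may do this differently), and the hold-out split and the names train, test, rf_fit, rf_resid, resid_mod, and manual_pred are made up for illustration.

```r
library(spmodel)  # splm() and the moss data (an sf object)
library(ranger)

set.seed(1)

# Hypothetical hold-out split: every fifth row plays the "unobserved" locations
holdout <- seq(1, nrow(moss), by = 5)
train <- moss[-holdout, ]
test <- moss[holdout, ]

# Stage 1: random forest on the training data without its geometry
train_df <- sf::st_drop_geometry(train)
rf_fit <- ranger(log_Zn ~ log_dist2road + sideroad, data = train_df)

# Residuals from the forest (assumption: out-of-bag predictions in rf_fit$predictions)
train$rf_resid <- train_df$log_Zn - rf_fit$predictions

# Stage 2: intercept-only spatial model for the residuals
resid_mod <- splm(rf_resid ~ 1, data = train, spcov_type = "exponential")

# Prediction = forest prediction + predicted spatial residual
rf_pred <- predict(rf_fit, data = sf::st_drop_geometry(test))$predictions
manual_pred <- rf_pred + predict(resid_mod, newdata = test)
head(manual_pred)
```

In practice, predict(mod, newdata = test) on a splmRF() fit does this combination for you; the sketch only shows where the random forest and spatial pieces enter.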

The output from splmRF(formula, data, ...) is a list that contains several elements, one of which, "ranger", holds the random forest fit. This element can be accessed and used just like the output of a call to ranger(formula, data, ...) and passed to other functions and software packages that accept ranger objects. The ... argument to splmRF(formula, data, ...) allows for additional arguments to be passed to ranger(formula, data, ...) or splm(residuals, data, ...). For example, to access variable importance (using the moss data in spmodel), you could run

library(spmodel)

# importance is not a splmRF() argument; it is forwarded to ranger() through ...
mod <- splmRF(
  log_Zn ~ log_dist2road + sideroad,
  data = moss,
  spcov_type = "exponential",
  importance = "impurity"
)

The importance = "impurity" argument is passed to ranger(), making the relevant ranger() call

ranger(
  log_Zn ~ log_dist2road + sideroad,
  data = moss,
  importance = "impurity"
)

(Note that within splmRF(), we implicitly remove the geometry from moss via sf::st_drop_geometry().)

You can store the ranger part of the model fit via

ranger_fit <- mod$ranger

Then you can access the variable importance via the ranger object:

ranger_fit$variable.importance
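As a variation on the same idea, here is a small sketch that requests permutation importance instead of impurity importance and sorts the result. Both importance and num.trees are ranger() arguments, forwarded through ... on the assumption that splmRF() passes them along the same way as importance = "impurity" above. The names mod_perm and vi are only for illustration.

```r
library(spmodel)

set.seed(1)

mod_perm <- splmRF(
  log_Zn ~ log_dist2road + sideroad,
  data = moss,
  spcov_type = "exponential",
  importance = "permutation",  # ranger() argument, forwarded through ...
  num.trees = 1000             # ranger() argument, forwarded through ...
)

# Named numeric vector with one entry per predictor, largest first
vi <- sort(mod_perm$ranger$variable.importance, decreasing = TRUE)
vi

# Quick base-R plot of the sorted importances
barplot(vi, ylab = "Permutation importance")
```

Keep in mind that these importances describe only the random forest stage; the spatial model fit to the residuals does not enter into them.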

Then, for example, you could pass ranger_fit to partial() from the pdp R package (https://github.com/bgreenwell/pdp) to create partial dependence plots.
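A minimal sketch of that last step might look like the following. It assumes the pdp package is installed and that the training data must be supplied through the train argument, since the ranger object does not carry it. The names moss_df and pd are only for illustration.

```r
library(spmodel)
library(pdp)

set.seed(1)

mod <- splmRF(
  log_Zn ~ log_dist2road + sideroad,
  data = moss,
  spcov_type = "exponential"
)
ranger_fit <- mod$ranger

# Training data without the sf geometry column, matching what ranger() saw
moss_df <- sf::st_drop_geometry(moss)

# Partial dependence of the forest's predictions on log_dist2road
pd <- partial(ranger_fit, pred.var = "log_dist2road", train = moss_df)
head(pd)

# Lattice-based plot of the partial dependence curve
plotPartial(pd)
```

As with variable importance, the curve reflects the random forest component only, not the spatial model of the residuals.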

ManuelSpinola commented 6 months ago

Thank you very much Michael.

(In reply to Michael Dumelle's comment of Tue, 5 Mar 2024, 12:16.)

