BlasBenito / spatialRF

R package to fit spatial models with Random Forest
https://blasbenito.github.io/spatialRF/

interoperability with {randomForestExplainer} by changing `my_forest$call` to be more {ranger}-like? #10

Open mikoontz opened 2 years ago

mikoontz commented 2 years ago

Hello again! I hope it's okay to keep documenting my experience working with your excellent package!

This is not a high-priority item, but while teasing out some inference from my {ranger} model built with {spatialRF}, I've been looking at how the resulting object works with the {randomForestExplainer} package, which is meant to operate on {ranger} objects (https://github.com/ModelOriented/randomForestExplainer).

I ran into an error when trying to learn more about non-multiplicative interactions in my random forest while following this tutorial (https://cran.rstudio.com/web/packages/randomForestExplainer/vignettes/randomForestExplainer.html#variable-interactions).

The troublesome function from {randomForestExplainer} is plot_predict_interaction() and I get this error:

Error in if (as.character(forest$call[[2]])[3] == ".") { : missing value where TRUE/FALSE needed

I'll swap in the mtcars dataset for iris in the help file example for plot_predict_interaction(), which now looks like:

forest_ranger <- ranger::ranger(cyl ~ ., data = mtcars)
randomForestExplainer::plot_predict_interaction(forest_ranger, mtcars, "mpg", "hp")

The $call component of the random forest generated by {ranger} for this model looks like:

> forest_ranger$call
ranger::ranger(cyl ~ ., data = mtcars)

Using {spatialRF} to build the ranger model looks like this (sort of a silly example, since the data aren't spatial, but I'll use the non-spatial approach following your very helpful tutorial):

forest_ranger <- spatialRF::rf(dependent.variable.name = "cyl", 
                               predictor.variable.names = c("mpg", "disp", "hp", "drat", "wt", "qsec", "vs", "am", "gear", "carb"), 
                               data = mtcars)

This then gives the error above when calling randomForestExplainer::plot_predict_interaction(forest_ranger, mtcars, "mpg", "hp").

The $call component of the random forest generated by {spatialRF} looks like:

> forest_ranger$call
ranger::ranger(data = data, dependent.variable.name = dependent.variable.name, 
    num.trees = num.trees, mtry = mtry, importance = importance, 
    write.forest = write.forest, probability = probability, min.node.size = min.node.size, 
    max.depth = max.depth, replace = replace, sample.fraction = sample.fraction, 
    case.weights = case.weights, class.weights = class.weights, 
    splitrule = splitrule, num.random.splits = num.random.splits, 
    alpha = alpha, minprop = minprop, split.select.weights = split.select.weights, 
    always.split.variables = always.split.variables, respect.unordered.factors = respect.unordered.factors, 
    scale.permutation.importance = scale.permutation.importance, 
    local.importance = local.importance, regularization.factor = regularization.factor, 
    regularization.usedepth = regularization.usedepth, keep.inbag = keep.inbag, 
    inbag = inbag, holdout = holdout, quantreg = quantreg, oob.error = oob.error, 
    num.threads = num.threads, save.memory = save.memory, verbose = verbose, 
    seed = seed, classification = classification)

Pretty different!

The line in {randomForestExplainer} that causes this is here: https://github.com/ModelOriented/randomForestExplainer/blob/630c4fe9f7ddcc0a9a586dc4c4fc1822e9d30776/R/min_depth_interactions.R#L363
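To see why that check fails, here is a minimal base R sketch that mimics it without fitting any model. The two quoted calls are stand-ins for the $call components shown above (the second is shortened to two arguments), not output from either package:

```r
# Stand-ins for the two $call components shown above (not real model output).
call_formula <- quote(ranger::ranger(cyl ~ ., data = mtcars))
call_named   <- quote(ranger::ranger(data = data,
                                     dependent.variable.name = dependent.variable.name))

# With the formula interface, element 2 of the call is the formula itself,
# so as.character() returns "~", the response, and the right-hand side.
rhs_formula <- as.character(call_formula[[2]])[3]

# With named arguments, element 2 is just the symbol `data`, which has no
# third element, so indexing returns NA.
rhs_named <- as.character(call_named[[2]])[3]

# `NA == "."` is NA, and if (NA) raises
# "missing value where TRUE/FALSE needed".
check_named <- tryCatch(
  if (rhs_named == ".") "dot" else "explicit",
  error = function(e) conditionMessage(e)
)
```

So the error seems to come from the shape of the stored call, not from the fitted forest itself.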

I was able to work around this by overwriting the $call component in the random forest generated by {spatialRF} like so:

forest_ranger$call <- str2lang(paste0("ranger::ranger(cyl ~ ", paste(c("mpg", "disp", "hp", "drat", "wt", "qsec", "vs", "am", "gear", "carb"), collapse = " + "), ")"))

For posterity: this builds the formula for the random forest model by pasting together the dependent and independent variable names. To make it work with {randomForestExplainer}, the formula also has to be wrapped in the package name and function call (ranger::ranger()). R suggested str2lang() as the right way to create an object of class call after I tried wrapping the character string in as.call() alone, which didn't work.

Which then lets me run randomForestExplainer::plot_predict_interaction(forest_ranger, mtcars, "mpg", "hp") to produce:

[image: plot produced by plot_predict_interaction()]
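The same workaround can be wrapped in a small helper so the variable names are not typed out inside a string. This is only a sketch: rebuild_ranger_call() is a hypothetical name, not part of {spatialRF} or {ranger}, and it assumes every variable name is a syntactic R name.

```r
# Hypothetical helper (not part of {spatialRF} or {ranger}): builds a
# formula-style call from the response and predictor names.
# Assumes every name is a syntactic R name (no spaces or special characters).
rebuild_ranger_call <- function(response, predictors) {
  rhs <- paste(predictors, collapse = " + ")
  str2lang(paste0("ranger::ranger(", response, " ~ ", rhs, ")"))
}

predictors <- c("mpg", "disp", "hp", "drat", "wt", "qsec", "vs", "am", "gear", "carb")
new_call <- rebuild_ranger_call("cyl", predictors)

# The check used by {randomForestExplainer} now sees the right-hand side
# of the formula instead of NA.
rhs <- as.character(new_call[[2]])[3]

# On a fitted model this would be used as:
# forest_ranger$call <- rebuild_ranger_call("cyl", predictors)
```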

So anyway, I'm interested in your opinion about this. Is the approach by {randomForestExplainer} to get the names of the independent variables too fragile (and I can make an issue on their package)? Or is this issue better served by changing how {spatialRF} stores the call as a component of the {ranger} object? Maybe both?

mikoontz commented 2 years ago

One more thing I should note is that my workaround works fine for the non-spatial version, but perhaps the spatial version would need additional coaxing if there are also spatial predictors involved?

mikoontz commented 2 years ago

I also made an issue in {randomForestExplainer}, since the problem comes up even with plain old {ranger} if you don't specify the model using the formula syntax. I suggested using the $forest$independent.variable.names component of the {ranger} model to get the independent variables directly. I suspect that would also help with the spatial version of the {ranger} model derived from {spatialRF} if spatial predictors were added (presumably they'd show up in the $forest$independent.variable.names component even if they aren't specified by the user in the predictor.variable.names= argument to the spatialRF::rf() %>% spatialRF::spatial_rf() call?).
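A rough sketch of that suggestion, assuming only what is described above: that the fitted object carries its predictor names in $forest$independent.variable.names. The mock_forest list is a hand-built stand-in so the snippet runs without fitting a model, and predictor_names() is a hypothetical helper, not a function from any of the packages discussed.

```r
# Hand-built stand-in for a fitted {ranger} object (only the pieces used here).
mock_forest <- list(
  call = quote(ranger::ranger(data = data,
                              dependent.variable.name = dependent.variable.name)),
  forest = list(independent.variable.names = c("mpg", "hp", "wt"))
)

# Hypothetical helper: read the predictor names stored with the forest
# rather than parsing them out of the call.
predictor_names <- function(forest) {
  vars <- forest$forest$independent.variable.names
  if (is.null(vars)) {
    stop("no stored predictor names; was the forest saved (write.forest = TRUE)?")
  }
  vars
}

vars <- predictor_names(mock_forest)
```

This does not depend on how the model was specified, so the shape of $call no longer matters. Whether spatial predictors added by {spatialRF} appear in that component is still the open question raised above.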

BlasBenito commented 2 years ago

Hello!

Thank you for providing such comprehensive feedback on this topic. I truly appreciate it.

I am highly interested in making spatialRF as compatible as possible with other packages, and the issue you raised is timely because I am working on the new version of the package, so this is the time to make the required modifications to make the functions of spatialRF compatible with randomForestExplainer.

However, please give me a few days to get back to you. I am a bit flooded with work now.

Cheers,

Blas

mikoontz commented 2 years ago

No rush! Glad that I can be of some help in documenting my use of the package. It does seem like this is more of an issue with how {randomForestExplainer} works, rather than how {spatialRF} works, but maybe there's still some way to make them all play nicely if/until {randomForestExplainer} changes their approach.