cjcarlson / embarcadero

🌲🌉 Species distribution models with Bayesian additive regression trees

Saved RI model objects may not retain saved trees #29

Closed jbyoder closed 1 year ago

jbyoder commented 2 years ago

Fitted an RI model with keepTrees = TRUE, saved it, then loaded it in a fresh R session and used it for prediction: the output is value = 0.5 for every grid cell. Based on our chat, I suspect the trees aren't being retained in the saved R object.

VirginiaMorera commented 2 years ago

Hi, this is the same thing I found and flagged in issue #28, but much better explained here, in case you want to merge the two issues or close #28 and keep only this one. It is still happening as of May 2022.

cjcarlson commented 1 year ago

OK! This is a known workflow issue more than an embarcadero issue.

If you look in bart.Rd, you'll see this section, which I think may not show up in the actual R help file:

  \subsection{Saving}{
    \code{\link{save}}ing and \code{\link{load}}ing fitted BART objects for use with \code{predict} requires that R's serialization mechanism be able to access the underlying trees, in addition to being fit with \code{keeptrees}/\code{keepTrees} as \code{TRUE}. For memory purposes, the trees are not stored as R objects unless specifically requested. To do this, one must \dQuote{touch} the sampler's state object before saving, e.g. for a fitted object \code{bartFit}, execute \code{invisible(bartFit$fit$state)}.
  }

This is a step that basically goes into your workflow, for example: https://github.com/cjcarlson/plague-wna/search?q=invisible%28model
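As a minimal sketch of that workflow step, the touch and the save can be bundled so the touch is never forgotten. The helper name `save_bart` is hypothetical (it is not part of embarcadero or dbarts), and it assumes the model was fitted with `keeptrees = TRUE` and exposes the sampler as `model$fit`, as in the documentation excerpt above:

```r
# Hypothetical helper, not part of embarcadero or dbarts.
# Assumes `model` was fitted with keeptrees = TRUE and stores its sampler
# in `model$fit`, as described in the bart.Rd excerpt.
save_bart <- function(model, file) {
  # "Touch" the sampler state so the trees are stored as R objects
  # before serialisation (the step quoted from the documentation).
  invisible(model$fit$state)
  saveRDS(model, file = file)
  invisible(file)
}

# Usage sketch (object and file names are placeholders):
# save_bart(sdm, "sdm.rds")
# sdm_reloaded <- readRDS("sdm.rds")
```

Whether this is enough for every model type in embarcadero is an assumption, not something the helper verifies; it only applies the documented touch immediately before saving.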

I don't really know how to solve this on the embarcadero side, because I made a stylistic decision early on not to write zero-value-add wrappers to the most vanilla bart() and rbart_vi() implementations (felt icky!). I think maybe I should add it to the readme with a little "helpful tips" or something?

cjcarlson commented 1 year ago

Temporary README edit here https://github.com/cjcarlson/embarcadero/commit/b0e34f494123ccb7a99cffc459fd51898d528299

VirginiaMorera commented 7 months ago

Hi,

This doesn't seem to fix the issue. I've tried

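# Fit the model with stepwise variable selection
mod1 <- bart.step(
  x.data = all.cov[, xvars],
  y.data = all.cov[, 'pres'],
  full = TRUE,
  quiet = FALSE)

# Touch the sampler state, then save and reload the model
invisible(mod1$fit$state)
saveRDS(mod1, file = "mod1.RDS")
savedmod1 <- readRDS("mod1.RDS")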

I've made sure the keeptrees option is set to true:

> mod1
Call:
bart(x.train = train[, 2:ncol(train)], y.train = train[, 1], 
    ntree = n.trees, keeptrees = TRUE)

However, trying to predict on the saved model doesn't work:

> map <- predict(savedmod1, predictors, quiet = FALSE)
> summary(map[])
     layer      
 Min.   :0.5    
 1st Qu.:0.5    
 Median :0.5    
 Mean   :0.5    
 3rd Qu.:0.5    
 Max.   :0.5    
 NA's   :33216 

But predicting with the original model works:

> map2 <- predict(mod1, predictors, quiet = FALSE)
> summary(map2[])
     layer      
 Min.   :0.00   
 1st Qu.:0.05   
 Median :0.18   
 Mean   :0.29   
 3rd Qu.:0.49   
 Max.   :0.98   
 NA's   :33375  

Am I missing something?

Thanks
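For anyone reproducing this, a small check can flag the symptom right after reloading instead of after inspecting a map. This is a sketch only: `is_flat_prediction` is a hypothetical helper, not part of embarcadero, and it assumes the predicted values can be extracted as a numeric vector (for example with `map[]`, as in the summaries above):

```r
# Hypothetical diagnostic, not part of embarcadero.
# Returns TRUE when every non-NA prediction equals 0.5, i.e. the flat
# output seen when predicting from the reloaded model in this thread.
is_flat_prediction <- function(values, tol = 1e-8) {
  values <- values[!is.na(values)]
  length(values) > 0 && all(abs(values - 0.5) < tol)
}

is_flat_prediction(c(0.5, 0.5, NA))    # TRUE: matches the reloaded-model output
is_flat_prediction(c(0.05, 0.49, NA))  # FALSE: values vary, as with the original model
```

Calling `is_flat_prediction(map[])` on the reloaded model's prediction would then return TRUE in the failing case shown above.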