Closed · sigmafelix closed this issue 7 months ago
@sigmafelix We don't have to adopt all of tidymodels, but we can adopt it where it makes sense. So this function helps rsample play well with the various S-T cross-validation methods?
@Spatiotemporal-Exposures-and-Toxicology Yes, cross-validation in tidymodels operates on rsample function outputs, and we get cross-validation fold indices as an integer vector from generate_cv_index. The function above takes that integer vector and the original data.frame (which has the same number of rows as the integer vector) and builds an rset class object from them.
# provided that dfcovarst is a data.frame with PM2.5 and covariates along with required fields of lon, lat, and time:
dfcovarstdt <- convert_stobj_to_stdt(dfcovarst)
dfcovarstdt$stdt$time <- as.Date(dfcovarstdt$stdt$time)
dfcovars_lblto <-
generate_cv_index(dfcovarstdt, "lblto", blocks = c(10, 10), t_fold = 60L)
dfcovarstdt_cv <-
convert_cv_index_rset(dfcovars_lblto, dfcovarstdt$stdt, "lblto")
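Under the hood, a fold-index-to-rset mapping like this can be sketched with rsample::manual_rset. This is a hypothetical illustration of the mechanism, not the actual convert_cv_index_rset implementation; the function name cv_index_to_rset is made up for this sketch:

```r
# Sketch: build an rset from an integer fold index (one entry per row of dat).
# Each fold k becomes one split: rows with index != k are the analysis set,
# rows with index == k are the assessment set.
cv_index_to_rset <- function(dat, cv_index, name = "fold") {
  folds <- sort(unique(cv_index))
  splits <- lapply(folds, function(k) {
    rsample::make_splits(
      list(analysis = which(cv_index != k),
           assessment = which(cv_index == k)),
      data = dat
    )
  })
  rsample::manual_rset(splits, paste0(name, folds))
}
```

The resulting object inherits from rset, so it can be passed directly to tune::tune_bayes() or tune::fit_resamples().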
## tidymodels specification
xgb_mod <-
  parsnip::boost_tree(learn_rate = tune::tune()) |>
  parsnip::set_engine("xgboost", eval_metric = list("rmse", "mae")) |>
  parsnip::set_mode("regression")
pm25_wf <- workflows::workflow() |>
  workflows::add_model(xgb_mod) |>
  workflows::add_formula(pm2.5 ~ .)
# tune_bayes() returns tuning results, not a workflow, so finalize the
# workflow with the best parameters before resampling the final model
pm25_tuned <- pm25_wf |>
  tune::tune_bayes(resamples = dfcovarstdt_cv, iter = 50)
pm25mod <- pm25_wf |>
  tune::finalize_workflow(tune::select_best(pm25_tuned, metric = "rmse")) |>
  tune::fit_resamples(
    dfcovarstdt_cv,
    metrics = yardstick::metric_set(yardstick::rmse, yardstick::mae)
  )
@sigmafelix @eva0marques @mitchellmanware @dzilber @dawranadeep
I think there is a lot of value in using R's tidymodels ecosystem for all of our base and meta learners. I think we should enforce that all of the models be based here so that we can keep things relatively simple. Unfortunately, @dzilber and @dawranadeep, that means we probably can't have a GP base learner, since I do not see that as an option.
Also, @sigmafelix @mitchellmanware - I suggest we keep things simple with the neural network and utilize the brulee package, even if it means only implementing a feed-forward network.
With a tidymodels approach, I think we can implement these base learners with similar inputs and relatively simply:
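For the feed-forward network, a minimal parsnip specification using the brulee engine might look like the following. The layer size, epochs, and learning rate below are placeholder values, not tuned settings:

```r
# Sketch: a feed-forward regression network via parsnip + brulee.
# Hyperparameter values here are illustrative placeholders.
mlp_mod <-
  parsnip::mlp(hidden_units = 32, epochs = 100, learn_rate = 0.01) |>
  parsnip::set_engine("brulee") |>
  parsnip::set_mode("regression")
```

This spec plugs into the same workflow/tune pipeline as the boosted-tree example above, which is the main appeal of standardizing on tidymodels.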
Next, the stacks package provides a meta-learner based on penalized regression that integrates into the tidymodels ecosystem.
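A stacking workflow with stacks might be sketched as below. The object names (xgb_res, mlp_res) are hypothetical tuning results; note that candidates must be tuned with save_pred = TRUE and save_workflow = TRUE (e.g. via stacks::control_stack_bayes() or stacks::control_stack_grid()):

```r
# Sketch: penalized-regression meta-learner over tuned base learners.
# xgb_res and mlp_res are assumed tuning results saved with stack controls.
pm25_stack <-
  stacks::stacks() |>
  stacks::add_candidates(xgb_res) |>
  stacks::add_candidates(mlp_res) |>
  stacks::blend_predictions() |>  # fits the penalized regression meta-learner
  stacks::fit_members()           # refits retained members on the full data
```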
@michael-conway If you have the bandwidth, it would be great to get your input on using the pins and vetiver packages for creating official versions of our models and deploying them to the NIEHS Posit Connect. This would make the versioning in beethoven seamless if we can rely on well-developed and documented Posit packages.
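A pins + vetiver sketch for versioning and deploying a fitted model could look like the following. The model object, pin name, and account path are placeholders, and board_connect() assumes Posit Connect credentials are already configured:

```r
# Sketch: version a fitted workflow on Posit Connect with pins + vetiver.
# "pm25_fit" and the pin/account names below are illustrative placeholders.
board <- pins::board_connect()                     # authenticate to Posit Connect
v <- vetiver::vetiver_model(pm25_fit, "beethoven-pm25")
vetiver::vetiver_pin_write(board, v)               # writes a new model version
vetiver::vetiver_deploy_rsconnect(board, "user/beethoven-pm25")
```

Each vetiver_pin_write() call creates a new pinned version, which is what would give beethoven reproducible model lineage.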
As we want to adopt the tidymodels interface for base learners, spatiotemporal cross-validation indices generated from generate_cv_index need to be usable in rsample functions. A working example is below, which will be available in my working branch soon: