Investigate integration with vetiver

MarkEdmondson1234 commented 2 years ago

https://vetiver.tidymodels.org/

juliasilge commented 2 years ago

To get appropriate versioning support, I imagine this will require rstudio/pins#572 to be implemented.

The deployment piece on its own doesn't necessarily require the model object to be stored as a pin.

MarkEdmondson1234 commented 2 years ago

A setup script is here:

library(parsnip)
library(workflows)
data(Sacramento, package = "modeldata")

rf_spec <- rand_forest(mode = "regression")
rf_form <- price ~ type + sqft + beds + baths

rf_fit <-
  workflow(rf_form, rf_spec) %>%
  fit(Sacramento)

library(vetiver)
v <- vetiver_model(rf_fit, "sacramento_rf")

root <- file.path("inst","vetiver")

library(pins)
model_board <- board_folder(file.path(root,"plumber/pins"))
model_board %>% vetiver_pin_write(v)

library(googleCloudRunner)

# the docker takes a long time to install arrow so build it first to cache
repo <- cr_buildtrigger_repo("MarkEdmondson1234/googleCloudRunner",
                             branch = "vetiver")

#cr_buildtrigger_delete("docker-vetiver")
cr_deploy_docker_trigger(repo, "vetiver",
                         location = "inst/vetiver/docker/",
                         includedFiles = "inst/vetiver/**",
                         projectId_target = "gcer-public",
                         timeout = 3600)

cr_deploy_plumber(file.path(root,"plumber"))

I changed the plumber deployment server.R to

pr <- plumber::plumb("api.R")
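# vetiver_pr_predict() needs the router and a vetiver model, so read the pinned model first
v <- vetiver::vetiver_pin_read(pins::board_folder("pins"), name = "sacramento_rf")
pr <- vetiver::vetiver_pr_predict(pr, v)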
pr$run(host = "0.0.0.0", port = as.numeric(Sys.getenv("PORT")), swagger = TRUE)

The main bottleneck at the moment is getting a Docker image with pins installed, since the arrow dependency has taken 40+ minutes to install and is still counting. I will look for a quicker method.

MarkEdmondson1234 commented 2 years ago

The arrow dependency install timed out after 60 minutes. This needs a bigger build machine or, ideally, a pre-existing Docker image.

juliasilge commented 2 years ago

There has been some discussion of making the arrow dependency optional. You might want to check out rstudio/pins#537 and see if anything in there helps.

FWIW arrow isn't really needed for the model publishing use case.

MarkEdmondson1234 commented 2 years ago

Makes sense; it did seem like a lot of installation for features that aren't used. I've left a comment there to see if there is a way, though, since it would still be nice to have an arrow image available.

MarkEdmondson1234 commented 2 years ago

The Docker image now builds in about 20 minutes and is available at gcr.io/gcer-public/vetiver

I hadn't seen the plumber router object being modified directly before, so I made a new script file to load it in. I think this would be fairly boilerplate, though:

# server.R
pr <- plumber::plumb("api.R")
v <- vetiver::vetiver_pin_read(pins::board_folder("pins"), name = "sacramento_rf")
pr <- vetiver::vetiver_pr_predict(pr, v, debug = TRUE)
pr$run(host = "0.0.0.0", port = as.numeric(Sys.getenv("PORT")), swagger = TRUE)
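One caveat with the script above: as.numeric(Sys.getenv("PORT")) evaluates to NA when PORT is not set, for example when server.R is run locally rather than on Cloud Run. A minimal sketch of a fallback follows; the helper name get_port and the default of 8080 are illustrative assumptions, not part of the deployed script.

```r
# Hypothetical helper: read the port from the PORT environment variable,
# falling back to a default when it is unset or not a valid integer.
get_port <- function(default = 8080L) {
  port <- suppressWarnings(as.integer(Sys.getenv("PORT", unset = "")))
  if (is.na(port)) default else port
}

# Simulate Cloud Run, which sets PORT for the container
Sys.setenv(PORT = "9000")
port_on_cloud_run <- get_port()

# Simulate a local run without PORT
Sys.unsetenv("PORT")
port_locally <- get_port()
```

The last line of server.R could then read pr$run(host = "0.0.0.0", port = get_port(), swagger = TRUE).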

It's built on top of the example plumber script I have, so there are endpoints at /plot and /hello too. I think it would be nice to make a PubSub target for it.

How would vetiver work within an api.R script?
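One possible shape for this, as a hedged sketch: plumber lets an api.R file modify its own router through a `#* @plumber` annotated function, so the vetiver endpoints could be attached there instead of in server.R. The sketch assumes the pins/ folder sits next to api.R and holds the "sacramento_rf" pin written by the setup script, and it is not necessarily how the deployed example is organised.

```r
# api.R (sketch)
library(pins)
library(plumber)
library(vetiver)

# Read the pinned model once, when the API file is plumbed
board <- board_folder("pins")
v <- vetiver_pin_read(board, "sacramento_rf")

#* Existing endpoints such as /hello can stay alongside the model endpoints
#* @get /hello
function() {
  "hello"
}

#* Attach the vetiver prediction endpoints to this file's router
#* @plumber
function(pr) {
  pr %>% vetiver_api(v)
}
```

With this layout, server.R would only need to call plumber::plumb("api.R") and run the router.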

This deployed successfully with this simple Dockerfile. I guess in real life some more dependencies or renv lockfiles could be involved.

FROM gcr.io/gcer-public/vetiver
COPY ["./", "./"]
ENTRYPOINT ["Rscript", "server.R"]

Example endpoint live at https://vetiver-ewjogewawq-ew.a.run.app/predict. This runs on Cloud Run, which is serverless: each instance can take 80 concurrent connections, and Cloud Run adds instances as traffic grows, so it can scale up to millions of requests.

Runs the example from the vetiver docs:

# dplyr provides %>%, slice_sample() and select()
library(dplyr)
data(Sacramento, package = "modeldata")
new_sac <- Sacramento %>% 
   slice_sample(n = 20) %>% 
   select(type, sqft, beds, baths)

endpoint <- vetiver::vetiver_endpoint("https://vetiver-ewjogewawq-ew.a.run.app/predict")
predict(endpoint, new_sac)
# A tibble: 20 x 1
     .pred
     <dbl>
 1 236325.
 2 427492.
 3 417112.
 4 258001.
 5 339775.
...
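For clients that are not written in R, it may help to see roughly what goes over the wire. The sketch below builds a row-wise JSON body from a small data frame with the same predictor columns; the two rows are made-up illustrative values, and the assumption that /predict accepts a JSON array with one object per row is based on how vetiver serialises new data, so check it against the Swagger docs of the deployed API.

```r
library(jsonlite)

# Two hypothetical rows with the same columns the model was trained on
new_homes <- data.frame(
  type  = c("Residential", "Condo"),
  sqft  = c(1200L, 850L),
  beds  = c(3L, 2L),
  baths = c(2, 1)
)

# Serialise row-wise: a JSON array with one object per row
payload <- toJSON(new_homes, dataframe = "rows")
cat(payload, "\n")

# Assumed request, not run here (the URL is the example endpoint from this thread):
# httr::POST("https://vetiver-ewjogewawq-ew.a.run.app/predict",
#            body = payload, httr::content_type_json())
```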

In real life you could also add a build trigger that fires on any change to the R script that fits the model, so the deployment is updated as needed. Once the pins integration can call outside services such as GCS, this would be needed less often.

The full setup script is below:

library(parsnip)
library(workflows)
data(Sacramento, package = "modeldata")

rf_spec <- rand_forest(mode = "regression")
rf_form <- price ~ type + sqft + beds + baths

rf_fit <-
  workflow(rf_form, rf_spec) %>%
  fit(Sacramento)

library(vetiver)
v <- vetiver_model(rf_fit, "sacramento_rf")

root <- file.path("inst","vetiver")

library(pins)
model_board <- board_folder(file.path(root,"plumber/pins"))
model_board %>% vetiver_pin_write(v)

library(googleCloudRunner)

# the docker takes a long time to install arrow so build it first to cache
repo <- cr_buildtrigger_repo("MarkEdmondson1234/googleCloudRunner",
                             branch = "vetiver")

#cr_buildtrigger_delete("docker-vetiver")
cr_deploy_docker_trigger(repo, "vetiver",
                         location = "inst/vetiver/docker/",
                         includedFiles = "inst/vetiver/**",
                         projectId_target = "gcer-public",
                         timeout = 3600)

# use the vetiver docker image built above to deploy a Cloud Run instance of the model
# deploys folder with api.R, Dockerfile, pins/ and server.R contained
run <- cr_deploy_plumber(file.path(root,"plumber"), remote = "vetiver")

# on successful deployment
endpoint <- vetiver::vetiver_endpoint(paste0(run$status$url, "/predict"))
library(tidyverse)
data(Sacramento, package = "modeldata")
new_sac <- Sacramento %>%
  slice_sample(n = 20) %>%
  select(type, sqft, beds, baths)

predict(endpoint, new_sac)
# A tibble: 20 x 1
     .pred
     <dbl>
 1 236325.
 2 427492.
 3 417112.
 4 258001.
 5 339775.
...

MarkEdmondson1234 commented 2 years ago

Folder structure of working deployment here https://github.com/MarkEdmondson1234/googleCloudRunner/tree/vetiver/inst/vetiver

juliasilge commented 2 years ago

I've been working more on generating Docker containers lately, if you'd like to take a look and give any feedback. This demo might be helpful for seeing how I am setting things up.

MarkEdmondson1234 commented 2 years ago

Thanks very much, I will take a look.

MarkEdmondson1234 / googleCloudRunner

Investigate integration with vetiver #163