MarkEdmondson1234 / googleCloudRunner

Easy R scripts on Google Cloud Platform via Cloud Run, Cloud Build and Cloud Scheduler
https://code.markedmondson.me/googleCloudRunner/

Global Environment Variables in R #169

Closed NerdyBaseballDude closed 2 years ago

NerdyBaseballDude commented 2 years ago

Hello,

Thank you for developing this awesome package. This is perhaps more of a Docker question, but I wasn't able to find a solution anywhere. I'm deploying from R and I want a couple of objects from my global environment to be included whenever I run cr_deploy_run(). For example, I have a couple of trained models on which I only want to call predict() from the caret package. However, at the moment I have to re-train the models every single time I hit the API, which is very time-consuming. Is there a way to deploy the global environment along with the app when I deploy to Google Cloud?

Thanks

MarkEdmondson1234 commented 2 years ago

Thanks :)

This is actually what vetiver does for you if you are using tidymodels - see https://github.com/MarkEdmondson1234/googleCloudRunner/issues/163

On a more generic level, I would not include the training of the model in your API responses; it will be too slow, as you say. Instead you can save the model as an object (say an .RDS file; vetiver uses the pins package) and include that in your deployment, so the API only calls the predict method on it in response to new data sent to the endpoint. I think that's what you are getting at when you talk about global environment variables?

There are a few ways to do this. I think the most practical is to save the model via saveRDS(), upload it to Google Cloud Storage, then load it at the top of your plumber script. You could modify your build to include a step that downloads the model object from Cloud Storage so it's available, say at /workspace/model.rds, and then in your code load it via model <- readRDS("/workspace/model.rds") and predict via predict(model, new_data).
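
As a rough sketch of the save-once, load-at-startup pattern described above (not a tested deployment): the lm() model on mtcars is only a stand-in for a caret model, and model_path, predict_mpg and the /predict route are hypothetical names chosen for illustration. The upload to Cloud Storage and the build step that copies the file back are left out; a temporary directory stands in for /workspace so the snippet runs locally.

```r
# --- train.R: run once, offline, NOT inside the API ---
# Stand-in model; in practice this would be the fitted caret model.
model <- lm(mpg ~ wt + hp, data = mtcars)

# Hypothetical local path (assumption); the thread suggests
# /workspace/model.rds as the location inside the build.
model_path <- file.path(tempdir(), "model.rds")
saveRDS(model, model_path)
# Uploading model_path to Google Cloud Storage would happen here (not shown).

# --- api.R: top of the plumber script ---
# Load the fitted model once at startup, so requests never re-train it.
model <- readRDS(model_path)

#* Predict mpg for a new observation
#* @param wt Weight in 1000 lbs
#* @param hp Gross horsepower
#* @post /predict
predict_mpg <- function(wt, hp) {
  # Request parameters may arrive as strings, so coerce before predicting.
  new_data <- data.frame(wt = as.numeric(wt), hp = as.numeric(hp))
  unname(predict(model, new_data))
}
```

The handler is given a name here only so it can be called directly, e.g. predict_mpg(2.5, 110); plumber handlers are usually written as anonymous functions under the #* annotations. The point of the layout is that the expensive fit happens once in train.R, while each API request only pays for readRDS() at startup and a predict() call.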