What goes wrong
I'm using tidymodels to build a basic ML model, and then using the vetiver package to serve that model as an API endpoint on GCP from a Docker container. I'm having authentication issues: when I run `docker run`, the error thrown is "No .httr-oauth file exists in current working directory. Do library authentication steps to provide credentials."

I'm confused about what is causing this, because when I run `gcs_list_buckets(projectId = Sys.getenv("GCE_DEFAULT_PROJECT_ID"))` I can see my bucket info, which leads me to think I'm authenticated.

Are there any recommendations for authenticating when running inside Docker?
Steps to reproduce the problem
Please note that if a reproducible example that I can run is not available, then the likelihood of getting any bug fixed is low.
```r
if (!require("pacman")) install.packages("pacman")

pacman::p_load(
  tidyverse, googleCloudRunner, skimr, tidymodels, palmerpenguins, gt,
  ranger, brulee, pins, vetiver, plumber, conflicted, usethis, themis,
  googleCloudStorageR, googleAuthR, httr, gargle, tune, finetune, doMC
)

# AUTHENTICATE USING THE SERVICE ACCOUNT JSON FILE REFERENCED IN THE ENVIRON FILE
googleAuthR::gar_auth_service(json_file = Sys.getenv("GCE_AUTH_FILE"))

gcs_list_buckets(projectId = Sys.getenv("GCE_DEFAULT_PROJECT_ID"))

tidymodels_conflicts()
conflict_prefer("penguins", "palmerpenguins")
```
```r
# PREPARE & SPLIT DATA ----------------------------------------------------

# REMOVE ROWS WITH MISSING SEX, EXCLUDE YEAR AND ISLAND
penguins_df <- penguins %>%
  drop_na(sex) %>%
  select(-year, -island)

set.seed(123)

# SPLIT THE DATA INTO TRAIN AND TEST SETS STRATIFIED BY SEX
penguin_split <- initial_split(penguins_df, strata = sex, prop = 3 / 4)
penguin_train <- training(penguin_split)
penguin_test <- testing(penguin_split)

# CREATE FOLDS FOR CROSS VALIDATION
penguin_folds <- vfold_cv(penguin_train)
```
```r
# CREATE PREPROCESSING RECIPE ---------------------------------------------
penguin_rec <- recipe(sex ~ ., data = penguin_train) %>%
  step_YeoJohnson(all_numeric_predictors()) %>%
  themis::step_upsample(species) %>%
  step_dummy(species) %>%
  step_normalize(all_numeric_predictors())
```
```r
# MODEL SPECIFICATION -----------------------------------------------------

# LOGISTIC REGRESSION WITH L1 REGULARISATION
glm_spec <-
  logistic_reg(penalty = 1) %>%
  set_engine("glm")

# RANDOM FOREST
tree_spec <- rand_forest(min_n = tune()) %>%
  set_engine("ranger") %>%
  set_mode("classification")

# NEURAL NETWORK WITH TORCH
mlp_brulee_spec <- mlp(
  hidden_units = tune(),
  epochs = tune(),
  penalty = tune(),
  learn_rate = tune()
) %>%
  set_engine("brulee") %>%
  set_mode("classification")
```
```r
# MODEL FITTING AND HYPER PARAMETER TUNING --------------------------------

# REGISTER PARALLEL CORES
registerDoMC(cores = 2)

# BAYESIAN OPTIMIZATION FOR HYPER PARAMETER TUNING
bayes_control <- control_bayes(
  no_improve = 10L,
  time_limit = 20,
  save_pred = TRUE,
  verbose = TRUE
)

# FIT ALL THREE MODELS WITH HYPER PARAMETER TUNING
workflow_set <- workflow_set(
  preproc = list(penguin_rec),
  models = list(
    glm = glm_spec,
    tree = tree_spec,
    torch = mlp_brulee_spec
  )
) %>%
  workflow_map("tune_bayes",
    iter = 50L,
    resamples = penguin_folds,
    control = bayes_control
  )
```
```r
# COMPARE MODEL RESULTS ---------------------------------------------------
rank_results(workflow_set,
  rank_metric = "roc_auc",
  select_best = TRUE
) %>%
  gt()

# PLOT MODEL PERFORMANCE
workflow_set %>% autoplot()
```
```r
# FINALIZE MODEL FIT ------------------------------------------------------

# SELECT THE LOGISTIC MODEL GIVEN THAT IT'S A SIMPLER MODEL AND PERFORMANCE
# IS SIMILAR TO THE NEURAL NET MODEL
best_model_id <- "recipe_glm"

# SELECT BEST MODEL
best_fit <- workflow_set %>%
  extract_workflow_set_result(best_model_id) %>%
  select_best(metric = "accuracy")

# CREATE WORKFLOW FOR BEST MODEL
final_workflow <- workflow_set %>%
  extract_workflow(best_model_id) %>%
  finalize_workflow(best_fit)

final_fit <- final_workflow %>% last_fit(penguin_split)

# FINAL FIT METRICS
final_fit %>% collect_metrics() %>% gt()

final_fit %>%
  collect_predictions() %>%
  roc_curve(sex, .pred_female) %>%
  autoplot()

final_fit_to_deploy <- final_fit %>% extract_workflow()
```
```r
# VERSION WITH VETIVER ----------------------------------------------------

# INITIALISE VETIVER MODEL OBJECT
v <- vetiver_model(final_fit_to_deploy,
  model_name = "logistic_regression_model"
)
v

model_board <- board_gcs(bucket = "ml_ops_in_r_bucket")
model_board %>% vetiver_pin_write(vetiver_model = v)
```
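For reference, my understanding of what the generated plumber file would need in order to reach the board non-interactively inside the container - a minimal sketch, not the code vetiver actually generates. It assumes `GCS_AUTH_FILE` and `GCS_DEFAULT_BUCKET` are set inside the container and that the auth-file path exists there; `gcs_auth()` is from googleCloudStorageR.

```r
library(googleCloudStorageR)
library(pins)

# Inside the container there is no cached .httr-oauth token, so authentication
# must be explicit and non-interactive: point gcs_auth() at the service
# account key file before board_gcs() is first called.
gcs_auth(json_file = Sys.getenv("GCS_AUTH_FILE"))

model_board <- board_gcs(bucket = Sys.getenv("GCS_DEFAULT_BUCKET"))
```

If authentication falls through to the interactive OAuth flow instead of the key file, the "No .httr-oauth file exists" error above is exactly what a headless container would produce.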
When I serve the API locally, it is accessible and returns response status 200.
```r
vetiver_write_plumber(model_board, "logistic_regression_model", rsconnect = FALSE)
vetiver_write_docker(v)
```
My Dockerfile contains environment variable references to my service account JSON file, as well as to the bucket and project.
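For completeness, a sketch of how I'd expect the key and variables to be supplied at run time (image name and paths here are placeholders, not my actual setup). The point is that the auth-file variable must name a path inside the container, e.g. by mounting the key, rather than a path on the host:

```shell
# Mount the service-account key into the container and point the auth
# variable at the container-side path (paths and image name are placeholders).
docker run --rm -p 8000:8000 \
  -v "$PWD/gcs-key.json:/opt/ml/gcs-key.json:ro" \
  -e GCS_AUTH_FILE=/opt/ml/gcs-key.json \
  -e GCS_DEFAULT_BUCKET=ml_ops_in_r_bucket \
  my-vetiver-api
```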
Expected output

The container starts and serves the vetiver plumber API, responding with status 200 as it does when served locally.

Actual output

`docker run` fails with:

```
No .httr-oauth file exists in current working directory. Do library authentication steps to provide credentials.
```
Before you run your code, please run:

```r
options(googleAuthR.verbose = 2)
```

and copy-paste the console output here. Check it doesn't include any sensitive info like auth tokens or accountIds - you can usually just edit those out manually and replace with, say, XXX.
Session Info
```
R version 4.3.1 (2023-06-16 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default

locale:
[1] LC_COLLATE=English_Ireland.utf8  LC_CTYPE=English_Ireland.utf8
[3] LC_MONETARY=English_Ireland.utf8 LC_NUMERIC=C
[5] LC_TIME=English_Ireland.utf8

time zone: Europe/Dublin
tzcode source: internal

attached base packages:
[1] parallel  stats  graphics  grDevices  utils  datasets  methods
[8] base

other attached packages:
 [1] rapidoc_8.4.3             doMC_1.3.5
 [3] iterators_1.0.14          foreach_1.5.2
 [5] finetune_1.1.0            gargle_1.5.2
 [7] httr_1.4.7                googleAuthR_2.0.1
 [9] googleCloudStorageR_0.7.0 themis_1.0.2
[11] usethis_2.2.2             conflicted_1.2.0
[13] plumber_1.2.1             vetiver_0.2.3
[15] pins_1.2.1                brulee_0.2.0
[17] ranger_0.15.1             gt_0.9.0
[19] palmerpenguins_0.1.1      yardstick_1.2.0
[21] workflowsets_1.0.1        workflows_1.1.3
[23] tune_1.1.2                rsample_1.2.0
[25] recipes_1.0.8             parsnip_1.1.1
[27] modeldata_1.2.0           infer_1.0.4
[29] dials_1.2.0               scales_1.2.1
[31] broom_1.0.5               tidymodels_1.1.1
[33] skimr_2.1.5               googleCloudRunner_0.5.0
[35] lubridate_1.9.2           forcats_1.0.0
[37] stringr_1.5.0             dplyr_1.1.2
[39] purrr_1.0.2               readr_2.1.4
[41] tidyr_1.3.0               tibble_3.2.1
[43] ggplot2_3.4.3             tidyverse_2.0.0
[45] pacman_0.5.1

loaded via a namespace (and not attached):
 [1] torch_0.11.0        rstudioapi_0.15.0   jsonlite_1.8.7
 [4] magrittr_2.0.3      farver_2.1.1        fs_1.6.3
 [7] vctrs_0.6.3         memoise_2.0.1       askpass_1.2.0
[10] base64enc_0.1-3     butcher_0.3.3       htmltools_0.5.6
[13] curl_5.0.2          sass_0.4.7          parallelly_1.36.0
[16] googlePubsubR_0.0.4 cachem_1.0.8        mime_0.12
[19] lifecycle_1.0.3     pkgconfig_2.0.3     Matrix_1.5-4.1
[22] R6_2.5.1            fastmap_1.1.1       future_1.33.0
[25] digest_0.6.33       colorspace_2.1-0    furrr_0.3.1
[28] ps_1.7.5            labeling_0.4.3      fansi_1.0.4
[31] timechange_0.2.0    compiler_4.3.1      bit64_4.0.5
[34] withr_2.5.0         backports_1.4.1     webutils_1.1
[37] MASS_7.3-60         lava_1.7.2.1        openssl_2.1.1
[40] rappdirs_0.3.3      tools_4.3.1         httpuv_1.6.11
[43] zip_2.3.0           future.apply_1.11.0 nnet_7.3-19
[46] glue_1.6.2          callr_3.7.3         promises_1.2.1
[49] grid_4.3.1          generics_0.1.3      gtable_0.3.4
[52] tzdb_0.4.0          class_7.3-22        data.table_1.14.8
[55] hms_1.1.3           xml2_1.3.5          utf8_1.2.3
[58] pillar_1.9.0        later_1.3.1         splines_4.3.1
[61] lhs_1.1.6           lattice_0.21-8      swagger_3.33.1
[64] survival_3.5-5      bit_4.0.5           tidyselect_1.2.0
[67] coro_1.0.3          jose_1.2.0          knitr_1.43
[70] xfun_0.40           hardhat_1.3.0       timeDate_4022.108
[73] stringi_1.7.12      DiceDesign_1.9      yaml_2.3.7
[76] codetools_0.2-19    cli_3.6.1           rpart_4.1.19
[79] bundle_0.1.0        repr_1.1.6          munsell_0.5.0
[82] processx_3.8.2      Rcpp_1.0.11         ROSE_0.0-4
[85] globals_0.16.2      ellipsis_0.3.2      gower_1.0.1
[88] assertthat_0.2.1    GPfit_1.0-8         listenv_0.9.0
[91] ipred_0.9-14        prodlim_2023.08.28  rlang_1.1.1
```
Please run `sessionInfo()` so we can check what versions of packages you have installed.