MarkEdmondson1234 / googleAuthR

Google API Client Library for R. Easy authentication and help to build Google API R libraries with OAuth2. Shiny compatible.
https://code.markedmondson.me/googleAuthR

Using googleAuthR to authenticate in Docker with tidymodels and vetiver #229

Open jgarrigan opened 11 months ago

jgarrigan commented 11 months ago

What goes wrong

I'm using tidymodels to build a basic ML model, and the vetiver package to serve that model as an API endpoint on GCP using a Docker container. I'm having issues with authentication: when I run docker run, the error thrown is "No .httr-oauth file exists in current working directory. Do library authentication steps to provide credentials."

I'm confused as to what is causing the issue: when I run gcs_list_buckets(projectId = Sys.getenv("GCE_DEFAULT_PROJECT_ID")) I can see my bucket info, which leads me to think I'm authenticated.

(screenshot: gcs_list_buckets() output showing my bucket info)

Are there any recommendations for authenticating when running inside a Docker container?
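For reference, this is roughly what I'd expect to have to run at container start-up to authenticate non-interactively (a minimal sketch, not my exact code; the /auth/service-account.json path is just a placeholder for wherever the key file ends up inside the image):

library(googleAuthR)
library(googleCloudStorageR)

# Path to the mounted service-account JSON key (placeholder default path)
json <- Sys.getenv("GCE_AUTH_FILE", "/auth/service-account.json")

# Authenticate googleAuthR directly from the key file (no cached .httr-oauth needed)
googleAuthR::gar_auth_service(json_file = json)

# googleCloudStorageR can authenticate from the same key
googleCloudStorageR::gcs_auth(json_file = json)

# Sanity check that the credentials work
gcs_list_buckets(projectId = Sys.getenv("GCE_DEFAULT_PROJECT_ID"))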

Steps to reproduce the problem

Please note that if a reproducible example that I can run is not available, then the likelihood of getting any bug fixed is low.

if (!require("pacman")) install.packages("pacman")

pacman::p_load(
  tidyverse, googleCloudRunner, skimr, tidymodels, palmerpenguins, gt, ranger,
  brulee, pins, vetiver, plumber, conflicted, usethis, themis,
  googleCloudStorageR, googleAuthR, httr, gargle, tune, finetune, doMC
)

# AUTHENTICATE USING THE SERVICE ACCOUNT JSON FILE REFERENCED IN THE ENVIRON FILE

googleAuthR::gar_auth_service(json_file = Sys.getenv("GCE_AUTH_FILE"))

gcs_list_buckets(projectId = Sys.getenv("GCE_DEFAULT_PROJECT_ID"))

tidymodels_conflicts()

conflict_prefer("penguins", "palmerpenguins")

# PREPARE & SPLIT DATA ----------------------------------------------------

# REMOVE ROWS WITH MISSING SEX, EXCLUDE YEAR AND ISLAND
penguins_df <- penguins %>%
  drop_na(sex) %>%
  select(-year, -island)

set.seed(123)

# SPLIT THE DATA INTO TRAIN AND TEST SETS STRATIFIED BY SEX
penguin_split <- initial_split(penguins_df, strata = sex, prop = 3 / 4)
penguin_train <- training(penguin_split)
penguin_test <- testing(penguin_split)

# CREATE FOLDS FOR CROSS VALIDATION

penguin_folds <- vfold_cv(penguin_train)

# CREATE PREPROCESSING RECIPE ---------------------------------------------
penguin_rec <- recipe(sex ~ ., data = penguin_train) %>%
  step_YeoJohnson(all_numeric_predictors()) %>%
  themis::step_upsample(species) %>%
  step_dummy(species) %>%
  step_normalize(all_numeric_predictors())

# MODEL SPECIFICATION -----------------------------------------------------

# LOGISTIC REGRESSION
glm_spec <-
  # L1 REGULARISATION
  logistic_reg(penalty = 1) %>%
  set_engine("glm")

# RANDOM FOREST
tree_spec <- rand_forest(min_n = tune()) %>%
  set_engine("ranger") %>%
  set_mode("classification")

# NEURAL NETWORK WITH TORCH
mlp_brulee_spec <- mlp(
  hidden_units = tune(),
  epochs = tune(),
  penalty = tune(),
  learn_rate = tune()
) %>%
  set_engine("brulee") %>%
  set_mode("classification")

# MODEL FITTING AND HYPER PARAMETER TUNING --------------------------------

# REGISTER PARALLEL CORES
registerDoMC(cores = 2)

# BAYESIAN OPTIMIZATION FOR HYPER PARAMETER TUNING
bayes_control <- control_bayes(
  no_improve = 10L,
  time_limit = 20,
  save_pred = TRUE,
  verbose = TRUE
)

# FIT ALL THREE MODELS WITH HYPER PARAMETER TUNING
workflow_set <- workflow_set(
  preproc = list(penguin_rec),
  models = list(
    glm = glm_spec,
    tree = tree_spec,
    torch = mlp_brulee_spec
  )
) %>%
  workflow_map(
    "tune_bayes",
    iter = 50L,
    resamples = penguin_folds,
    control = bayes_control
  )

# COMPARE MODEL RESULTS ---------------------------------------------------
rank_results(workflow_set, rank_metric = "roc_auc", select_best = TRUE) %>%
  gt()

# PLOT MODEL PERFORMANCE

workflow_set %>% autoplot()

# FINALIZE MODEL FIT ------------------------------------------------------

# SELECT THE LOGISTIC MODEL GIVEN THAT IT'S A SIMPLER MODEL AND PERFORMANCE
# IS SIMILAR TO THE NEURAL NET MODEL
best_model_id <- "recipe_glm"

# SELECT BEST MODEL
best_fit <- workflow_set %>%
  extract_workflow_set_result(best_model_id) %>%
  select_best(metric = "accuracy")

# CREATE WORKFLOW FOR BEST MODEL
final_workflow <- workflow_set %>%
  extract_workflow(best_model_id) %>%
  finalize_workflow(best_fit)

final_fit <- final_workflow %>%
  last_fit(penguin_split)

# FINAL FIT METRICS
final_fit %>%
  collect_metrics() %>%
  gt()

final_fit %>%
  collect_predictions() %>%
  roc_curve(sex, .pred_female) %>%
  autoplot()

final_fit_to_deploy <- final_fit %>% extract_workflow()

# VERSION WITH VETIVER ----------------------------------------------------

# INITIALISE VETIVER MODEL OBJECT
v <- vetiver_model(final_fit_to_deploy, model_name = "logistic_regression_model")

v

model_board <- board_gcs(bucket = "ml_ops_in_r_bucket")

model_board %>% vetiver_pin_write(vetiver_model = v)

My API is accessible and returns response status 200:

(screenshot: request against the API returning status 200)
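For reference, the status check is just a plain GET against the running API (a sketch; I'm assuming the default local plumber port 8000 and the /ping route that vetiver adds):

library(httr)

# Liveness check against the locally running plumber API
resp <- GET("http://127.0.0.1:8000/ping")
status_code(resp) # expect 200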

vetiver_write_plumber(model_board, "logistic_regression_model", rsconnect = FALSE)

vetiver_write_docker(v)

My Dockerfile contains environment variable references to my JSON key file as well as the bucket and project:

(screenshot: Dockerfile with environment variables pointing at the service account JSON, bucket, and project)
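As I understand it, the generated plumber.R entrypoint has to authenticate from those environment variables itself before it can read the pin from GCS; something along these lines (a sketch of what I expect the entrypoint needs to do, not the exact generated file; GCS_AUTH_FILE and GCS_DEFAULT_BUCKET are placeholder names for whatever the Dockerfile actually sets):

library(pins)
library(plumber)
library(vetiver)
library(googleCloudStorageR)

# Authenticate from the service-account key available inside the container
gcs_auth(json_file = Sys.getenv("GCS_AUTH_FILE"))

# Read the pinned model back from the GCS board
b <- board_gcs(bucket = Sys.getenv("GCS_DEFAULT_BUCKET", "ml_ops_in_r_bucket"))
v <- vetiver_pin_read(b, "logistic_regression_model")

# Serve the vetiver API
pr() %>%
  vetiver_api(v) %>%
  pr_run(host = "0.0.0.0", port = 8000)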

Expected output

Actual output

(screenshot: docker run output showing the "No .httr-oauth file exists in current working directory" error)

Before you run your code, please run:

options(googleAuthR.verbose=2) and copy-paste the console output here.
Check it doesn't include any sensitive info like auth tokens or accountIds - you can usually just edit those out manually and replace with say XXX

Session Info

R version 4.3.1 (2023-06-16 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default

locale: [1] LC_COLLATE=English_Ireland.utf8 LC_CTYPE=English_Ireland.utf8
[3] LC_MONETARY=English_Ireland.utf8 LC_NUMERIC=C
[5] LC_TIME=English_Ireland.utf8

time zone: Europe/Dublin tzcode source: internal

attached base packages: [1] parallel stats graphics grDevices utils datasets methods
[8] base

other attached packages: [1] rapidoc_8.4.3 doMC_1.3.5
[3] iterators_1.0.14 foreach_1.5.2
[5] finetune_1.1.0 gargle_1.5.2
[7] httr_1.4.7 googleAuthR_2.0.1
[9] googleCloudStorageR_0.7.0 themis_1.0.2
[11] usethis_2.2.2 conflicted_1.2.0
[13] plumber_1.2.1 vetiver_0.2.3
[15] pins_1.2.1 brulee_0.2.0
[17] ranger_0.15.1 gt_0.9.0
[19] palmerpenguins_0.1.1 yardstick_1.2.0
[21] workflowsets_1.0.1 workflows_1.1.3
[23] tune_1.1.2 rsample_1.2.0
[25] recipes_1.0.8 parsnip_1.1.1
[27] modeldata_1.2.0 infer_1.0.4
[29] dials_1.2.0 scales_1.2.1
[31] broom_1.0.5 tidymodels_1.1.1
[33] skimr_2.1.5 googleCloudRunner_0.5.0
[35] lubridate_1.9.2 forcats_1.0.0
[37] stringr_1.5.0 dplyr_1.1.2
[39] purrr_1.0.2 readr_2.1.4
[41] tidyr_1.3.0 tibble_3.2.1
[43] ggplot2_3.4.3 tidyverse_2.0.0
[45] pacman_0.5.1

loaded via a namespace (and not attached): [1] torch_0.11.0 rstudioapi_0.15.0 jsonlite_1.8.7
[4] magrittr_2.0.3 farver_2.1.1 fs_1.6.3
[7] vctrs_0.6.3 memoise_2.0.1 askpass_1.2.0
[10] base64enc_0.1-3 butcher_0.3.3 htmltools_0.5.6
[13] curl_5.0.2 sass_0.4.7 parallelly_1.36.0
[16] googlePubsubR_0.0.4 cachem_1.0.8 mime_0.12
[19] lifecycle_1.0.3 pkgconfig_2.0.3 Matrix_1.5-4.1
[22] R6_2.5.1 fastmap_1.1.1 future_1.33.0
[25] digest_0.6.33 colorspace_2.1-0 furrr_0.3.1
[28] ps_1.7.5 labeling_0.4.3 fansi_1.0.4
[31] timechange_0.2.0 compiler_4.3.1 bit64_4.0.5
[34] withr_2.5.0 backports_1.4.1 webutils_1.1
[37] MASS_7.3-60 lava_1.7.2.1 openssl_2.1.1
[40] rappdirs_0.3.3 tools_4.3.1 httpuv_1.6.11
[43] zip_2.3.0 future.apply_1.11.0 nnet_7.3-19
[46] glue_1.6.2 callr_3.7.3 promises_1.2.1
[49] grid_4.3.1 generics_0.1.3 gtable_0.3.4
[52] tzdb_0.4.0 class_7.3-22 data.table_1.14.8
[55] hms_1.1.3 xml2_1.3.5 utf8_1.2.3
[58] pillar_1.9.0 later_1.3.1 splines_4.3.1
[61] lhs_1.1.6 lattice_0.21-8 swagger_3.33.1
[64] survival_3.5-5 bit_4.0.5 tidyselect_1.2.0
[67] coro_1.0.3 jose_1.2.0 knitr_1.43
[70] xfun_0.40 hardhat_1.3.0 timeDate_4022.108
[73] stringi_1.7.12 DiceDesign_1.9 yaml_2.3.7
[76] codetools_0.2-19 cli_3.6.1 rpart_4.1.19
[79] bundle_0.1.0 repr_1.1.6 munsell_0.5.0
[82] processx_3.8.2 Rcpp_1.0.11 ROSE_0.0-4
[85] globals_0.16.2 ellipsis_0.3.2 gower_1.0.1
[88] assertthat_0.2.1 GPfit_1.0-8 listenv_0.9.0
[91] ipred_0.9-14 prodlim_2023.08.28 rlang_1.1.1

Please run sessionInfo() so we can check what versions of packages you have installed