MarkEdmondson1234 / googleCloudRunner

Easy R scripts on Google Cloud Platform via Cloud Run, Cloud Build and Cloud Scheduler
https://code.markedmondson.me/googleCloudRunner/

Dockerfile can't be found in Cloud Build #110

Closed jstrome-lmp closed 3 years ago

jstrome-lmp commented 3 years ago

Hey there!

I am trying to create my first cloud build using googleCloudRunner! My ultimate goal is scheduling an R script that uses gmailr to do some data manipulation with an email attachment. I have a decent amount of R experience, but no experience with Cloud Build or Docker containers, so I apologize if this question is very basic.

I have been receiving this error when trying to run on cloud build: unable to prepare context: unable to evaluate symlinks in Dockerfile path: lstat /workspace/Dockerfile: no such file or directory

Here is my code:

storage <- cr_build_upload_gcs("filepath")

dv_360_yaml <- cr_build_yaml(
  steps = c(
    cr_buildstep_docker("gcr.io/my-project/docker"),
    cr_buildstep_r("gs://filepath.R",
                   name = "r-base",
                   r_source = "runtime",
                   prefix = "rocker/")
  ),
  images = "gcr.io/my-project/docker"
)

cr_build(dv_360_yaml, source = storage)

In my "filepath" for cr_build_upload_gcs, I have a folder containing R code, .json files, and a Dockerfile I copied from rocker.

I have been reading the documentation for all the commands and do not understand what I am supposed to do (I've spent quite a few hours on this over the last week). Do you have any full examples you could point me towards? Or any resources I could read to understand what I am doing a little better? Thank you for your time!

MarkEdmondson1234 commented 3 years ago

Make sure you can run all the tests in cr_setup_test() first.

But looking at the error, it is expecting a Dockerfile in your build, but it's not finding it.

Do you need to build a Dockerfile or will rocker/r-base run your code?

I think the Cloud Storage object you have will download into the workspace at /workspace/filepath/, so you may need to set your buildsteps to run in dir="filepath" so they can find the files you expect. You can confirm this by examining the build logs and/or adding a build step to list the files in /workspace/ (I often do this via cr_buildstep_r("list.files(recursive=TRUE)") to check what the working directory holds).

MarkEdmondson1234 commented 3 years ago

OK, it's in the /workspace/deploy/ folder, as that's the default for cr_build_upload_gcs(), so if you add the dir="deploy" argument to the buildsteps it should find the files you are uploading.
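
The earlier question (whether a stock rocker image is enough) could look like the sketch below: no Docker build step, just the uploaded script run on rocker/r-base from the deploy folder. The folder name "my-folder", the script name "my-file.R" and the wrapper function are illustrative assumptions, not something from the package docs. Calls are namespaced so that nothing is uploaded or built until the function is called.

```r
# Sketch only: run an uploaded script on a stock rocker image, no Docker build.
# "my-folder" and "my-file.R" are hypothetical names used for illustration.
run_on_stock_image <- function(local_folder = "my-folder",
                               script = "my-file.R",
                               work_dir = "deploy") {
  # Upload the local folder; by default it is unpacked under /workspace/deploy
  storage <- googleCloudRunner::cr_build_upload_gcs(local_folder)

  build_yaml <- googleCloudRunner::cr_build_yaml(
    steps = c(
      # List what actually arrived, to confirm where the script lives
      googleCloudRunner::cr_buildstep_r("list.files(recursive = TRUE)",
                                        dir = work_dir),
      # Run the script from the uploaded source on rocker/r-base
      googleCloudRunner::cr_buildstep_r(script,
                                        name = "r-base",
                                        prefix = "rocker/",
                                        r_source = "runtime",
                                        dir = work_dir)
    )
  )

  googleCloudRunner::cr_build(build_yaml, source = storage)
}
```

Calling run_on_stock_image() needs an authenticated googleCloudRunner setup. If the file listing shows the script inside a subfolder, pass that path as work_dir instead.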

MarkEdmondson1234 commented 3 years ago

I added some messaging, and a note in the docs, to make this clearer:

storage <- cr_build_upload_gcs("a_folder")

#── #Upload  vignettes  to  gs://my-bucket-name/a_folder20210220224720.tar.gz ───────
#ℹ 2021-02-20 22:47:20 > Uploading a_folder.tar.gz to mark-edmondson-public-files/vignettes20210220224720.tar.gz
#2021-02-20 22:47:20 -- File size detected as 3.5 Mb
#ℹ 2021-02-20 22:47:21 > When used in builds files will be available in folder: /workspace/deploy
#ℹ 2021-02-20 22:47:21 > e.g. Use cr_buildstep_r('list.files()' dir='deploy')
MarkEdmondson1234 commented 3 years ago

Also, if you want your R code to run within the Docker image created in the step before, you will need to specify that Docker image location rather than r-base, so:


storage <- cr_build_upload_gcs("filepath")

dv_360_yaml <- cr_build_yaml(
  steps = c(
    cr_buildstep_docker("gcr.io/my-project/docker", dir = "deploy"),
    cr_buildstep_r("gs://filepath.R",
                   name = "gcr.io/my-project/docker",
                   r_source = "runtime",
                   dir = "deploy")
  ),
  images = "gcr.io/my-project/docker"
)

cr_build(dv_360_yaml, source = storage)
jstrome-lmp commented 3 years ago

Thank you for the reply and helpful information!

cr_setup_test() runs properly, but the build is still giving me an error:

starting build "27c4ed58-7aab-44d4-961d-b00b955420a7"
FETCHSOURCE
Fetching storage object: gs://my-bucket/my-folder.tar.gz#1613948133520136
Copying gs://my-bucket/my-folder.tar.gz#1613948133520136...
/ [0 files][ 0.0 B/ 1.3 KiB]
/ [1 files][ 1.3 KiB/ 1.3 KiB]
Operation completed over 1 objects/1.3 KiB.
BUILD
Starting Step #0
Step #0: Already have image (with digest): gcr.io/cloud-builders/docker
Step #0: unable to prepare context: unable to evaluate symlinks in Dockerfile path: lstat /workspace/deploy/Dockerfile: no such file or directory
Finished Step #0
ERROR
ERROR: build step 0 "gcr.io/cloud-builders/docker" failed: step exited with non-zero status:

When I run the setup, it creates gcr.io/my-project/example, but my code is not creating gcr.io/my-project/docker. Is there a step I missed that creates the registry?

I also double checked, and my Dockerfile is indeed named "Dockerfile" with no extension.

Here is what my code looks like now:

storage <- cr_build_upload_gcs("my-folder")

dv_360_yaml <- cr_build_yaml(
  steps = c(
    cr_buildstep_docker("gcr.io/my-project/docker", dir = "deploy"),
    cr_buildstep_r("gs://my-bucket/my-folder/my-file.R",
                   name = "gcr.io/my-project/docker",
                   r_source = "runtime",
                   dir = "deploy")
  ),
  images = "gcr.io/my-project/docker"
)
cr_build(dv_360_yaml, source = storage)

Thank you again for the help I really appreciate it!

MarkEdmondson1234 commented 3 years ago

It's still not finding your Dockerfile. Could you put in a buildstep which lists the files in "/workspace/deploy/" and in "/workspace/"? cr_buildstep_r('list.files()', dir='deploy') will do it.

And may I also check your sessionInfo()?

jstrome-lmp commented 3 years ago

Ah I see, okay - I made a new build:

dv_360_yaml <- cr_build_yaml(
  steps = c(cr_buildstep_r('list.files()', dir = 'deploy'))
)

cr_build(dv_360_yaml, source = storage)

This returned "my-folder". I then changed the code to say dir = "deploy/my-folder"

With this change, it showed the files I had in my folder, so I ran my original code with that directory. My build ran for longer than it ever has (progress!!!). However, I then received an error after R appeared to have finished building and installing packages. I just realized it was because the build is timing out, so I am trying again.
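
Put together, the two findings in this comment (the files sit under deploy/my-folder, and the build needs more time than the default) might be sketched as follows. The wrapper function, the script name "my-file.R" and the one-hour timeout are assumptions for illustration; the image name is the placeholder used elsewhere in this thread.

```r
# Sketch only: build the image from the nested folder, then run the script in it.
# "my-file.R" and the 3600-second timeout are hypothetical values.
build_and_run <- function(local_folder = "my-folder",
                          image = "gcr.io/my-project/docker",
                          work_dir = "deploy/my-folder",
                          timeout_seconds = 3600) {
  storage <- googleCloudRunner::cr_build_upload_gcs(local_folder)

  build_yaml <- googleCloudRunner::cr_build_yaml(
    steps = c(
      # Docker build runs where the Dockerfile was actually found
      googleCloudRunner::cr_buildstep_docker(image, dir = work_dir),
      # Run the uploaded script inside the image built in the previous step
      googleCloudRunner::cr_buildstep_r("my-file.R",
                                        name = image,
                                        r_source = "runtime",
                                        dir = work_dir)
    ),
    images = image,
    # Whole-build timeout in seconds, to allow for slow package installs
    timeout = timeout_seconds
  )

  googleCloudRunner::cr_build(build_yaml, source = storage)
}
```

Raising the timeout only buys time; a lighter image is usually the better fix for a slow build.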

Here is my session info as requested:

R version 4.0.3 (2020-10-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] shiny_1.6.0             miniUI_0.1.1.1          stringr_1.4.0           gmailr_1.0.0           
[5] googleCloudRunner_0.4.1

loaded via a namespace (and not attached):
 [1] zip_2.1.1                 Rcpp_1.0.6                jquerylib_0.1.3           bslib_0.2.4              
 [5] compiler_4.0.3            pillar_1.4.7              later_1.1.0.1             googleAuthR_1.3.1        
 [9] prettyunits_1.1.1         progress_1.2.2            base64enc_0.1-3           tools_4.0.3              
[13] digest_0.6.27             jsonlite_1.7.2            memoise_2.0.0             lifecycle_1.0.0          
[17] gargle_0.5.0              tibble_3.0.6              pkgconfig_2.0.3           rlang_0.4.10             
[21] rstudioapi_0.13           cli_2.3.0                 curl_4.3                  yaml_2.2.1               
[25] xfun_0.20                 fastmap_1.1.0             swagger_3.33.1            httr_1.4.2               
[29] googleCloudStorageR_0.6.0 sass_0.3.1                hms_1.0.0                 fs_1.5.0                 
[33] vctrs_0.3.6               askpass_1.1               glue_1.4.2                R6_2.5.0                 
[37] plumber_1.0.0             magrittr_2.0.1            rematch2_2.1.2            htmltools_0.5.1.1        
[41] webutils_1.1              promises_1.2.0.1          ellipsis_0.3.1            assertthat_0.2.1         
[45] xtable_1.8-4              mime_0.9                  jose_1.0                  httpuv_1.5.5             
[49] tinytex_0.29              stringi_1.5.3             openssl_1.4.3             cachem_1.0.3             
[53] crayon_1.4.1    

I have also attached my Dockerfile:

FROM rocker/rstudio:3.6.3

RUN apt-get update -qq && apt-get -y --no-install-recommends install \
  libxml2-dev \
  libcairo2-dev \
  libsqlite-dev \
  libmariadbd-dev \
  libmariadbclient-dev \
  libpq-dev \
  libssh2-1-dev \
  unixodbc-dev \
  libsasl2-dev \
  && install2.r --error \
    --deps TRUE \
    tidyverse \
    dplyr \
    devtools \
    formatR \
    remotes \
    selectr \
    caTools \
    gmailr \
    BiocManager

Thank you!

MarkEdmondson1234 commented 3 years ago

If it's a timeout for your Docker build, try using the kaniko cache option, which will speed up builds.
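
As a rough sketch of the kaniko option, cr_buildstep_docker() takes a kaniko_cache argument. The wrapper function and folder names below are assumptions for illustration, and the images field is left out on the understanding that kaniko pushes the image itself.

```r
# Sketch only: Docker build step with the kaniko layer cache switched on.
# Folder and image names are the placeholders used in this thread.
build_with_kaniko <- function(local_folder = "my-folder",
                              image = "gcr.io/my-project/docker",
                              work_dir = "deploy/my-folder") {
  storage <- googleCloudRunner::cr_build_upload_gcs(local_folder)

  build_yaml <- googleCloudRunner::cr_build_yaml(
    # No images field here: assumed unnecessary because kaniko pushes the image
    steps = googleCloudRunner::cr_buildstep_docker(image,
                                                   kaniko_cache = TRUE,
                                                   dir = work_dir)
  )

  googleCloudRunner::cr_build(build_yaml, source = storage)
}
```

The first cached build still takes the full time; the saving shows up on later builds, when unchanged layers are reused.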

The Dockerfile does look heavy with tidyverse included, so a timeout would be expected. You would probably be better off using the rocker/verse image, which comes with tidyverse already installed. The rocker/rstudio one is not necessary, as you won't be running RStudio for the build, just R. If possible, only include the packages you need to run your code.
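
One way to act on this from R is to write a much smaller Dockerfile into the folder that gets uploaded. This is a minimal sketch: the rocker/verse version tag and the single extra package (gmailr, from the original goal) are assumptions, and the file is written to a temporary folder rather than the real project.

```r
# Sketch only: write a slimmer Dockerfile that starts from rocker/verse.
# The version tag and the package list are assumptions for illustration.
dockerfile_lines <- c(
  "FROM rocker/verse:4.0.3",
  "",
  "# tidyverse already ships with rocker/verse,",
  "# so add only what the script needs on top",
  "RUN install2.r --error gmailr"
)

# Written to a temporary folder here; point this at the folder you upload
build_folder <- file.path(tempdir(), "my-folder")
dir.create(build_folder, recursive = TRUE, showWarnings = FALSE)
dockerfile_path <- file.path(build_folder, "Dockerfile")
writeLines(dockerfile_lines, dockerfile_path)
```

Keeping the file named exactly "Dockerfile" matters, since that is what the Docker build step looks for in its working directory.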

jstrome-lmp commented 3 years ago

Mark, thank you for your time and help here. Using the rocker/verse image is exactly what I needed to do. Now I just have to work through getting my R code to work! I really appreciate it!