MarkEdmondson1234 / googleCloudRunner

Easy R scripts on Google Cloud Platform via Cloud Run, Cloud Build and Cloud Scheduler
https://code.markedmondson.me/googleCloudRunner/
Other
81 stars 26 forks source link

Using $ in Cloud build R code causes errors due to mis-identified substitution variable #103

Closed yfarjoun closed 3 years ago

yfarjoun commented 3 years ago

My script, which contains reference to a dataframe column State seems to break the cloud scheduler:

a=data.frame(State=c("California","Alaska"), number=c(1,2))
a$State 

fails to run, while if I use a different column name, $tate, it works:

a=data.frame(tate=c("California","Alaska"), number=c(1,2))
a$tate 

I'm guessing the google is doing using $State as an internal variable of sorts and this causing problems...I'm wondering if it would be better if you encoded the text (e.g. using a Hex encoder) and then decode it on the way out. that way you will not need to worry about escaping quotes and backslashes etc. or will you have to worry about magic variables like $State....

MarkEdmondson1234 commented 3 years ago

Errr weird, it shouldn't care what your script has. Could I see the logs of what code you used to schedule the script? With options(googleAuthR.verbose = 2)

MarkEdmondson1234 commented 3 years ago

Could I see your buildsteps? Would the below test this bug? I agree it could be some kind of yaml parsing error

library(googleCloudRunner)
Setting scopes to https://www.googleapis.com/auth/cloud-platform
Successfully auto-authenticated via /Users/xxx/googlecloudrunner-auth-key.json
> bs <- cr_buildstep_r("print(data.frame(State=c('California','Alaska'), number = c(1,2)")
> bs
[[1]]
==cloudRunnerBuildStep==
name: rocker/r-base
args:
- Rscript
- -e
- print(data.frame(State=c('California','Alaska'), number = c(1,2)
yfarjoun commented 3 years ago

no, I had to try to access the field, with $State in the end it would though:

the minimal example that I tested is:

data.frame(state=c('California'))$state

succeeds, while

data.frame(State=c('California'))$State

Fails. The only difference being the capitalization of the column state....

yfarjoun commented 3 years ago

steps to reproduce:

  1. make file containing the line above,
  2. in studio select "Addins"->Deploy onto gcp
  3. select file in browser, and click done
  4. go to google scheduler and click "run now"
  5. if capitalization == "State" -> fail.
yfarjoun commented 3 years ago
> cr_buildstep_r("data.frame(State=c('California'))$State")
[[1]]
==cloudRunnerBuildStep==
name: rocker/r-base
args:
- Rscript
- -e
- data.frame(State=c('California'))$State
MarkEdmondson1234 commented 3 years ago

Thanks! turns out its not just the scheduler, this code will fail in a direct build:

cr_build(cr_build_yaml(cr_buildstep_r("data.frame(State=c('California'))$State")))
# Error: API returned: generic::invalid_argument: invalid value for 'build.substitutions': key in the template "S" is not a valid built-in substitution

Not sure why lower case does work.

It thinks the $ is denoting a substitution variable ( https://cloud.google.com/cloud-build/docs/configuring-builds/substitute-variable-values ) - I need to replace $ with $$

This works:

cr_build(cr_build_yaml(cr_buildstep_r("data.frame(State=c('California'))$$State")))
MarkEdmondson1234 commented 3 years ago

Should be fixed now in latest github version at least this one works:

cr_build(cr_build_yaml(cr_buildstep_r("data.frame(State=c('California'))$State")))
yfarjoun commented 3 years ago

🎉 thanks for taking this on so swiftly!

MarkEdmondson1234 commented 3 years ago

Thanks for finding it! Can you see if it fixes your issue?

yfarjoun commented 3 years ago

It does!!

(now I need to figure out how to have a docker that contains plyr and plotly...but I should manage!)