MarkEdmondson1234 / googleCloudRunner

Easy R scripts on Google Cloud Platform via Cloud Run, Cloud Build and Cloud Scheduler
https://code.markedmondson.me/googleCloudRunner/

How to set scopes/role for Google Analytics? #115

Closed mta614 closed 2 years ago

mta614 commented 3 years ago

I'm not totally sure if this is the right place for this question, but I'm in the process of taking an ETL I built in R and making it run automatically on Google Cloud. So far, googleCloudRunner seems very promising for achieving this goal, but I have hit a bit of a snag:

My ETL needs to use both GCS via googleCloudStorageR and GA via googleAnalyticsR. Following the setup tutorial, granting roles to the service account I created via cr_setup() was pretty trivial, although I am getting this minor issue:

Error: API returned: Cannot insert legacy ACL for an object when uniform bucket-level access is enabled. Read more at https://cloud.google.com/storage/docs/uniform-bucket-level-access

I think newer versions of the above packages on github may solve that problem but I haven't looked too deeply into it yet.
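If I'm reading the docs right, something like this should avoid the error with a newer googleCloudStorageR (the bucket name here is just an example):

library(googleCloudStorageR)

# skip per-object ACLs, which uniform bucket-level access forbids
gcs_upload(mtcars,
           bucket = "my-etl-bucket",   # example bucket
           name = "mtcars.csv",
           predefinedAcl = "bucketLevel")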

A much larger problem is attempting to query data in GA. Previously, I was using a method like this to set scopes:

# scopes requested for the OAuth client, as used locally
scopes = c("https://www.googleapis.com/auth/analytics.readonly",
           "https://www.googleapis.com/auth/devstorage.read_write")

googleAuthR::gar_set_client(scopes = scopes)

which was working great. However, I'm a bit at a loss for how to do the same with the service account I've set up. As far as I can tell, there isn't a way to set scopes via googleCloudRunner's functions. I also went into IAM to see if there was a role that mapped to https://www.googleapis.com/auth/analytics.readonly, but I haven't been able to find anything.

I feel like I'm missing something fundamental here because I imagine querying data via GA is a super common use case. What am I missing?

MarkEdmondson1234 commented 3 years ago

There is an example in the polyglot demo that downloads GA data with a Go library; that may help you see the structure. Since each step runs in its own environment you can mix and match.

You need a Docker environment with googleAnalyticsR installed that you can use. I would recommend creating a dedicated service account with only read-only access to the GA account you want to download from, then uploading its key to Secret Manager. In the build step before your script, download that service key file and point the environment argument GA_AUTH at it. Then you can run your script using the key as you do locally.
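A rough sketch of that build, with placeholder names for the secret, image and script:

library(googleCloudRunner)

# sketch: fetch the service key from Secret Manager into the build
# workspace, then run the ETL with GA_AUTH pointing at it
bs <- c(
  cr_buildstep_secret("ga-auth-key",             # your Secret Manager secret
                      decrypted = "auth.json"),  # lands in /workspace
  cr_buildstep_r("my_etl.R",                     # script baked into the image
                 name = "gcr.io/my-project/my-ga-image",
                 prefix = "",                    # don't prepend rocker/
                 r_source = "runtime",
                 env = "GA_AUTH=/workspace/auth.json")
)

cr_build(cr_build_yaml(steps = bs))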

mta614 commented 3 years ago

Hmm, this may be a naive question (I'm pretty new to Cloud), but my issue is that I'm not able to create a service account with read-only access to GA (or any kind of access, actually). It simply doesn't appear to be an option in Google Cloud when adjusting roles (unlike, say, setting a Storage Admin role, which allows reading/writing GCS buckets).

MarkEdmondson1234 commented 3 years ago

You don't need to assign it any roles. You will be doing that in effect when you add its email as a user to GA.

mta614 commented 3 years ago

So I managed this (as you said, it was a matter of adding the service account to GA) and used Arben Kqiku's guide you recommended to get my pipeline running on Google Cloud.

One thing I am currently doing that is terrible practice: I have my secret JSON in my Docker container, although I think I should be referencing it via Secret Manager. Or possibly it should be referenced in my script via Secret Manager AND accounted for when I make my Cloud Build; I'm definitely unclear on exactly what's best to do.

No chance there's an example or guide to illustrate how to do this?

MarkEdmondson1234 commented 3 years ago

The polyglot use case makes use of Secret Manager for the auth key, downloading it to the build workspace. That is better but not perfect; ideal would be a call from R itself, which I'll work on in the future.
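Until then, an untested sketch of what a call from R could look like, hitting the Secret Manager REST API directly via googleAuthR (the project, secret and key file names are placeholders):

library(googleAuthR)

# untested sketch: read a Secret Manager secret from within R
gar_auth_service("build-sa.json",
                 scope = "https://www.googleapis.com/auth/cloud-platform")

access_secret <- function(project, secret, version = "latest") {
  url <- sprintf(
    "https://secretmanager.googleapis.com/v1/projects/%s/secrets/%s/versions/%s:access",
    project, secret, version)
  f <- gar_api_generator(url, "GET",
    # the API returns the secret payload base64-encoded
    data_parse_function = function(x) rawToChar(jsonlite::base64_dec(x$payload$data)))
  f()
}

writeLines(access_secret("my-project", "ga-auth-key"), "auth.json")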