GoogleCloudPlatform / healthcare-data-harmonization

This is an engine that converts data from one structure to another, based on a configuration file that describes the transformation. An accompanying syntax makes mappings easier to write and more robust.
https://cloud.google.com/solutions/healthcare-life-sciences
Apache License 2.0

Unable to run example notebook, possible permission issue with https://storage.googleapis.com/storage/v1/b/data-harmonization-sample-data #32

Closed: torstees closed this issue 2 years ago

torstees commented 3 years ago

Very possibly user error, but my service account does have the Storage Admin role, yet I'm getting the following error:

Forbidden: 403 GET https://storage.googleapis.com/storage/v1/b/data-harmonization-sample-data?fields=name: does not have storage.buckets.get access to the Google Cloud Storage bucket.

I'm not sure how I can grant my own service account rights to some other group's bucket, but I'm pretty green regarding cloud stuff and may have no idea what the underlying call is really trying to do.

vneilley commented 3 years ago

I believe the service account you are running the notebook from does not have Google Cloud Storage access. Did you create a service account as specified under the "Running the image" instructions? Under the IAM tab within GCP you can grant the notebook service account the Storage Admin role. Also, where are you running the notebook? Vertex AI user-managed notebooks have a different set of permissions: https://cloud.google.com/vertex-ai/docs/workbench/user-managed/iam
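For reference, granting a role to a service account can also be done from the command line; a minimal sketch, where the project ID and service account name are placeholders you would replace with your own:

```shell
# Sketch only: grant the notebook's service account the Storage Admin role.
# "my-project" and "notebook-sa" are hypothetical names, not from this repo.
gcloud projects add-iam-policy-binding my-project \
    --member="serviceAccount:notebook-sa@my-project.iam.gserviceaccount.com" \
    --role="roles/storage.admin"
```

Note this only affects buckets in your own project; it cannot grant access to a bucket owned by another project.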

torstees commented 3 years ago

So, when I'm inside console for the relevant project and select IAM and then edit the service account, I have two roles: Healthcare HL7v2 Message editor and Storage Admin.

The notebook resides on my desktop running via WSL2 + Docker on windows. Maybe that's the problem?

vneilley commented 3 years ago

Yep! You could move the notebook into the Notebooks section of Vertex AI, or follow something similar to https://cloudacademy.com/course/building-and-testing-applications-on-google-cloud-platform/emulating-gcp-services-for-local-application-development/

vneilley commented 3 years ago

Make sure you are passing the access token as part of the API call

torstees commented 3 years ago

I'm curious as to why that setup isn't working. I have used service accounts to interact with Google's FHIR REST API just fine from my local dev machine via Python. That said, I didn't need access to regular Cloud Storage for that.

torstees commented 3 years ago

> Make sure you are passing the access token as part of the API call

I'm not doing anything special. I'm just hitting play on the notebook. The docker .env file does point to the JSON key file via the two vars, and they are correctly configured.

jasonklotzer commented 3 years ago

Double-check that your container is being mapped to the JSON service account credentials properly. Have you defined all the env vars (NOTEBOOK_WORKING_DIR, etc.) that are outlined in the compose file? https://github.com/GoogleCloudPlatform/healthcare-data-harmonization/blob/master/docker-compose.yaml#L27

The instructions to define the env vars are here: https://github.com/GoogleCloudPlatform/healthcare-data-harmonization/tree/master/tools/notebook#running-the-image
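A quick local sanity check can confirm that the path the .env file points at actually exists and looks like a service-account key. This is a hypothetical helper, not part of the repo:

```shell
# Hypothetical helper: verify that a path looks like a GCP service-account
# JSON key file before starting the container.
check_key() {
  # $1: the path your docker .env credentials variable points at
  if [ ! -f "$1" ]; then
    echo "missing"
    return 1
  fi
  # Real service-account keys are JSON objects with a "client_email" field.
  if grep -q '"client_email"' "$1"; then
    echo "ok"
  else
    echo "not a service-account key"
    return 1
  fi
}
```

Run it as, e.g., `check_key "$GOOGLE_APPLICATION_CREDENTIALS"` before `docker-compose up`; "missing" usually means the volume mapping or env var is wrong.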

torstees commented 2 years ago

I'm pretty sure I followed the directions provided by the notebook repo closely, but I may have missed something. In the end, I just figured out Whistle without the notebook's functionality. It took a bit of digging into what sort of magic was going on in the background of some of the notebook example calls, but I learned what I needed. Closing the issue since it doesn't seem to be relevant to anyone.

BusiPlay commented 2 years ago

I am having the same issue. The bucket referenced in the instructions does not appear to be public, nor is it in my project:

curl -H "Authorization: Bearer $(gcloud auth print-access-token)" https://storage.googleapis.com/storage/v1/b/data-harmonization-sample-data?fields=name

returns access forbidden whether I use my GCP credentials or my service account's.
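One way to see exactly which permissions the caller holds on that bucket is the Storage JSON API's testIamPermissions endpoint; a sketch using the same bucket and your own token (this is a live API call, so it requires gcloud credentials):

```shell
# Sketch: ask the Storage JSON API which of the listed permissions the
# caller actually holds on the bucket. An empty "permissions" list in the
# response means the caller has none of them.
curl -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  "https://storage.googleapis.com/storage/v1/b/data-harmonization-sample-data/iam/testPermissions?permissions=storage.buckets.get&permissions=storage.objects.list"
```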

Is the data-harmonization-sample-data bucket restricted? How does one get access to this bucket?