GoogleCloudPlatform / data-science-on-gcp

Source code accompanying book: Data Science on the Google Cloud Platform, Valliappa Lakshmanan, O'Reilly 2017
Apache License 2.0

Chapter 2 - ingest_flights.py - Access denied to gs file on bqload function #170

Closed: luisandrecunha closed this issue 1 year ago

luisandrecunha commented 1 year ago

Hi, loving the book so far. I'm stuck on what I'm pretty sure is a newbie error. I followed all the steps up to "Securing Cloud Run", and I'm facing an issue when I try to load the csv.gz file into BigQuery. I'm already impersonating the service account, and I'm debugging just the bqload step with a hardcoded file path.

I'm getting the following error: "Try again later: 403 Access Denied: File gs://ds-on-gcp-401703-cf-staging/flights/raw/201603.csv.gz: Access Denied"

The file exists in the bucket. I checked with gsutil ls -l, and it is 22.31 MiB.

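I have checked all the permissions needed, and everything looks correct to me:

  • roles/storage.admin
  • roles/bigquery.dataOwner on the dataset
  • project policy binding roles/bigquery.jobUser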

I wrote all the code myself, and even ran the code from the repository. I still see the same error. Any tips?

I also followed this link, and it seems that I should have everything I need to load data from GCS into BigQuery: https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-csv#bigquery_load_table_partitioned-python

Any tips are appreciated! Thanks!
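
For readers following along, the step being debugged here is a BigQuery load job that reads a gs:// URI. The following is a minimal sketch of such a step, not the book's ingest_flights.py. The function name load_csv_from_gcs and the table id in the usage comment are hypothetical, and the client is passed in so that the credentials in use (for example, an impersonated service account) are explicit. It assumes the google-cloud-bigquery package.

```python
def load_csv_from_gcs(client, gcs_uri, table_id, job_config=None):
    """Load one Cloud Storage file into a BigQuery table and wait for the job.

    client is expected to be a google.cloud.bigquery.Client; table_id is a
    "project.dataset.table" string (hypothetical here).
    """
    if not gcs_uri.startswith("gs://"):
        raise ValueError(f"expected a gs:// URI, got {gcs_uri!r}")
    # BigQuery reads the file itself, so the identity running the job
    # needs read access to the object, not just BigQuery roles.
    load_job = client.load_table_from_uri(gcs_uri, table_id, job_config=job_config)
    load_job.result()  # blocks until the job finishes; raises if it failed
    return load_job


# Usage sketch (requires google-cloud-bigquery and valid credentials):
#   from google.cloud import bigquery
#   client = bigquery.Client()
#   config = bigquery.LoadJobConfig(
#       source_format=bigquery.SourceFormat.CSV,
#       write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
#   )
#   load_csv_from_gcs(
#       client,
#       "gs://ds-on-gcp-401703-cf-staging/flights/raw/201603.csv.gz",
#       "my-project.my_dataset.flights_raw",  # hypothetical table id
#       job_config=config,
#   )
```

Calling the step in isolation like this, with a hardcoded URI, is one way to separate a Cloud Storage permission problem from a problem in the rest of the ingest pipeline.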

lakshmanok commented 1 year ago

In case it helps, the error is on accessing the GCS file, not on loading to BigQuery. So make sure that the service account has read permissions on the bucket/file. Storage admin doesn't give you access to individual files (because the data might belong to someone else).

thanks, Lak

On Tue, Oct 24, 2023, 8:56 PM Luís Cunha wrote:

I have checked all the permissions needed, and everything looks correct to me:

  • roles/storage.admin
  • roles/bigquery.dataOwner on the dataset
  • project policy binding roles/bigquery.jobUser

luisandrecunha commented 1 year ago

Thanks Lak. I might be missing something: the bucket has uniform access control, and the service account also has the Storage Object Admin role. Additionally, I can download, write, and compress the file in the folder, so it's weird that I receive the error only when BigQuery tries to read the file from GCS. It's the same user invoking all the steps!

Thanks, Luis

lakshmanok commented 1 year ago

Also grant the service account read permissions on the bucket.

thanks, Lak
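
A bucket-level read grant of this kind could be sketched as follows. This is an assumption about what such a grant might look like, not the book's setup script. The function name grant_bucket_role and the service account email in the usage comment are hypothetical, the default role roles/storage.objectViewer is one read-only choice among several, and the caller needs permission to get and set the bucket's IAM policy.

```python
def grant_bucket_role(bucket, service_account_email,
                      role="roles/storage.objectViewer"):
    """Add a bucket-level IAM binding for a service account.

    bucket is expected to be a google.cloud.storage.Bucket.
    """
    member = f"serviceAccount:{service_account_email}"
    # Read the current policy, append one binding, and write it back.
    policy = bucket.get_iam_policy(requested_policy_version=3)
    policy.bindings.append({"role": role, "members": {member}})
    bucket.set_iam_policy(policy)
    return member


# Usage sketch (requires google-cloud-storage and valid credentials):
#   from google.cloud import storage
#   bucket = storage.Client().bucket("ds-on-gcp-401703-cf-staging")
#   grant_bucket_role(
#       bucket,
#       "svc-name@my-project.iam.gserviceaccount.com",  # hypothetical account
#   )
```

With uniform bucket-level access, as in this thread, bucket IAM bindings like this one are the way access to the objects is controlled.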

luisandrecunha commented 1 year ago

Lak, I recreated all the steps for data ingestion (bucket, table, service account, and permission creation) in a new project. Everything worked as expected.

I'm not sure what happened, and I hate not understanding the root cause of the issue. I will continue to investigate.

Thanks for replying to my bug.