GoogleCloudPlatform / DataflowJavaSDK

Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.
http://cloud.google.com/dataflow

BigQueryIO needs the underlying table's "bigquery.tables.get" permission to query a view #557

Open wli600 opened 7 years ago

wli600 commented 7 years ago

I am running a BigQueryIO read like this:

BigQueryIO.Read.usingStandardSql().fromQuery("SELECT * FROM foo-bar-123456.category_view.markets LIMIT 1000")

but I am seeing this error: "The user wli@comp.com does not have bigquery.tables.get permission for table foo-bar-123456.category.markets"

Here, category.markets is the base table behind the view category_view.markets. With permission on that underlying table granted to wli@comp.com, the query works.
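For context, a full pipeline around that read would look roughly like the sketch below. This is an illustration only: the options handling is placeholder boilerplate, the query is the one from the report (with backticks added for standard SQL), and the builder-call order follows the 1.x SDK's fromQuery(...).usingStandardSql() pattern rather than the exact chain quoted above.

import com.google.api.services.bigquery.model.TableRow;
import com.google.cloud.dataflow.sdk.Pipeline;
import com.google.cloud.dataflow.sdk.io.BigQueryIO;
import com.google.cloud.dataflow.sdk.options.DataflowPipelineOptions;
import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;
import com.google.cloud.dataflow.sdk.values.PCollection;

public class ViewReadRepro {
  public static void main(String[] args) {
    DataflowPipelineOptions options =
        PipelineOptionsFactory.fromArgs(args).withValidation().as(DataflowPipelineOptions.class);
    Pipeline p = Pipeline.create(options);

    // Reading from the view; this is where the bigquery.tables.get error surfaces,
    // because the SDK looks up metadata of the view's underlying table.
    PCollection<TableRow> rows = p.apply(
        BigQueryIO.Read
            .fromQuery("SELECT * FROM `foo-bar-123456.category_view.markets` LIMIT 1000")
            .usingStandardSql());

    p.run();
  }
}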

Also, without opening up that permission, running the same query against the view in the BigQuery console works fine.

Can you take a look at what the issue might be, or whether this is a usage error?

Thanks

P.S. For the SDK version, I am running the latest commit as of March 1:

commit c4bff0bc70a3d0d69f062d7f1e670a2e2b3fd05d
Merge: fc5fee2 4a9f164
Author: Daniel Halperin <dhalperi@users.noreply.github.com>
Date:   Wed Mar 1 20:12:01 2017 -0800

aaltay commented 7 years ago

@ThatRfernand could you please take a look?

ThatRfernand commented 7 years ago

Hi wli600,

I believe you need to give this user (wli@comp.com) the right IAM permissions to interact with BigQuery. There is more information at https://cloud.google.com/bigquery/docs/access-control.

You can set the permissions via IAM in the Cloud Console: https://console.cloud.google.com/iam-admin/iam/iam-zero. There is more information about IAM at https://cloud.google.com/iam/.

Hope that helps!

wli600 commented 7 years ago

Thanks @ThatRfernand for looking at it,

The issue here is that this user does have permission to access the view; I can use the user's account to query the view from the console.

But running BigQueryIO against the view produces an error message saying the account lacks permission to access the underlying table itself, not the view, which is strange.

criccomini commented 7 years ago

Further clarification: the view has been granted authorized view access. So the user has access to the view, and the view has access to the underlying table. The query works fine via the UI. It fails via Dataflow because Dataflow appears to be executing a metadata query directly against the underlying table.
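For anyone following along, this is the shape of the authorized-view setup being described, sketched with the google-cloud-bigquery Java client. The project, dataset, and view names are the ones from the report; the client-library calls are just an illustration of how such a grant is made, not something the Dataflow job itself does.

import com.google.cloud.bigquery.Acl;
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.Dataset;
import com.google.cloud.bigquery.TableId;
import java.util.ArrayList;
import java.util.List;

public class AuthorizeViewSketch {
  public static void main(String[] args) {
    BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

    // Dataset holding the base table (named "category" in the report).
    Dataset sourceDataset = bigquery.getDataset("category");

    // The view that should be authorized to read the base table
    // (assumed to live in the "category_view" dataset).
    TableId viewId = TableId.of("foo-bar-123456", "category_view", "markets");

    // Append the view to the source dataset's ACL as an authorized view.
    List<Acl> acls = new ArrayList<>(sourceDataset.getAcl());
    acls.add(Acl.of(new Acl.View(viewId)));
    sourceDataset.toBuilder().setAcl(acls).build().update();
  }
}

With this in place the user only needs access to the view's dataset; the view itself carries the access to the base table, which is why the query succeeds in the console.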

dhalperi commented 7 years ago

We're investigating this internally -- the key issue right now is a mismatch between what the BigQuery UI does and what Dataflow does. The BigQuery UI can get information via internal side channels that Dataflow cannot (since we only call public BigQuery APIs).

Will keep you updated.

kennethmac2000 commented 6 years ago

Any news on this @dhalperi?

bigquery.tables.get enables reading table/view metadata - what table/view metadata does Dataflow need to read?

criccomini commented 6 years ago

Yea, this is really annoying.

chamikaramj commented 6 years ago

Can you try using the withQueryLocation() property? https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L967

Currently we need extra permissions to determine the location of the query, but if you set the above, the extra permissions should not be required.
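A minimal sketch of that suggestion against the Apache Beam API might look like the following. The query is the one from the report; the "US" location is a placeholder that should be replaced with the dataset's actual location.

import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.values.PCollection;

public class QueryViewWithLocation {
  public static void main(String[] args) {
    PipelineOptions options = PipelineOptionsFactory.fromArgs(args).withValidation().create();
    Pipeline p = Pipeline.create(options);

    // Supplying the query location explicitly, so the SDK does not need to
    // inspect the underlying table's metadata to discover it.
    PCollection<TableRow> rows = p.apply(
        BigQueryIO.readTableRows()
            .fromQuery("SELECT * FROM `foo-bar-123456.category_view.markets` LIMIT 1000")
            .usingStandardSql()
            .withQueryLocation("US")); // placeholder; use your dataset's location

    p.run();
  }
}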

ThatRfernand commented 6 years ago

Note that .withQueryLocation() was added to Apache Beam last April, so be sure to use Beam 2.5.0 or above.

xpat commented 5 years ago

Access Denied: Table bigquery-public-data:san_francisco_bikeshare.bikeshare_trips: The user xpat@pinchepoutine.com.mx does not have permission to query table bigquery-public-data:san_francisco_bikeshare.bikeshare_trips.

IAM permissions:

pinchepoutine.com.mx: BigQuery Admin, BigQuery Job User, BigQuery User, Billing Account Creator, Owner, Project Creator

xpat@pinchepoutine.com.mx: BigQuery Admin, BigQuery Job User, BigQuery User, Organization Administrator

Oliveirakun commented 3 years ago

I had the same issue and solved it by adding these permissions to the service account used by the Compute Engine workers that run the Dataflow job.