GoogleCloudDataproc / spark-bigquery-connector

BigQuery data source for Apache Spark: Read data from BigQuery into DataFrames, write DataFrames into BigQuery tables.
Apache License 2.0

Impersonate Service Account #1192

Closed: lcaggio closed this issue 9 months ago

lcaggio commented 9 months ago

I am writing a PySpark script on GCP Dataproc that reads a CSV file from GCS and writes it to BigQuery while impersonating a service account.

The Dataproc service account has the Service Account Token Creator role (roles/iam.serviceAccountTokenCreator) on the service account to be impersonated (delegated_sa), and delegated_sa has access to GCS and BigQuery.
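One way to double-check the IAM setup described above, offered as a hedged sketch: fetch the IAM policy on delegated_sa (for example with `gcloud iam service-accounts get-iam-policy DELEGATED_SA --format=json`) and confirm that the Dataproc service account is a member of roles/iam.serviceAccountTokenCreator. The helper `has_token_creator` and the example e-mail addresses are hypothetical. The check only looks at bindings on delegated_sa itself; it does not evaluate IAM conditions or grants inherited from the project or organization, so a `False` result is not conclusive.

```python
import json

TOKEN_CREATOR_ROLE = "roles/iam.serviceAccountTokenCreator"


def has_token_creator(policy_json: str, dataproc_sa: str) -> bool:
    """Return True if dataproc_sa is bound to the Token Creator role in the policy.

    policy_json is assumed to be the JSON printed by
    `gcloud iam service-accounts get-iam-policy DELEGATED_SA --format=json`.
    IAM conditions are not evaluated, and inherited grants are not visible here.
    """
    policy = json.loads(policy_json)
    member = f"serviceAccount:{dataproc_sa}"
    for binding in policy.get("bindings", []):
        if binding.get("role") == TOKEN_CREATOR_ROLE and member in binding.get("members", []):
            return True
    return False


if __name__ == "__main__":
    # Hypothetical policy and service-account names, for illustration only.
    example_policy = json.dumps({
        "bindings": [{
            "role": TOKEN_CREATOR_ROLE,
            "members": ["serviceAccount:dataproc-sa@my-project.iam.gserviceaccount.com"],
        }],
    })
    print(has_token_creator(example_policy, "dataproc-sa@my-project.iam.gserviceaccount.com"))  # True
```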

Script

    ...
    spark = SparkSession.builder \
        .appName("Read CSV from GCS and Write to BigQuery") \
        .config('spark.hadoop.fs.gs.auth.impersonation.service.account', delegated_sa) \
        .config('gcpImpersonationServiceAccount', delegated_sa) \
        .getOrCreate()
    ...
    data = spark.read.format("csv") \
        .schema(schema) \
        .load(csv)
    ...
    data.write.format('bigquery') \
        .option('table', dataset_table) \
        .mode('append') \
        .save()
    ...
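The script sets impersonation twice because two connectors are involved. The CSV read goes through the Cloud Storage connector (fs.gs.auth.impersonation.service.account, which reaches the Hadoop configuration through the spark.hadoop. prefix), and the write goes through the BigQuery connector (gcpImpersonationServiceAccount). A minimal sketch, assuming you want to keep both keys together: the helper `impersonation_conf` is hypothetical, returns exactly the two keys used in the script, and adds only a basic sanity check on the e-mail.

```python
def impersonation_conf(delegated_sa: str) -> dict:
    """Build the Spark config entries used in the script (hypothetical helper).

    Each connector authenticates on its own, so each needs its own key.
    """
    # Service-account e-mails end in gserviceaccount.com,
    # e.g. NAME@PROJECT.iam.gserviceaccount.com.
    if "@" not in delegated_sa or not delegated_sa.endswith("gserviceaccount.com"):
        raise ValueError(f"not a service-account e-mail: {delegated_sa!r}")
    return {
        # Cloud Storage connector: used by spark.read ... .load(csv)
        "spark.hadoop.fs.gs.auth.impersonation.service.account": delegated_sa,
        # BigQuery connector: used by data.write.format('bigquery')
        "gcpImpersonationServiceAccount": delegated_sa,
    }


if __name__ == "__main__":
    for key, value in impersonation_conf("delegated-sa@my-project.iam.gserviceaccount.com").items():
        print(key, "=", value)
```

With PySpark, the entries can be applied by looping over them and calling `builder = builder.config(key, value)` before `getOrCreate()`. Keeping both keys in one place makes it harder to set impersonation for reads but forget it for writes, or the reverse.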

Error

        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Thread.java:750)
Caused by: com.google.cloud.spark.bigquery.repackaged.com.google.api.client.googleapis.json.GoogleJsonResponseException: 403 Forbidden
GET https://www.googleapis.com/bigquery/v2/projects/dataproc/datasets/tables/customers?prettyPrint=false
{
  "code" : 403,
  "errors" : [ {
    "domain" : "global",
    "message" : "Access Denied: Table dataproc:customers: Permission bigquery.tables.get denied on table dataproc:customers (or it may not exist).",
    "reason" : "accessDenied"
  } ],
  "message" : "Access Denied: Table dataproc:dataproc_out.customers: Permission bigquery.tables.get denied on table dataproc:customers (or it may not exist).",
  "status" : "PERMISSION_DENIED"
}
        at com.google.cloud.spark.bigquery.repackaged.com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:146)
        at com.google.cloud.spark.bigquery.repackaged.com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:118)
        at com.google.cloud.spark.bigquery.repackaged.com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:37)
        at com.google.cloud.spark.bigquery.repackaged.com.google.api.client.googleapis.services.AbstractGoogleClientRequest$1.interceptResponse(AbstractGoogleClientRequest.java:428)
        at com.google.cloud.spark.bigquery.repackaged.com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1111)
        at com.google.cloud.spark.bigquery.repackaged.com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:514)
        at com.google.cloud.spark.bigquery.repackaged.com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:455)
        at com.google.cloud.spark.bigquery.repackaged.com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:565)
        at com.google.cloud.spark.bigquery.repackaged.com.google.cloud.bigquery.spi.v2.HttpBigQueryRpc.getTable(HttpBigQueryRpc.java:284)
        ... 45 more
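When diagnosing a 403 like this one, the useful part is the JSON body. The denied permission (bigquery.tables.get) shows which call failed. Because delegated_sa is said to have BigQuery access, a denial here suggests that the request was probably not made as delegated_sa. The following is a small stdlib-only sketch for pulling those fields out of the body. The function name `summarize_bq_error` is hypothetical, and the regular expression assumes the "Permission ... denied" wording shown in the error.

```python
import json
import re


def summarize_bq_error(body: str) -> dict:
    """Extract code, status, reason and denied permission from a BigQuery error body."""
    error = json.loads(body)
    # Use the first entry of "errors" if present; fall back to an empty dict.
    first = (error.get("errors") or [{}])[0]
    # Messages look like "... Permission bigquery.tables.get denied on table ..."
    match = re.search(r"Permission (\S+) denied", error.get("message", ""))
    return {
        "code": error.get("code"),
        "status": error.get("status"),
        "reason": first.get("reason"),
        "permission": match.group(1) if match else None,
    }


if __name__ == "__main__":
    # Body shortened from the error above.
    body = """{
      "code": 403,
      "errors": [{"domain": "global", "reason": "accessDenied"}],
      "message": "Access Denied: ... Permission bigquery.tables.get denied on table ... (or it may not exist).",
      "status": "PERMISSION_DENIED"
    }"""
    print(summarize_bq_error(body))
```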


lcaggio commented 9 months ago

Closing: I was using an old version of the BigQuery connector. With spark-bigquery-with-dependencies_2.12-0.36.1.jar it works without any issues.
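Because the fix was simply a newer connector jar, it can help to check which version a job actually ships. The jar name encodes both the Scala version and the connector version (spark-bigquery-with-dependencies_<scala>-<version>.jar). The sketch below parses that name and compares the result against 0.36.1, the version the reporter confirmed working. That version is not necessarily the earliest release that honors gcpImpersonationServiceAccount. The helper names are hypothetical.

```python
import re

# Version the reporter confirmed working; not necessarily the minimum required.
KNOWN_GOOD = (0, 36, 1)

JAR_PATTERN = re.compile(
    r"spark-bigquery-with-dependencies_(\d+\.\d+)-(\d+)\.(\d+)\.(\d+)\.jar$"
)


def parse_connector_jar(name: str) -> tuple:
    """Return (scala_version, (major, minor, patch)) parsed from a connector jar name or path."""
    match = JAR_PATTERN.search(name)
    if match is None:
        raise ValueError(f"unrecognized connector jar name: {name!r}")
    scala = match.group(1)
    version = tuple(int(part) for part in match.group(2, 3, 4))
    return scala, version


def at_least_known_good(name: str) -> bool:
    """True if the jar's connector version is >= KNOWN_GOOD (element-wise tuple comparison)."""
    _, version = parse_connector_jar(name)
    return version >= KNOWN_GOOD


if __name__ == "__main__":
    print(parse_connector_jar("spark-bigquery-with-dependencies_2.12-0.36.1.jar"))
    print(at_least_known_good("spark-bigquery-with-dependencies_2.12-0.29.0.jar"))
```

If the connector is pulled with --packages instead of a jar file, the same version appears in the Maven coordinate com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:0.36.1.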