GoogleCloudPlatform / DataflowJavaSDK

Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.
http://cloud.google.com/dataflow

Using Service Account with dataflow #554

Open seehans opened 7 years ago

seehans commented 7 years ago

I am trying to run a Dataflow job with a service account. I am using the GcpOptions flags --serviceAccountKeyfile="dataflow-service-account.p12" --serviceAccountName="dataflow"

I am getting the following error:

Unable to verify that GCS bucket exists.
com.google.cloud.dataflow.sdk.util.DataflowPathValidator.verifyPathIsAccessible(DataflowPathValidator.java:84)
com.google.cloud.dataflow.sdk.util.DataflowPathValidator.validateOutputFilePrefixSupported(DataflowPathValidator.java:63)
com.google.cloud.dataflow.sdk.runners.DataflowPipelineRunner.fromOptions(DataflowPipelineRunner.java:274)

I would like to use service account credentials instead of application default credentials.

seehans commented 7 years ago

It works if I give the following value to the serviceAccountName flag: --serviceAccountName="dataflow@my-project.gserviceaccount.com"

The documentation is misleading: the serviceAccountName flag actually expects the "Service account ID" value, not the "Service account name".
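
For anyone hitting the same error, a small pre-flight check can catch a bare name before the pipeline is launched. The sketch below is only an illustration: `ServiceAccountFlags`, `looksLikeServiceAccountId`, and `buildArgs` are hypothetical names, not part of the Dataflow SDK, and the check only tests for the `name@domain` shape seen in this thread. It does not contact Google Cloud or confirm that the account exists.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, NOT part of the Dataflow SDK: it only assembles the two
// flags discussed in this thread and checks the shape of the account value.
final class ServiceAccountFlags {

    // True when the value has a "local-part@domain" shape, like the
    // service account ID that worked in this thread.
    static boolean looksLikeServiceAccountId(String value) {
        if (value == null) {
            return false;
        }
        int at = value.indexOf('@');
        // Exactly one '@', with text on both sides of it.
        return at > 0 && at == value.lastIndexOf('@') && at < value.length() - 1;
    }

    // Builds the command-line flags, rejecting a bare name such as "dataflow".
    static List<String> buildArgs(String keyFile, String serviceAccountId) {
        if (!looksLikeServiceAccountId(serviceAccountId)) {
            throw new IllegalArgumentException(
                "serviceAccountName expects the service account ID (an email-style value), got: "
                    + serviceAccountId);
        }
        List<String> args = new ArrayList<>();
        args.add("--serviceAccountKeyfile=" + keyFile);
        args.add("--serviceAccountName=" + serviceAccountId);
        return args;
    }

    public static void main(String[] argv) {
        // Values taken from the comments above.
        System.out.println(
            buildArgs("dataflow-service-account.p12", "dataflow@my-project.gserviceaccount.com"));
    }
}
```

The resulting list could be converted to a `String[]` and passed to whatever code parses the pipeline options. Passing `"dataflow"` as the second argument throws an `IllegalArgumentException` locally, before any GCS bucket validation is attempted.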

davorbonaci commented 7 years ago

I'll reopen to clarify the documentation.

Thanks @seehans! Also, if you are interested, feel free to contribute a documentation fix to the Apache Beam codebase: https://github.com/apache/beam.