apache / beam

Apache Beam is a unified programming model for Batch and Streaming data processing.
https://beam.apache.org/
Apache License 2.0

[Task]: Improve how to handle the Dataflow-specific option `impersonateServiceAccount` for Beam Java #30301

Open liferoad opened 7 months ago

liferoad commented 7 months ago

What needs to happen?

`impersonateServiceAccount` should be kept when submitting Dataflow jobs but, per the design, removed when creating Dataflow workers. To fix this, #30283 added a simple solution that removes the `impersonateServiceAccount` key from the JSON pipeline options. This introduces some Dataflow-specific concepts; the handling could be improved by moving it to the Dataflow-specific module. See more details in this comment.
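To make the intended behavior concrete, here is a minimal sketch of the idea, not the code from #30283. It assumes the pipeline options have already been parsed into a flat `Map`; the class `WorkerOptionsSketch`, the method `forWorkers`, and the sample values are hypothetical names used only for illustration. Only the key name `impersonateServiceAccount` comes from this issue.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration; this is not Beam's actual implementation.
public class WorkerOptionsSketch {
    // Key name taken from the issue; everything else in this class is assumed.
    static final String IMPERSONATE_KEY = "impersonateServiceAccount";

    /**
     * Returns a copy of the submission options with the impersonation key
     * removed. The input map is left untouched, so the job submission path
     * can still use the option.
     */
    static Map<String, Object> forWorkers(Map<String, Object> submissionOptions) {
        Map<String, Object> copy = new LinkedHashMap<>(submissionOptions);
        copy.remove(IMPERSONATE_KEY);
        return copy;
    }

    public static void main(String[] args) {
        Map<String, Object> submission = new LinkedHashMap<>();
        submission.put("project", "example-project"); // placeholder value
        submission.put(IMPERSONATE_KEY, "sa@example-project.iam.gserviceaccount.com"); // placeholder value

        Map<String, Object> worker = forWorkers(submission);

        System.out.println("submission keeps key: " + submission.containsKey(IMPERSONATE_KEY));
        System.out.println("worker has key: " + worker.containsKey(IMPERSONATE_KEY));
    }
}
```

Copying the map instead of mutating it keeps the submission-side options intact, which matches the requirement that the option survive job submission and be absent only on workers. Where this logic should live (SDK core, the Dataflow runner module, or the service) is the open question this issue tracks.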

This issue tracks the potential task of improving how Dataflow-specific options are handled in the future.

Note: for Beam Python, we remove this option in the internal Dataflow `apiclient` module.

Issue Priority

Priority: 3 (nice-to-have improvement)

Issue Components

kennknowles commented 7 months ago

For this particular option, the Dataflow service (the UW) should be the place where the option is removed.

The Python SDK is a real mess when it comes to isolating non-GCP and GCP things. It is not a good place to use as an example.