apache / beam

Apache Beam is a unified programming model for Batch and Streaming data processing.
https://beam.apache.org/
Apache License 2.0

Intuitive default behavior for sdk_location pipeline option #19168

Open kennknowles opened 2 years ago

kennknowles commented 2 years ago

The current default value of sdk_location, "default", implies Dataflow-specific behavior in the artifact stager. The portable runner uses the same stager and has to pass the value "container", which actually means "do not stage the SDK". Not staging should be the default behavior, and the default value of sdk_location should be None. The Dataflow runner can then specify a value such as "pypi", which conveys the expected behavior more closely.
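
To make the proposal concrete, here is a rough sketch of the semantics being suggested. It is not Beam's actual stager code: `plan_sdk_staging` and `dataflow_sdk_location` are hypothetical names, the returned strings are placeholders for real staging actions, and treating any other value as a file path or URL is an assumption.

```python
from typing import Optional


def plan_sdk_staging(sdk_location: Optional[str] = None) -> str:
    """Describe what the stager would do for a given sdk_location (hypothetical)."""
    if sdk_location is None:
        # Proposed default: assume the SDK is already in the container, stage nothing.
        return "skip"
    if sdk_location == "pypi":
        # Proposed explicit value for runners that want the released SDK staged.
        return "download-from-pypi"
    # Assumption: any other value is a path or URL to an SDK artifact.
    return "stage-file:" + sdk_location


def dataflow_sdk_location(user_value: Optional[str] = None) -> str:
    """A runner that needs the SDK staged opts in explicitly (hypothetical)."""
    return user_value if user_value is not None else "pypi"


if __name__ == "__main__":
    print(plan_sdk_staging())                         # portable runner, no option set
    print(plan_sdk_staging(dataflow_sdk_location()))  # Dataflow runner default
```

With this shape, the portable runner would no longer need to pass "container" at all, and the Dataflow-specific behavior would be visible in the Dataflow runner rather than hidden behind the shared default.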

Imported from Jira BEAM-5525. Original Jira may contain additional context. Reported by: thw.

robertwb commented 1 year ago

Related: https://github.com/apache/beam/issues/26996