apache / beam

Apache Beam is a unified programming model for Batch and Streaming data processing.
https://beam.apache.org/
Apache License 2.0

[Task]: Support custom Python SDK for sdks/java/extensions/python PythonService #31680

Closed Abacn closed 3 months ago

Abacn commented 3 months ago

What needs to happen?

Currently, PythonService determines the Python SDK version as in

https://github.com/apache/beam/blob/f64aec237c2115fc98170e526e093505bc8b3d06/sdks/java/extensions/python/src/main/java/org/apache/beam/sdk/extensions/python/PythonService.java#L87

and

https://github.com/apache/beam/blob/f64aec237c2115fc98170e526e093505bc8b3d06/sdks/java/extensions/python/src/main/java/org/apache/beam/sdk/extensions/python/PythonService.java#L140

based on the Java SDK info.

This causes two problems:

  1. For release candidates, Java artifacts do not use "rc" suffixes while the Python SDK does. As a result, PythonService tries to install a PyPI package version that has not been released yet, and initialization fails.

  2. For dev versions, it always pins to the "latest" version on PyPI, so one may not be able to run an older dev version.

We should expose an interface for installing a custom Python SDK, either by specifying a version number (like 2.57.0RC1) or by providing the path to a tarball.
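
To make the request concrete, here is a rough sketch of how such an override could resolve to a pip requirement. This is only an illustration: `PythonSdkRequirement`, `resolve`, and the path-detection rule are hypothetical names and assumptions, not existing Beam APIs, and the default branch is a guess at the behavior described above rather than a copy of what PythonService does.

```java
import java.util.Optional;

/** Hypothetical helper for illustration; not part of Beam's PythonService. */
final class PythonSdkRequirement {
  private PythonSdkRequirement() {}

  /**
   * Builds the argument that would be handed to "pip install".
   *
   * @param javaSdkVersion version reported by the Java SDK, e.g. "2.57.0"
   * @param override user-supplied version (e.g. "2.57.0RC1") or tarball path; empty for default
   */
  static String resolve(String javaSdkVersion, Optional<String> override) {
    if (override.isPresent()) {
      String value = override.get();
      // Assumption: anything that looks like a path or an sdist archive is passed to pip as-is.
      if (value.contains("/") || value.endsWith(".tar.gz")) {
        return value;
      }
      // Otherwise treat the override as an exact version pin.
      return "apache_beam==" + value;
    }
    // Assumed default: dev/snapshot builds stay unpinned, so pip picks the latest release.
    if (javaSdkVersion.endsWith("-SNAPSHOT") || javaSdkVersion.contains(".dev")) {
      return "apache_beam";
    }
    // Assumed default: released Java versions pin the same Python version.
    return "apache_beam==" + javaSdkVersion;
  }
}
```

With something like this, a release-candidate run could pass `Optional.of("2.57.0RC1")` to get `apache_beam==2.57.0RC1`, and an old dev build could pass the path to a locally built tarball instead of falling back to the latest PyPI release.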

Issue Priority

Priority: 2 (default / most normal work should be filed as P2)

Issue Components