apache / beam

Apache Beam is a unified programming model for Batch and Streaming data processing.
https://beam.apache.org/
Apache License 2.0

[Task]: Document compatibility aspirations & test coverage policies for optional Beam dependencies #30908

Open tvalentyn opened 5 months ago

tvalentyn commented 5 months ago

What needs to happen?

Certain aspects of Beam functionality depend on actively evolving libraries. For example, RunInference model handlers might require dependencies like PyTorch or TensorFlow.

We should document Beam's policy on compatibility with third-party libraries that are not already in Beam's dependency chain.

Then, we should make sure our compatibility suites test against at least the lowest and the highest supported version of each such library: https://github.com/apache/beam/blob/master/.github/workflows/beam_PreCommit_Python_Coverage.yml

Testing the in-between versions can be done as needed. For example, we test against all supported versions of PyArrow and pandas, but those libraries are already in our dependency chain, so their requirements are spelled out. Dependencies like TensorFlow, on the other hand, are optional and not part of 'extras'.
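To make the "lowest and highest supported version" idea concrete, here is a minimal sketch of how a test matrix could be reduced to its boundary versions. The package names, version lists, and helper names below are hypothetical placeholders, not Beam's actual supported ranges or tooling.

```python
from __future__ import annotations


def version_key(version: str) -> tuple[int, ...]:
    """Turn '2.10.0' into (2, 10, 0) so versions compare numerically, not as text."""
    return tuple(int(part) for part in version.split("."))


def boundary_versions(supported: list[str]) -> list[str]:
    """Return the lowest and highest supported versions (one entry if they coincide)."""
    if not supported:
        raise ValueError("at least one supported version is required")
    ordered = sorted(supported, key=version_key)
    lowest, highest = ordered[0], ordered[-1]
    return [lowest] if lowest == highest else [lowest, highest]


# Hypothetical version lists, for illustration only.
SUPPORTED_VERSIONS = {
    "torch": ["1.13.1", "2.0.1", "2.1.0"],
    "tensorflow": ["2.9.0", "2.10.0", "2.12.0"],
}

# The minimal compatibility matrix: only the boundary versions of each library.
test_matrix = {
    package: boundary_versions(versions)
    for package, versions in SUPPORTED_VERSIONS.items()
}
```

In-between versions could still be added to the matrix for a specific library when a regression makes that worthwhile.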

Issue Priority

Priority: 2 (default / most normal work should be filed as P2)

Issue Components

tvalentyn commented 4 months ago

I think we should introduce additional extras that would define allowed ranges for optional Beam dependencies.

Having many extras is very common in projects that support a given use case on multiple backends.
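As a rough sketch of what such extras could look like, the mapping below follows the shape that setuptools accepts for its `extras_require` argument. The extra names, version bounds, and the `install_target` helper are all hypothetical and only illustrate the idea. They are not existing Beam extras.

```python
# Hypothetical extras: names and version bounds are illustrative only.
# In a setup.py, a mapping like this would be passed to
# setuptools.setup(extras_require=OPTIONAL_EXTRAS).
OPTIONAL_EXTRAS = {
    "torch": ["torch>=1.13,<3"],
    "tensorflow": ["tensorflow>=2.12,<3"],
}


def install_target(extra: str) -> str:
    """Build the requirement string a user would pass to pip for a given extra."""
    if extra not in OPTIONAL_EXTRAS:
        raise KeyError(f"unknown extra: {extra}")
    return f"apache-beam[{extra}]"
```

With extras like these, a user would run something like `pip install "apache-beam[torch]"`, and the allowed version range would be resolved by pip rather than documented separately.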