apache / beam

Apache Beam is a unified programming model for Batch and Streaming data processing.
https://beam.apache.org/
Apache License 2.0

[Failing Test]: Some Python integration test runs result in environment mismatch. #28653

Closed by tvalentyn 9 months ago

tvalentyn commented 11 months ago

What happened?

See failing runs on:

https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow/11337/

Issue Failure

Failure: Test is flaky

Issue Priority

Priority: 1 (unhealthy code / failing or flaky postcommit so we cannot be sure the product is healthy)

Issue Components

tvalentyn commented 11 months ago

Tentatively adding as a blocker until confirmed it's not affecting the release branch

tvalentyn commented 11 months ago

09:38:53 RuntimeError: Pipeline construction environment and pipeline runtime environment are not compatible. If you use a custom container image, check that the Python interpreter minor version and the Apache Beam version in your image match the versions used at pipeline construction time. Submission environment: beam:version:sdk_base:apache/beam_python3.11_sdk:2.52.0.dev. Runtime environment: beam:version:sdk_base:apache/beam_python3.11_sdk:2.51.0.dev.
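To make the mismatch in that message easier to read, here is a minimal sketch that splits the two environment strings and reports which part differs. This is not Beam's actual check; `parse_sdk_base` and `describe_mismatch` are hypothetical helpers, and the `beam:version:sdk_base:<image>:<version>` layout is assumed from the strings in the error above.

```python
from typing import List, NamedTuple

# Prefix seen in the error message; the layout after it is an assumption.
PREFIX = "beam:version:sdk_base:"


class SdkBase(NamedTuple):
    image: str    # e.g. "apache/beam_python3.11_sdk"
    version: str  # e.g. "2.52.0.dev"


def parse_sdk_base(capability: str) -> SdkBase:
    """Split 'beam:version:sdk_base:<image>:<version>' into image and version."""
    if not capability.startswith(PREFIX):
        raise ValueError(f"not an sdk_base string: {capability!r}")
    # The version is whatever follows the last colon; the rest is the image.
    image, _, version = capability[len(PREFIX):].rpartition(":")
    if not image or not version:
        raise ValueError(f"cannot split image and version: {capability!r}")
    return SdkBase(image, version)


def describe_mismatch(submission: str, runtime: str) -> List[str]:
    """Return human-readable differences; an empty list means the strings agree."""
    sub, run = parse_sdk_base(submission), parse_sdk_base(runtime)
    problems = []
    if sub.image != run.image:
        problems.append(f"image differs: {sub.image} vs {run.image}")
    if sub.version != run.version:
        problems.append(f"Beam version differs: {sub.version} vs {run.version}")
    return problems


if __name__ == "__main__":
    for line in describe_mismatch(
        "beam:version:sdk_base:apache/beam_python3.11_sdk:2.52.0.dev",
        "beam:version:sdk_base:apache/beam_python3.11_sdk:2.51.0.dev",
    ):
        print(line)
```

For the two strings in the error, the image (and so the Python minor version) agrees and only the Beam version differs: the job was submitted with 2.52.0.dev while the worker ran 2.51.0.dev.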

tvalentyn commented 11 months ago

Likely this will not affect the release branch, but something is misconfigured.

tvalentyn commented 11 months ago

Seeing this in one job:

ERROR 2023-09-21T20:30:25.653406673Z Processing /var/opt/google/staged/apache_beam-2.52.0.dev0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
ERROR 2023-09-21T20:30:25.653431342Z ERROR: Wheel 'apache-beam' located at /var/opt/google/staged/apache_beam-2.52.0.dev0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl is invalid.
DEBUG 2023-09-21T20:30:25.653442932Z Could not install Apache Beam SDK from a wheel: exit status 1, proceeding to install SDK from source tarball.

...

tvalentyn commented 11 months ago

I believe we currently don't stage tarballs in tests, and somehow the provided wheel is either not compatible or got corrupted during retrieval: https://github.com/apache/beam/issues/28605
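A rough way to tell those two possibilities apart for a staged wheel is sketched below, using only the standard library. The helper names are hypothetical, and the tag check is a simplification: it only looks at the interpreter tag in the filename (PEP 427 layout), whereas pip also considers the ABI and platform tags.

```python
import zipfile
from typing import List, Optional


def wheel_python_tags(filename: str) -> List[str]:
    """Return the interpreter tags from a wheel filename.

    PEP 427 layout: name-version[-build]-python-abi-platform.whl
    """
    base = filename.rsplit("/", 1)[-1]
    if not base.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = base[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename layout: {filename!r}")
    # The python tag is third from the end; compressed tag sets use '.'.
    return parts[-3].split(".")


def python_tag_matches(filename: str, interpreter_tag: str) -> bool:
    """Simplified check: does the wheel claim to support e.g. 'cp311'?"""
    tags = wheel_python_tags(filename)
    generic = "py" + interpreter_tag[2:3]  # "cp311" -> "py3"
    return interpreter_tag in tags or generic in tags


def wheel_archive_problem(path: str) -> Optional[str]:
    """Return a description of archive damage, or None if the zip looks intact."""
    if not zipfile.is_zipfile(path):
        return "not a readable zip archive"
    with zipfile.ZipFile(path) as archive:
        bad_member = archive.testzip()  # first member with a bad CRC, if any
    if bad_member is not None:
        return f"corrupt member: {bad_member}"
    return None
```

For example, `python_tag_matches(".../apache_beam-2.52.0.dev0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", "cp38")` is true, so the wheel in the log above is only installable on a CPython 3.8 worker. If the tag matches the worker's interpreter, running `wheel_archive_problem` on the downloaded copy would be the next thing to check.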

tvalentyn commented 11 months ago

I think this is caused by https://github.com/apache/beam/issues/28605 .

kennknowles commented 11 months ago

I am not totally following whether this could impact the release. Would we expect to be seeing red tests on the release branch? We did manage to get green Python tests today.

tvalentyn commented 11 months ago

I don't attribute this issue to a regression in 2.51.0, but there may be flakiness in streaming test pipelines until this issue is fixed or the Dataflow runner rolls out a release (tentative ETA end of this week).

Longer story: Python integration tests are supposed to pass --sdk_location. Due to a race during installation, some workers fail to install the SDK and become incorrectly initialized. This would not happen to workers using the so-called sibling SDK container protocol. Users on a released Beam SDK don't stage the SDK at job submission, so they wouldn't see this particular failure mode.

I will remove this issue from 2.51.0 blocker lists for now.

tvalentyn commented 9 months ago

> This would not happen to workers using the so-called sibling SDK container protocol.

All Dataflow Python pipelines use the sibling protocol now.