Open tvalentyn opened 10 months ago
cc: @AnandInguva
Avenues to address:
The root cause is: starting from 2.50.0, we no longer stage Beam SDK
To clarify, what do you mean by this?
I could run a Python x-lang pipeline on the RC by installing the SDK via the zip file and providing the same file to the "--sdk_location" flag.
https://dist.apache.org/repos/dist/dev/beam/2.50.0/python/apache-beam-2.50.0.zip
Starting from several releases back we also check that submission and runtime versions match.
Can we skip this check when the SDK is an RC?
The root cause is: starting from 2.50.0, we no longer stage Beam SDK
To clarify, what do you mean by this?
I could run a Python x-lang pipeline on the RC by installing the SDK via the zip file and providing the same file to the "--sdk_location" flag.
https://dist.apache.org/repos/dist/dev/beam/2.50.0/python/apache-beam-2.50.0.zip
Beam Python downloads the SDK sdist or wheel from PyPI and stages it in the staging location. boot.go then looks for this staged file and installs it on the Dataflow worker. We did this for Runner v1 because its default container doesn't include Beam. As of 2.50.0, Runner v1 is deprecated, so we stopped staging the SDK: Runner v2 containers already have Beam installed, so there is no need to stage it.
If a tarball is passed to --sdk_location, we stage it and install it on Dataflow, but by default we don't stage anything as of 2.50.0.
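To make the staging behavior described above concrete, here is a minimal sketch of the decision as I understand it. The function name `should_stage_sdk` and its parameters are hypothetical and only illustrate the logic; this is not the actual Beam stager code.

```python
from typing import Optional


def should_stage_sdk(sdk_location: Optional[str], uses_runner_v2: bool) -> bool:
    """Hypothetical helper: decide whether the Beam SDK artifact gets staged.

    Assumption: sdk_location is None when the user did not pass
    --sdk_location, and a path or URL to a tarball otherwise.
    """
    if sdk_location is not None:
        # An explicitly provided tarball is always staged and installed.
        return True
    # Without an explicit tarball, staging is only needed when the worker
    # container does not already ship Beam (the Runner v1 case).
    return not uses_runner_v2


# From 2.50.0 on, the default path (Runner v2, no --sdk_location) stages nothing.
print(should_stage_sdk(None, uses_runner_v2=True))
print(should_stage_sdk("apache-beam-2.50.0.zip", uses_runner_v2=True))
```

Under these assumptions, an RC pipeline submitted without --sdk_location runs against whatever Beam version the container already has installed.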
I'm basically always +1 on reverting in situations like this; I think it is almost always the fastest and safest thing to do, especially when the feature is helping us but not providing customer value. Reverting a revert is easy, and then we can pin an extra commit on that PR to get the fix forward in the next release.
I put up https://github.com/apache/beam/pull/28094 to do this
strip RC suffixes during validation
I'm +1 on this as the long-term fix. It's simple and still allows us to exercise all the functionality we need to do validation.
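For illustration, stripping the RC suffix before comparing versions could look roughly like the sketch below. The helper names `strip_rc_suffix` and `versions_match` are made up for this example, and the regex assumes RC versions end in `rcN` as in `2.50.0rc1`; the merged fix may be implemented differently.

```python
import re

# Assumption: an RC version ends in "rc<digits>", optionally preceded by a dot.
_RC_SUFFIX = re.compile(r"\.?rc\d+$", re.IGNORECASE)


def strip_rc_suffix(version: str) -> str:
    """Return the version with any trailing RC marker removed."""
    return _RC_SUFFIX.sub("", version)


def versions_match(submission: str, runtime: str) -> bool:
    """Compare submission and runtime versions, ignoring RC suffixes."""
    return strip_rc_suffix(submission) == strip_rc_suffix(runtime)


# The pair from the reported error would pass this relaxed check.
print(versions_match("2.50.0rc1", "2.50.0"))
# A genuine version mismatch is still rejected.
print(versions_match("2.50.0rc1", "2.49.0"))
```

This keeps the compatibility check meaningful for real mismatches while letting an RC submission environment run against a container that reports the final version.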
Discussed offline, we're going to try rolling forward and only revert if we run into issues. @tvalentyn is going to take this forward
Fix merged to master and CP created for the release branch. Repurposing the issue for potential follow-up work:
The right thing to do would be to install in RC Docker containers the same package content that is being published as RC.
CP merged in. I'm going to start the RC2 process soon. Please validate the fix once RC2 is available.
What happened?
Pipeline I ran failed with an error:
Pipeline construction environment and pipeline runtime environment are not compatible. If you use a custom container image, check that the Python interpreter minor version and the Apache Beam version in your image match the versions used at pipeline construction time. Submission environment: beam:version:sdk_base:apache/beam_python3.11_sdk:2.50.0rc1. Runtime environment: beam:version:sdk_base:apache/beam_python3.11_sdk:2.50.0. Worker ID: beamapp-valentyn-08220117-08211817-m76c-harness-v38w
The root cause is: starting from 2.50.0, we no longer stage the Beam SDK. Starting from several releases back, we also check that the submission and runtime versions match. However, the Python Docker containers we build for RCs don't install the RC version of the Beam SDK tarball, so the runtime reports 2.50.0 while the submission environment reports 2.50.0rc1.
This issue blocks further validation of RC1 for Python Dataflow pipelines.
Issue Priority
Priority: 1 (data loss / total loss of function)
Issue Components