apache / beam

Apache Beam is a unified programming model for Batch and Streaming data processing.
https://beam.apache.org/
Apache License 2.0

[Failing Test]: Typescript Dataflow tests are failing. #30199

Open tvalentyn opened 5 months ago

tvalentyn commented 5 months ago

What happened?

Sample failure: https://github.com/apache/beam/actions/runs/7759817877/job/21164749924

Reason: the test uses the Beam Python dev SDK and attempts to run on Dataflow, but doesn't supply --sdk_location. This error is raised by the Dataflow runner to prevent a scenario where the SDK at job submission doesn't match the SDK at runtime.

Fix: Use a released Python SDK installation when running a Typescript test pipeline, or supply an additional flag: --sdk_location=./path/to/sdist. Alternatively, supply --sdk_location=container if using the slightly outdated default dev SDK container is acceptable; in that case we don't need to build an sdist.
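To illustrate the flag-based fix, here is a minimal sketch of a helper that appends --sdk_location to a list of pipeline arguments when the caller has not already set it. The function name ensure_sdk_location and the idea of patching an argument list are assumptions made for illustration; this is not code from the Beam repository, and the Typescript test harness may pass options in a different way.

```python
from typing import List


def ensure_sdk_location(args: List[str], sdk_location: str = "container") -> List[str]:
    """Return a copy of args that is guaranteed to carry an --sdk_location flag.

    Hypothetical helper: the default "container" mirrors the
    --sdk_location=container option mentioned in this issue. Pass a path
    to an sdist instead if a matching SDK build is required.
    """
    # Accept both "--sdk_location=value" and "--sdk_location value" forms.
    already_set = any(
        arg == "--sdk_location" or arg.startswith("--sdk_location=")
        for arg in args
    )
    if already_set:
        return list(args)
    return list(args) + [f"--sdk_location={sdk_location}"]


if __name__ == "__main__":
    print(ensure_sdk_location(["--runner=DataflowRunner"]))
```

A caller-supplied value is left untouched, so a test that already points at a locally built sdist keeps using it.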

ERROR:apache_beam.runners.portability.local_job_service:Error running pipeline.
Traceback (most recent call last):
  File "/home/runner/work/beam/beam/sdks/python/apache_beam/runners/portability/local_job_service.py", line 297, in _run_job
    self.result = self._invoke_runner()
  File "/home/runner/work/beam/beam/sdks/python/apache_beam/runners/dataflow/dataflow_job_service.py", line 36, in _invoke_runner
    self.result = runner.run_pipeline(
  File "/home/runner/work/beam/beam/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py", line 359, in run_pipeline
    _check_and_add_missing_options(options)
  File "/home/runner/work/beam/beam/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py", line 589, in _check_and_add_missing_options
    raise ValueError(
ValueError: You are submitting a pipeline with Apache Beam Python SDK 2.55.0.dev. When launching Dataflow jobs with an unreleased (dev) SDK, please provide an SDK distribution in the --sdk_location option to use a consistent SDK version at pipeline submission and runtime. To ignore this error and use an SDK preinstalled in the default Dataflow dev runtime environment or in a custom container image, use --sdk_location=container.
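The behavior behind this error can be approximated with a short sketch. It is reconstructed only from the error message above and is not the actual body of _check_and_add_missing_options; the function name check_sdk_location and the rule that a dev SDK is detected by a ".dev" version suffix are assumptions.

```python
from typing import Optional


def check_sdk_location(sdk_version: str, sdk_location: Optional[str] = None) -> None:
    """Raise ValueError when a dev SDK is used without --sdk_location.

    Hypothetical sketch of the guard described in this issue, not Beam's
    real implementation.
    """
    # Assumption: unreleased SDK versions end in ".dev", as in "2.55.0.dev".
    is_dev_sdk = sdk_version.endswith(".dev")
    if is_dev_sdk and not sdk_location:
        raise ValueError(
            f"Submitting a pipeline with dev SDK {sdk_version} requires "
            "--sdk_location (an SDK distribution, or 'container')."
        )


if __name__ == "__main__":
    check_sdk_location("2.54.0")                   # released SDK: passes
    check_sdk_location("2.55.0.dev", "container")  # dev SDK with flag: passes
    try:
        check_sdk_location("2.55.0.dev")           # dev SDK without flag: raises
    except ValueError as err:
        print(err)
```

Under these assumptions, either switching to a released SDK or passing any non-empty --sdk_location value avoids the error, which matches the two fixes proposed above.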

Issue Failure

Failure: Test is continually failing

Issue Priority

Priority: 2 (backlog / disabled test but we think the product is healthy)

tvalentyn commented 5 months ago

When running the test locally, it doesn't seem to run:

$ npm test -- --grep "@dataflow" 

produces no logs for me:

> apache-beam@2.55.0-SNAPSHOT pretest
> npm run build

> apache-beam@2.55.0-SNAPSHOT build
> bash build.sh

> apache-beam@2.55.0-SNAPSHOT test
> mocha dist/test dist/test/docs --grep @dataflow

  0 passing (2ms)

cc: @robertwb

tvalentyn commented 5 months ago

I tried to timebox the fix but unfortunately gave up while setting up a local test environment. It might be an easy fix for someone who already has a Typescript environment set up. We might want to update the docs to describe what is necessary to run these tests locally.