apache / beam

Apache Beam is a unified programming model for Batch and Streaming data processing.
https://beam.apache.org/
Apache License 2.0

[Bug]: streaming is set to True in PipelineOptions but condition is evaluating as false for is_streaming_pipeline #27223

Open mayank311996 opened 1 year ago

mayank311996 commented 1 year ago

What happened?

On line https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py#L2030

if not is_streaming_pipeline and self.with_auto_sharding:
  raise ValueError(
      'with_auto_sharding is not applicable to batch pipelines.')

I have set streaming=True in PipelineOptions, and the Dataflow UI shows the same setting. However, when I pass with_auto_sharding=True, this ValueError is raised. So is_streaming_pipeline is somehow evaluating to False even though I set streaming=True in PipelineOptions.
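
For reference, here is a minimal sketch of that guard in isolation, to show which flag combinations trigger the error. The helper names check_auto_sharding and raises are hypothetical stand-ins written for this illustration; they are not part of Beam, and the sketch assumes the guard behaves exactly as the snippet quoted above.

```python
def check_auto_sharding(is_streaming_pipeline, with_auto_sharding):
    """Hypothetical stand-in for the quoted guard; not Beam's API."""
    if not is_streaming_pipeline and with_auto_sharding:
        raise ValueError(
            'with_auto_sharding is not applicable to batch pipelines.')


def raises(is_streaming_pipeline, with_auto_sharding):
    """Return True if the guard raises for this combination."""
    try:
        check_auto_sharding(is_streaming_pipeline, with_auto_sharding)
    except ValueError:
        return True
    return False


# Evaluate every combination, including an unset (None) streaming flag.
results = {
    (streaming, sharding): raises(streaming, sharding)
    for streaming in (True, False, None)
    for sharding in (True, False)
}

for combo, raised in results.items():
    print(combo, 'raises' if raised else 'ok')
```

In this sketch the error fires only when with_auto_sharding is truthy and the streaming value is falsy, and a value of None counts as falsy just like False. That would be consistent with the streaming flag not reaching the place where the guard reads it, though the sketch cannot confirm that this is what happens here.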

Issue Priority

Priority: 2 (default / most bugs should be filed as P2)

AnandInguva commented 11 months ago

https://github.com/apache/beam/blob/fd1b87849c05bb0e4092f80a574cbb7ab3dc667c/sdks/python/apache_beam/io/gcp/bigquery.py#L2107

From here, I don't see how auto_sharding affects the streaming option. How do you pass your pipeline options to the code?