Open redfungus opened 1 year ago
It seems the main issue is with using the side input as num_shards in WriteToText. Switching to WriteToFile fixed the issue.
Thanks for reporting. Did this pipeline work before (that is, did the error only appear after upgrading the Beam version)? If so, in which version did it work?
For num_shards you have to pass an int, so you cannot pass a side input. This error really should be caught earlier by parameter validation.
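To illustrate what "caught earlier by parameter validation" could mean, here is a minimal sketch of a construction-time check. It is not Beam's actual code: `validate_num_shards` and `FakeSideInput` are hypothetical names made up for this example, and accepting any non-negative integer is an assumption on my part.

```python
import numbers


class FakeSideInput:
    """Hypothetical stand-in for a deferred value such as a side input view."""


def validate_num_shards(num_shards):
    """Hypothetical early check; not part of Apache Beam.

    Rejects anything that is not a plain non-negative integer, so a
    deferred value fails at construction time with a clear message
    instead of later during job submission.
    """
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(num_shards, bool) or not isinstance(num_shards, numbers.Integral):
        raise TypeError(
            "num_shards must be an int, got %s" % type(num_shards).__name__
        )
    if num_shards < 0:
        raise ValueError("num_shards must be >= 0, got %d" % num_shards)
    return int(num_shards)


if __name__ == "__main__":
    print(validate_num_shards(4))
    try:
        validate_num_shards(FakeSideInput())
    except TypeError as exc:
        print("rejected early:", exc)
```

With a check along these lines in the transform's constructor, the mistake would be reported where the pipeline is built rather than as an "Invalid DisplayDataItem" error from the runner.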
What happened?
Using a side input in the form in WriteToText with Dataflow as the runner causes an error.
  File ".venv\lib\site-packages\apache_beam\runners\dataflow\dataflow_runner.py", line 877, in run_ParDo
    step = self._add_step(
  File ".venv\lib\site-packages\apache_beam\runners\dataflow\dataflow_runner.py", line 652, in _add_step
    [
  File ".venv\lib\site-packages\apache_beam\runners\dataflow\dataflow_runner.py", line 653, in <listcomp>
    item.get_dict()
  File ".venv\lib\site-packages\apache_beam\transforms\display.py", line 370, in get_dict
    self.is_valid()
  File ".venv\lib\site-packages\apache_beam\transforms\display.py", line 336, in is_valid
    raise ValueError(
ValueError: Invalid DisplayDataItem. Value <apache_beam.pvalue.AsDict object at 0x000001AA22BED610> is of an unsupported type.
The pipeline works fine when running locally but fails when using a Dataflow runner. Tested with all the different beam.pvalue.As... too and it still happens.
SDK version with the error: 2.46.0
Python version used: 3.9.13
Code of the whole pipeline:
Command used to run:
Issue Priority
Priority: 1 (data loss / total loss of function)
Issue Components