apache / beam

Apache Beam is a unified programming model for Batch and Streaming data processing.
https://beam.apache.org/
Apache License 2.0

[Bug]: Add check for self-referencing input in YAML transform #32339

Open Polber opened 3 months ago

Polber commented 3 months ago

What happened?

The following pipeline fails:

pipeline:
  transforms:
    - type: Create
      name: Source
      config:
        elements:
          - id: 1
      input: Source
    - type: LogForTesting
      input: Source

with the following error:

  ...
  File "/Users/jkinard/beam/sdks/python/apache_beam/yaml/yaml_transform.py", line 163, in strip_metadata
    if isinstance(spec, Mapping):
       ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jkinard/.pyenv/versions/3.11.6/lib/python3.11/typing.py", line 1305, in __instancecheck__
    return self.__subclasscheck__(type(obj))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jkinard/.pyenv/versions/3.11.6/lib/python3.11/typing.py", line 1583, in __subclasscheck__
    return issubclass(cls, self.__origin__)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RecursionError: maximum recursion depth exceeded in __subclasscheck__

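The failure is caused by the self-referencing transform: Source lists itself as its own input. This should produce a clearer error and be caught earlier, before the recursion limit is hit.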
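
For illustration, an early check of the kind requested here might look like the minimal sketch below. It is not Beam's implementation: it works on the already-parsed pipeline spec as plain dicts, the function names `find_self_references` and `check_no_self_references` are hypothetical, and it assumes that `input` can be a single name, a list of names, or a mapping of tags to names, with an optional `.output` suffix after the transform name.

```python
from collections.abc import Mapping


def find_self_references(pipeline_spec):
    """Return the names of transforms that list themselves as an input.

    Hypothetical helper: operates on a parsed YAML pipeline spec (plain
    dicts and lists), not on Beam's internal representation.
    """
    offenders = []
    for transform in pipeline_spec.get("transforms", []):
        name = transform.get("name")
        if name is None:
            # Unnamed transforms cannot be referenced by name in this sketch.
            continue
        inputs = transform.get("input", [])
        # Assumption: `input` is a name, a list of names, or a tag -> name mapping.
        if isinstance(inputs, str):
            inputs = [inputs]
        elif isinstance(inputs, Mapping):
            inputs = list(inputs.values())
        for ref in inputs:
            # Assumption: "Name.output" refers to a specific output of "Name".
            if isinstance(ref, str) and ref.split(".", 1)[0] == name:
                offenders.append(name)
                break
    return offenders


def check_no_self_references(pipeline_spec):
    """Raise a descriptive ValueError instead of recursing indefinitely."""
    offenders = find_self_references(pipeline_spec)
    if offenders:
        raise ValueError(
            "Transform(s) reference themselves as input: "
            + ", ".join(offenders)
        )


# The pipeline from this issue, as it would look after YAML parsing.
spec = {
    "transforms": [
        {
            "type": "Create",
            "name": "Source",
            "config": {"elements": [{"id": 1}]},
            "input": "Source",
        },
        {"type": "LogForTesting", "input": "Source"},
    ]
}

try:
    check_no_self_references(spec)
except ValueError as err:
    print(err)
```

This only catches direct self-references. Longer cycles (A reads from B, B reads from A) would need a graph traversal, which is outside what this issue asks for.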

Issue Priority

Priority: 2 (default / most bugs should be filed as P2)

Issue Components

mravi commented 1 week ago

.take_issue

mravi commented 1 week ago

.take-issue

mravi commented 1 week ago

@Polber ptal https://github.com/apache/beam/pull/33208