GoogleCloudPlatform / DataflowTemplates

Cloud Dataflow Google-provided templates for solving in-Cloud data tasks
https://cloud.google.com/dataflow/docs/guides/templates/provided-templates
Apache License 2.0

[Do not merge] Beam 2.58.0rc1 validation #1738

Closed Abacn closed 1 month ago

codecov[bot] commented 1 month ago

Codecov Report

Attention: Patch coverage is 0% with 7 lines in your changes missing coverage. Please review.

Project coverage is 42.32%. Comparing base (d7db191) to head (6d883be).

Additional details and impacted files

```diff
@@             Coverage Diff              @@
##               main    #1738      +/-   ##
============================================
- Coverage     42.32%   42.32%   -0.01%
  Complexity     3182     3182
============================================
  Files           794      794
  Lines         46244    46246       +2
  Branches       4951     4953       +2
============================================
  Hits          19572    19572
- Misses        25074    25076       +2
  Partials       1598     1598
```

| [Components](https://app.codecov.io/gh/GoogleCloudPlatform/DataflowTemplates/pull/1738/components?src=pr&el=components&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=GoogleCloudPlatform) | Coverage Δ | |
|---|---|---|
| [spanner-templates](https://app.codecov.io/gh/GoogleCloudPlatform/DataflowTemplates/pull/1738/components?src=pr&el=component&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=GoogleCloudPlatform) | `63.70% <ø> (ø)` | |
| [spanner-import-export](https://app.codecov.io/gh/GoogleCloudPlatform/DataflowTemplates/pull/1738/components?src=pr&el=component&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=GoogleCloudPlatform) | `64.44% <ø> (ø)` | |
| [spanner-live-forward-migration](https://app.codecov.io/gh/GoogleCloudPlatform/DataflowTemplates/pull/1738/components?src=pr&el=component&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=GoogleCloudPlatform) | `74.97% <ø> (ø)` | |
| [spanner-live-reverse-replication](https://app.codecov.io/gh/GoogleCloudPlatform/DataflowTemplates/pull/1738/components?src=pr&el=component&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=GoogleCloudPlatform) | `51.84% <ø> (ø)` | |
| [spanner-bulk-migration](https://app.codecov.io/gh/GoogleCloudPlatform/DataflowTemplates/pull/1738/components?src=pr&el=component&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=GoogleCloudPlatform) | `82.79% <ø> (ø)` | |

| [Files](https://app.codecov.io/gh/GoogleCloudPlatform/DataflowTemplates/pull/1738?dropdown=coverage&src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=GoogleCloudPlatform) | Coverage Δ | |
|---|---|---|
| [...cloud/teleport/v2/templates/SpannerToBigQuery.java](https://app.codecov.io/gh/GoogleCloudPlatform/DataflowTemplates/pull/1738?src=pr&el=tree&filepath=v2%2Fgooglecloud-to-googlecloud%2Fsrc%2Fmain%2Fjava%2Fcom%2Fgoogle%2Fcloud%2Fteleport%2Fv2%2Ftemplates%2FSpannerToBigQuery.java&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=GoogleCloudPlatform#diff-djIvZ29vZ2xlY2xvdWQtdG8tZ29vZ2xlY2xvdWQvc3JjL21haW4vamF2YS9jb20vZ29vZ2xlL2Nsb3VkL3RlbGVwb3J0L3YyL3RlbXBsYXRlcy9TcGFubmVyVG9CaWdRdWVyeS5qYXZh) | `0.00% <0.00%> (ø)` | |

Abacn commented 1 month ago

Other than the Yaml/PythonUDF templates, the following tests are failing:

SpannerToBigQueryIT.testSpannerToBigQuery

SpannerToBigQueryIT.testSpannerToBigQueryNoSchemaFile

Error message:

java.lang.IllegalArgumentException: Both query and table cannot be specified at the same time for SpannerIO.read().
2024-07-19 12:25:55.130 EDT
    at org.apache.beam.sdk.io.gcp.spanner.SpannerIO$Read.expand(SpannerIO.java:890)
2024-07-19 12:25:55.130 EDT
    at org.apache.beam.sdk.io.gcp.spanner.SpannerIO$Read.expand(SpannerIO.java:708)

This is caused by https://github.com/apache/beam/pull/31570, which is a breaking change. Currently, in SpannerToBigQueryOptions, neither getSpannerTableId nor getSqlQuery is optional, so a launch always supplies both. Arguably this would break every user who keeps the same template parameters on the new Beam version.
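
To make the failure mode concrete, here is a minimal Python sketch of the kind of mutual-exclusion check the error message describes. It is a stand-in, not Beam's Java code: `check_read_config` and `resolve_read_source` are hypothetical names, `ValueError` plays the role of `IllegalArgumentException`, and the "prefer the query when both are given" policy is only one assumed way a template could adapt, not necessarily what the template fix did.

```python
from typing import Dict, Optional

ERROR = (
    "Both query and table cannot be specified at the same time "
    "for SpannerIO.read()."
)


def check_read_config(table: Optional[str], query: Optional[str]) -> None:
    """Stand-in for the validation described by the error message.

    Raises ValueError (the Python analogue of IllegalArgumentException)
    when both a table and a query are supplied.
    """
    if table and query:
        raise ValueError(ERROR)


def resolve_read_source(table_id: Optional[str],
                        sql_query: Optional[str]) -> Dict[str, str]:
    """Hypothetical template-side policy: pass exactly one source to the read.

    Assumption: an explicit query wins over the table name.
    """
    if sql_query:
        return {"query": sql_query}
    if table_id:
        return {"table": table_id}
    raise ValueError("Either a table or a query must be provided.")


# A template that forwards both parameters unchanged trips the check.
try:
    check_read_config("my_table", "SELECT * FROM my_table")
except ValueError as exc:
    print(exc)

# Resolving to a single source first avoids it.
source = resolve_read_source("my_table", "SELECT * FROM my_table")
check_read_config(source.get("table"), source.get("query"))
```

With both parameters required, every launch takes the first path, which is why the change affects all existing users of the template rather than only unusual configurations.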

Abacn commented 1 month ago

The SpannerIO config is fixed by updating the templates. In addition, the YamlTemplate failed to launch with this error message:

AttributeError: 'str' object has no attribute 'items'

 Traceback (most recent call last):
 File "/template/main.py", line 69, in <module>
 run()
 File "/template/main.py", line 63, in run
 main.run(argv=_preparse_jinja_flags(argv))
 File "/usr/local/lib/python3.11/site-packages/apache_beam/yaml/main.py", line 143, in run
 yaml_transform.expand_pipeline(
 File "/usr/local/lib/python3.11/site-packages/apache_beam/yaml/yaml_transform.py", line 1062, in expand_pipeline
 validate_against_schema(pipeline_spec, validate_schema)
 File "/usr/local/lib/python3.11/site-packages/apache_beam/yaml/yaml_transform.py", line 87, in validate_against_schema
 jsonschema.validate(pipeline, pipeline_schema(strictness))
 File "/usr/local/lib/python3.11/site-packages/jsonschema/validators.py", line 1330, in validate
 error = exceptions.best_match(validator.iter_errors(instance))
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 File "/usr/local/lib/python3.11/site-packages/jsonschema/exceptions.py", line 475, in best_match
 best = next(errors, None)
 ^^^^^^^^^^^^^^^^^^
 File "/usr/local/lib/python3.11/site-packages/jsonschema/validators.py", line 384, in iter_errors
 for error in errors:
 File "/usr/local/lib/python3.11/site-packages/jsonschema/_keywords.py", line 296, in properties
 yield from validator.descend(
 File "/usr/local/lib/python3.11/site-packages/jsonschema/validators.py", line 432, in descend
 for error in errors:
 File "/usr/local/lib/python3.11/site-packages/jsonschema/_legacy_keywords.py", line 135, in items_draft6_draft7_draft201909
 yield from validator.descend(item, items, path=index)
 File "/usr/local/lib/python3.11/site-packages/jsonschema/validators.py", line 432, in descend
 for error in errors:
 File "/usr/local/lib/python3.11/site-packages/jsonschema/_keywords.py", line 275, in ref
 yield from validator._validate_reference(ref=ref, instance=instance)
 File "/usr/local/lib/python3.11/site-packages/jsonschema/validators.py", line 432, in descend
 for error in errors:
 File "/usr/local/lib/python3.11/site-packages/jsonschema/_keywords.py", line 383, in if_
 if validator.evolve(schema=if_schema).is_valid(instance):
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 File "/usr/local/lib/python3.11/site-packages/jsonschema/validators.py", line 500, in is_valid
 error = next(self.iter_errors(instance), None)
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 File "/usr/local/lib/python3.11/site-packages/jsonschema/validators.py", line 384, in iter_errors
 for error in errors:
 File "/usr/local/lib/python3.11/site-packages/jsonschema/_keywords.py", line 294, in properties
 for property, subschema in properties.items():
 ^^^^^^^^^^^^^^^^
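
The last frame shows the validator calling `.items()` on the value of a `properties` keyword. The sketch below reproduces that error class in plain Python with a simplified stand-in for that loop. It is not jsonschema's code, and `properties_keyword`, `TYPES`, and both schema fragments are made up for illustration. The thread does not establish which schema fragment was actually malformed, or whether the installed jsonschema version played a role.

```python
# Simplified stand-in for the loop in the final traceback frame:
#     for property, subschema in properties.items():
TYPES = {"string": str, "object": dict}


def properties_keyword(properties, instance):
    """Check each named property of `instance` against its subschema."""
    errors = []
    for name, subschema in properties.items():  # needs a mapping here
        expected = TYPES[subschema["type"]]
        if name in instance and not isinstance(instance[name], expected):
            errors.append(f"{name}: expected {subschema['type']}")
    return errors


# Hypothetical schema fragments; the real pipeline schema is not shown
# in the thread.
well_formed = {"properties": {"type": {"type": "string"}}}
malformed = {"properties": "type"}  # a string where a mapping is expected

instance = {"type": "ReadFromCsv"}

ok = properties_keyword(well_formed["properties"], instance)

try:
    properties_keyword(malformed["properties"], instance)
except AttributeError as exc:
    message = str(exc)
    print(message)
```

The point of the sketch is only that the error is raised while walking the schema itself, before the pipeline spec is judged valid or invalid, which suggests looking at how the schema is built or loaded rather than at the user's YAML.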