GoogleCloudPlatform / DataflowTemplates

Cloud Dataflow Google-provided templates for solving in-Cloud data tasks
https://cloud.google.com/dataflow/docs/guides/templates/provided-templates
Apache License 2.0

Validate Beam 2.57.0RC1 #1681

Closed. Abacn closed this issue 2 weeks ago.

Abacn commented 2 weeks ago

Build failed with:

Error:  Failed to execute goal org.apache.maven.plugins:maven-enforcer-plugin:3.4.1:enforce (enforce) on project dynamic-templates: Execution enforce of goal org.apache.maven.plugins:maven-enforcer-plugin:3.4.1:enforce failed. NullPointerException -> [Help 1]

org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.apache.maven.plugins:maven-enforcer-plugin:3.4.1:enforce (enforce) on project dynamic-templates: Execution enforce of goal org.apache.maven.plugins:maven-enforcer-plugin:3.4.1:enforce failed.

...
[ERROR] Rule 0: org.codehaus.mojo.extraenforcer.dependencies.EnforceBytecodeVersion failed with message:

Caused by: java.lang.NullPointerException
    at org.apache.maven.plugins.enforcer.EnforceBytecodeVersion.isBadArtifact (EnforceBytecodeVersion.java:352)

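It was not clear which dependency caused the NPE in the "isBadArtifact" check. After upgrading maven-enforcer-plugin to 3.5.0 and the extra-enforcer-rules dependency (which provides EnforceBytecodeVersion) to 1.8.0, the error message is clearer: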

The following artifacts could not be resolved: io.confluent:kafka-avro-serializer:jar:7.6.0, io.confluent:common-utils:jar:7.6.0, io.confluent:kafka-schema-registry-client:jar:7.6.0, io.confluent:kafka-schema-serializer:jar:7.6.0: Could not find artifact io.confluent:kafka-avro-serializer:jar:7.6.0 in splunk-artifactory (https://splunk.jfrog.io/splunk/ext-releases-local)

What happens is that maven-enforcer-plugin is not able to resolve the io.confluent dependencies, likely due to https://github.com/apache/beam/blob/0f2e1963987f1fbb3329016d8c862639ed4fbe43/website/www/site/content/en/blog/beam-2.56.0.md?plain=1#L38; however, that was a Beam 2.56.0 change.
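The resolver reports each missing artifact as `groupId:artifactId:type:version` and names the repository it searched. To check by hand which URL was probed (for example with a browser or curl), the standard Maven 2 repository layout turns a coordinate into a path under the repository root. The sketch below shows that mapping. `MavenLayout`, `artifactPath` and `artifactUrl` are hypothetical helpers, not part of Maven's API. They assume no classifier and that the type equals the file extension, which holds for plain `jar` and `pom` artifacts.

```java
/**
 * Hypothetical helper (not Maven API): maps a resolver coordinate of the form
 * groupId:artifactId:type:version to its path in the standard Maven 2 layout.
 * Assumes no classifier and that the type equals the file extension (true for jar/pom).
 */
public class MavenLayout {

  static String artifactPath(String coordinate) {
    String[] parts = coordinate.split(":");
    if (parts.length != 4) {
      throw new IllegalArgumentException(
          "expected groupId:artifactId:type:version, got " + coordinate);
    }
    String groupId = parts[0];
    String artifactId = parts[1];
    String type = parts[2];
    String version = parts[3];
    // Dots in the groupId become directories; the file is artifactId-version.type.
    return groupId.replace('.', '/') + "/" + artifactId + "/" + version + "/"
        + artifactId + "-" + version + "." + type;
  }

  static String artifactUrl(String repositoryUrl, String coordinate) {
    // Join the repository root and the layout path with exactly one slash.
    String base = repositoryUrl.endsWith("/") ? repositoryUrl : repositoryUrl + "/";
    return base + artifactPath(coordinate);
  }

  public static void main(String[] args) {
    // Repository and coordinates taken from the error message above.
    String repo = "https://splunk.jfrog.io/splunk/ext-releases-local";
    String[] missing = {
      "io.confluent:kafka-avro-serializer:jar:7.6.0",
      "io.confluent:common-utils:jar:7.6.0",
    };
    for (String coordinate : missing) {
      System.out.println(artifactUrl(repo, coordinate));
    }
  }
}
```

A 404 at such a URL only confirms that the artifact is absent from that particular repository. The io.confluent artifacts are normally served from Confluent's own Maven repository, so the build has to declare that repository for resolution to succeed.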

Abacn commented 2 weeks ago

Did two trials. The failing tests are:

Spanner PR passed

Abacn commented 2 weeks ago

BigtableChangeStreamsToPubSubIT.testDeadLetterQueueDelivery failed to publish a message:

com.fasterxml.jackson.databind.JsonMappingException: String length (20054016) exceeds the maximum length (20000000) (through reference chain: com.google.cloud.teleport.v2.templates.bigtablechangestreamstopubsub.model.Mod["changeJson"])
    at com.fasterxml.jackson.databind.JsonMappingException.wrapWithPath(JsonMappingException.java:402)
    ...
    at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3740)
    at com.google.cloud.teleport.v2.templates.bigtablechangestreamstopubsub.model.Mod.fromJson(Mod.java:148)
    at com.google.cloud.teleport.v2.templates.bigtablechangestreamstopubsub.FailsafePublisher$PublishModJsonToTopic$PublishModJsonToTopicFn.newPubsubMessage(FailsafePublisher.java:132)
    at com.google.cloud.teleport.v2.templates.bigtablechangestreamstopubsub.FailsafePublisher$PublishModJsonToTopic$PublishModJsonToTopicFn.processElement(FailsafePublisher.java:119)
    at com.google.cloud.teleport.v2.templates.bigtablechangestreamstopubsub.FailsafePublisher$PublishModJsonToTopic$PublishModJsonToTopicFn$DoFnInvoker.invokeProcessElement(Unknown Source)
    at org.apache.beam.fn.harness.FnApiDoFnRunner.processElementForParDo(FnApiDoFnRunner.java:803)
    ...
Caused by: com.fasterxml.jackson.core.exc.StreamConstraintsException: String length (20054016) exceeds the maximum length (20000000)
    at com.fasterxml.jackson.core.StreamReadConstraints.validateStringLength(StreamReadConstraints.java:324)
    at com.fasterxml.jackson.core.util.ReadConstrainedTextBuffer.validateStringLength(ReadConstrainedTextBuffer.java:27)
    at com.fasterxml.jackson.core.util.TextBuffer.finishCurrentSegment(TextBuffer.java:939)
    at com.fasterxml.jackson.core.json.ReaderBasedJsonParser._finishString2(ReaderBasedJsonParser.java:2240)
    at com.fasterxml.jackson.core.json.ReaderBasedJsonParser._finishString(ReaderBasedJsonParser.java:2206)
    at com.fasterxml.jackson.core.json.ReaderBasedJsonParser.getText(ReaderBasedJsonParser.java:323)
    at com.fasterxml.jackson.databind.deser.std.StringDeserializer.deserialize(StringDeserializer.java:42)
    at com.fasterxml.jackson.databind.deser.std.StringDeserializer.deserialize(StringDeserializer.java:11)
    at com.fasterxml.jackson.databind.deser.impl.FieldProperty.deserializeAndSet(FieldProperty.java:138)
    at com.fasterxml.jackson.databind.deser.BeanDeserializer.vanillaDeserialize(BeanDeserializer.java:314)
    ... 68 more

Need to apply #31580 to the templates here as well.

Abacn commented 2 weeks ago

Validation passed except for:

org.apache.beam.sdk.extensions.python.PythonService.start(PythonService.java:109)

...

ERROR: No matching distribution found for apache_beam==2.57.0

Checking https://github.com/apache/beam/blob/f64aec237c2115fc98170e526e093505bc8b3d06/sdks/java/extensions/python/src/main/java/org/apache/beam/sdk/extensions/python/PythonService.java#L87, there isn't an interface exposed to pin the Python SDK version (to an RC).
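The log suggests that PythonService asked pip for `apache_beam==2.57.0`, the final release, which presumably was not yet on PyPI during RC validation. The sketch below only shows what an RC pin would look like in pip's PEP 440 syntax. It assumes the RC is published to PyPI under a pre-release version such as `2.57.0rc1`. `BeamPythonPin` and `pipRequirement` are hypothetical names. Because PythonService exposes no such hook, this sketch does not show how to pass the requirement to it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not a Beam API): converts a Beam release label such as
 * "2.57.0RC1" into a PEP 440 pip requirement for the Python SDK.
 */
public class BeamPythonPin {

  // major.minor.patch, optionally followed by "RC<n>" (case-insensitive, optional '-' or '.').
  private static final Pattern VERSION =
      Pattern.compile("(\\d+\\.\\d+\\.\\d+)(?:[-.]?rc(\\d+))?", Pattern.CASE_INSENSITIVE);

  static String pipRequirement(String beamVersion) {
    Matcher m = VERSION.matcher(beamVersion);
    if (!m.matches()) {
      throw new IllegalArgumentException("unrecognized Beam version: " + beamVersion);
    }
    // PEP 440 writes a release candidate as a lowercase "rcN" suffix, e.g. 2.57.0rc1.
    String pep440 = m.group(2) == null ? m.group(1) : m.group(1) + "rc" + m.group(2);
    return "apache_beam==" + pep440;
  }

  public static void main(String[] args) {
    // The requirement seen in the log: the final release.
    System.out.println(pipRequirement("2.57.0"));
    // An RC pin, assuming the RC is on PyPI as 2.57.0rc1.
    System.out.println(pipRequirement("2.57.0RC1"));
  }
}
```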