Closed gabistoenescu closed 1 year ago
Hi @gabistoenescu thanks for raising this ticket.
Thanks for your precise and detailed issue description, much appreciated.
We can probably ship a tactical solution to our Community edition fairly quickly with a longer term strategic solution following in the course of our next few public releases.
This issue is due to dependency conflicts between Confluent and AWS Glue serdes libraries that we have known of for some time but have not resolved in our public builds due to AWS Glue+Proto support being relatively recent (2022) and little demand from our users.
Resolving this issue strategically requires us to publish an aws-glue-compatibility (or similarly named) version of each Kpow JAR/container across each of our product lines (Community, Standard, Enterprise, and each AWS Marketplace artefact); to document why we do that; to maintain ongoing details of when a specific release requires an aws-glue-compatibility release; and finally to document the trade-offs of using the compatibility mode version.
We had been hoping the universe would align and this issue would be resolved by upstream teams in their libraries, but it appears that it will be an ongoing issue that ebbs and flows as those libraries advance.
Your ticket requires a general, public resolution so we will now commit to that work.
Kpow is an enterprise-grade solution that meets our users' needs in a broad market by supporting a wide range of providers and their offerings in a single artefact. This includes support for:
Kpow is built from a variety of dependencies to meet those requirements. We invest considerable time in understanding our dependencies intricately, and we take a determined approach to dependency management where we:
Kpow has supported AWS Glue since 2021. When Glue introduced protobuf support in 2022, initially all was well, but we realised in late 2022 that an advance in the Confluent protobuf library version had introduced a conflict on a shared transitive dependency that broke protobuf in Glue in exactly the manner you have identified.
This PR is the crux of the issue: https://github.com/awslabs/aws-glue-schema-registry/pull/230
That ticket was resolved last month, and while the AWS team have updated the wire-schema dependency, Confluent have since moved on to a later version and exactly the same problem persists.
As of:
Confluent 7.4.1 requires wire 4.4.3
[io.confluent/kafka-protobuf-serializer "7.4.1" :exclusions [[org.yaml/snakeyaml]]]
[io.confluent/kafka-protobuf-provider "7.4.1"]
[com.squareup.okio/okio-jvm "3.0.0"]
[com.squareup.wire/wire-runtime-jvm "4.4.3" :exclusions [[org.jetbrains.kotlin/kotlin-stdlib]]]
[com.squareup.wire/wire-schema-jvm "4.4.3" :exclusions [[org.jetbrains.kotlin/kotlin-stdlib]]]
AWS Glue 1.1.16 requires wire 4.3.0
[com.squareup.wire/wire-compiler "4.3.0" :exclusions [[com.squareup.wire/wire-grpc-client] [com.charleskorn.kaml/kaml]]]
[com.squareup.wire/wire-java-generator "4.3.0" :scope "runtime"]
[com.squareup.wire/wire-kotlin-generator "4.3.0" :scope "runtime"]
[com.squareup.wire/wire-grpc-client-jvm "4.3.0" :scope "runtime"]
[com.squareup.okhttp3/okhttp "4.9.3" :scope "runtime"]
[org.jetbrains.kotlinx/kotlinx-coroutines-core-jvm "1.5.2" :scope "runtime"]
[com.squareup.wire/wire-grpc-server-generator "4.3.0" :scope "runtime"]
[com.squareup.wire/wire-profiles "4.3.0" :scope "runtime"]
[com.squareup.wire/wire-swift-generator "4.3.0" :scope "runtime"]
[io.outfoxx/swiftpoet "1.3.1" :scope "runtime"]
[com.squareup.wire/wire-schema "4.3.0"]
[com.squareup.wire/wire-runtime "4.3.0" :scope "runtime"]
These two are not compatible. When the Square wire dependencies are excluded from Glue, we get:
(dev/send-glue-proto "glue_proto" "5")
Execution error (NoSuchMethodError) at com.amazonaws.services.schemaregistry.utils.apicurio.FileDescriptorUtils/toMessage (FileDescriptorUtils.java:881).
'void com.squareup.wire.schema.internal.parser.MessageElement.<init>(com.squareup.wire.schema.Location, java.lang.String, java.lang.String, java.util.List, java.util.List, java.util.List, java.util.List, java.util.List, java.util.List, java.util.List)'
This is the current normal case where we cannot read/write Glue Protobuf.
If we switch the dependencies around, Glue can now produce protobuf, but Confluent cannot:
(dev/send-proto "proto_tx" "7")
Execution error (NoSuchMethodError) at io.confluent.kafka.schemaregistry.protobuf.ProtobufSchema/toMessage (ProtobufSchema.java:994).
'void com.squareup.wire.schema.internal.parser.MessageElement.<init>(com.squareup.wire.schema.Location, java.lang.String, java.lang.String, java.util.List, java.util.List, java.util.List, java.util.List, java.util.List, java.util.List, java.util.List, java.util.List)'
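The two traces differ only in the constructor they expect on `MessageElement`: the Glue trace looks for a 10-argument constructor and the Confluent trace for an 11-argument one. As a rough illustration of how to check which shape a given classpath provides, the following is a minimal sketch using plain JDK reflection. The class name `ConstructorProbe` and its method are hypothetical helpers, not part of Kpow, Glue, or Confluent.

```java
import java.lang.reflect.Constructor;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical diagnostic helper, not part of any library mentioned in this thread.
final class ConstructorProbe {

    // The class named in both stack traces above.
    static final String MESSAGE_ELEMENT =
        "com.squareup.wire.schema.internal.parser.MessageElement";

    /**
     * Returns the sorted parameter counts of the non-synthetic constructors
     * declared by the named class, or an empty list if it cannot be loaded.
     */
    public static List<Integer> constructorArities(String className) {
        List<Integer> arities = new ArrayList<>();
        try {
            // initialize=false: inspect the class without running static initializers.
            Class<?> cls = Class.forName(
                className, false, ConstructorProbe.class.getClassLoader());
            for (Constructor<?> ctor : cls.getDeclaredConstructors()) {
                if (!ctor.isSynthetic()) {
                    arities.add(ctor.getParameterCount());
                }
            }
        } catch (ClassNotFoundException | LinkageError e) {
            // Class is absent or unloadable on this classpath: report no constructors.
        }
        Collections.sort(arities);
        return arities;
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : MESSAGE_ELEMENT;
        System.out.println(name + " constructor arities: " + constructorArities(name));
    }
}
```

Run with the same classpath as the failing application. If the arity a caller was compiled against is missing from the printed list, that caller would be expected to fail with a `NoSuchMethodError` like the ones above.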
This currently leaves Kpow with the following capabilities, which match what you have identified.
Kpow Version | Consume AVRO | Consume JSON Schema | Consume Proto |
---|---|---|---|
91.6 Confluent | ✅ | ✅ | ✅ |
91.6 Glue | ✅ | ✅ | ❌ |

Kpow Version | Produce AVRO | Produce JSON Schema | Produce Proto |
---|---|---|---|
91.6 Confluent | ✅ | ✅ | ✅ |
91.6 Glue | ✅ | ✅ | ❌ |

Kpow Version | Edit AVRO Schema | Edit JSON Schema | Edit Proto Schema |
---|---|---|---|
91.6 Confluent | ✅ | ✅ | ✅ |
91.6 Glue | ✅ | ✅ | ✅ |
An intermediary solution is to select the highest versions of the Confluent and Glue libraries that share a common wire-schema dependency, and build an aws-glue-baseline release from those dependencies.
This solution is sub-optimal due to AWS Glue's glacial cadence and low-quality dependency management. We are required to effectively pin Confluent, Kafka, and other shared dependencies to old versions for the initial compatibility mode in a manner that does not sit well with our internal discipline for dependency hygiene.
This is the action we will investigate now, initially just for Community Edition to resolve this ticket without building out the build pipelines necessary to support it across our entire range of deliverables.
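The baseline selection described above can be sketched as a small search over release-to-wire mappings. This is an illustrative sketch only, not our build tooling: `BaselineSelector` and its methods are hypothetical names, it assumes purely numeric dotted versions, and it assumes "best" means the highest shared wire version. The only real data points used are the two stated earlier in this comment (Confluent 7.4.1 on wire 4.4.3, Glue 1.1.16 on wire 4.3.0).

```java
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

// Hypothetical helper for illustration; not part of Kpow's build.
final class BaselineSelector {

    /** Compares dotted numeric versions such as "4.3.0" and "4.4.3". */
    public static int compareVersions(String a, String b) {
        String[] x = a.split("\\.");
        String[] y = b.split("\\.");
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            int xi = i < x.length ? Integer.parseInt(x[i]) : 0;
            int yi = i < y.length ? Integer.parseInt(y[i]) : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    /** Maps each wire version to the highest library release that depends on it. */
    static TreeMap<String, String> highestPerWire(Map<String, String> releaseToWire) {
        TreeMap<String, String> best = new TreeMap<>(BaselineSelector::compareVersions);
        for (Map.Entry<String, String> e : releaseToWire.entrySet()) {
            best.merge(e.getValue(), e.getKey(),
                (oldRel, newRel) -> compareVersions(newRel, oldRel) > 0 ? newRel : oldRel);
        }
        return best;
    }

    /** Returns {confluent release, glue release, wire version} for the highest shared wire version. */
    public static Optional<String[]> select(Map<String, String> confluentToWire,
                                            Map<String, String> glueToWire) {
        TreeMap<String, String> confluent = highestPerWire(confluentToWire);
        TreeMap<String, String> glue = highestPerWire(glueToWire);
        for (String wire : confluent.descendingKeySet()) {
            if (glue.containsKey(wire)) {
                return Optional.of(new String[] {confluent.get(wire), glue.get(wire), wire});
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Only the two releases quoted in this thread; older releases would need to be added.
        Optional<String[]> pick = select(Map.of("7.4.1", "4.4.3"), Map.of("1.1.16", "4.3.0"));
        System.out.println(pick.map(p -> "confluent " + p[0] + ", glue " + p[1] + ", wire " + p[2])
            .orElse("no shared wire version among the listed releases"));
    }
}
```

With only the two current releases as input there is no shared wire version, which is why the baseline has to reach back to older releases of at least one library.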
The resolution of the AWS Glue ticket did not fix the transitive dependency issue due to other libraries moving on.
Moving forward, in each release we will:
aws-glue-compatibility release containing AWS Glue serdes only.
This means our users with Glue+Protobuf requirements can choose to use either (2) or remain on an older release (3), depending on whether or not they require mixed-schema support.
Thanks again, Derek
Hey @gabistoenescu just an update that we will publish the tactical fix to factorhouse/kpow-ce:91.5-aws-glue-baseline early next week. I'll ack/close this ticket when it's available.
Hi @d-t-w Thank you very much for your prompt response and very valuable explanations. They are very much appreciated.
Hi @gabistoenescu, please update your container reference to the following:
factorhouse/kpow-ce:91.5.1-aws-glue-baseline
That is the new baseline image that supports both Glue and Confluent serdes.
I will close this ticket now, please just reach out if you need any further support. If you would like a POC license to evaluate authz/multi-cluster/etc just let me know.
Derek
Hi @d-t-w Thank you very much for the fix. I tested it out and confirmed that the Data Inspect functionality worked as expected.
topic: target-topic
partition: 5
offset: 0
timestamp: 1691032173641
age: 4d 17h 33m 01s
headers: {}
value: {}
Version of Kpow: Latest (as of August 3rd 2023)
Describe the bug: When trying to inspect messages using a Protobuf Value Deserializer, kpow throws a java.lang.NoSuchMethodError.
The application starts without exceptions using the following command (in a macOS environment):
docker run --pull=always -p 3000:3000 --env-file ~/kpow-config.env -m 2G -v ~/.aws:/root/.aws factorhouse/kpow-ce:latest
The given env configuration allows kpow to successfully connect to an MSK cluster and an AWS Glue registry. The configuration file looks like:
The UI seems to function correctly, including the data inspection features (without Protobuf deserialization). The connectivity to AWS Glue also seems correct: the "Schema" feature displays the target Subject, and the "Edit Subject" action retrieves the schema specification. Currently the specification is empty and looks like this:
The source topic has a couple of messages, all successfully encoded (by a running application) using the Protobuf schema. Using the Data Inspection with the String serializer renders messages with funny values (as expected).
Using the Data Inspection with the corresponding Protobuf serializer results in errors:
The application logs the following errors:
Some help would be appreciated, as the stack trace appears to indicate that an incompatible dependency is loaded at runtime when the Protobuf deserialization is attempted.
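One generic way to test the suspicion of an incompatible dependency is to ask the JVM where a class was loaded from. The following is a minimal sketch using only standard JDK calls. `ClassOrigin` is a hypothetical helper name, and the default class name is taken from the `NoSuchMethodError` message rather than from any documented diagnostic procedure.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper; uses only standard JDK reflection APIs.
final class ClassOrigin {

    /** Returns the location (typically a JAR URL) the named class was loaded from. */
    public static String locate(String className) {
        try {
            // initialize=false: resolve the class without running static initializers.
            Class<?> cls = Class.forName(
                className, false, ClassOrigin.class.getClassLoader());
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            if (source == null || source.getLocation() == null) {
                // Classes from the bootstrap loader report no code source.
                return "(bootstrap or unknown source)";
            }
            return source.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not on classpath)";
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0
            ? args[0]
            : "com.squareup.wire.schema.internal.parser.MessageElement";
        System.out.println(name + " -> " + locate(name));
    }
}
```

When dependencies are packaged into a single JAR, every class reports that same JAR, so this check is most informative on a classpath made of separate dependency JARs.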