Missing `woodstox-core` transitive dependency results in `ClassNotFoundException: com.ctc.wstx.io.InputBootstrapper` in kafka connector distribution artifact #11489
After commit 7ac617a5a8b0dedbaaa6e19caedfd846968c7cac, the dependency woodstox-core-6.7.0.jar is no longer included in kafka-connect/kafka-connect-runtime/build/distributions/iceberg-kafka-connect-runtime-X.Y.Z-SNAPSHOT.zip. When the connector is deployed to AWS MSK Connect, it fails at startup with:
ERROR WorkerSinkTask{id=REDACTED-0} Task threw an uncaught and unrecoverable exception. Task is being killed and will not recover until manually restarted (org.apache.kafka.connect.runtime.WorkerTask:193)
java.lang.NoClassDefFoundError: com/ctc/wstx/io/InputBootstrapper
at java.base/java.lang.Class.forName0(Native Method)
at java.base/java.lang.Class.forName(Class.java:398)
at org.apache.iceberg.common.DynClasses$Builder.impl(DynClasses.java:68)
at org.apache.iceberg.connect.CatalogUtils.loadHadoopConfig(CatalogUtils.java:53)
at org.apache.iceberg.connect.CatalogUtils.loadCatalog(CatalogUtils.java:45)
at org.apache.iceberg.connect.IcebergSinkTask.open(IcebergSinkTask.java:56)
at org.apache.kafka.connect.runtime.WorkerSinkTask.openPartitions(WorkerSinkTask.java:641)
at org.apache.kafka.connect.runtime.WorkerSinkTask.access$1100(WorkerSinkTask.java:71)
at org.apache.kafka.connect.runtime.WorkerSinkTask$HandleRebalance.onPartitionsAssigned(WorkerSinkTask.java:706)
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.invokePartitionsAssigned(ConsumerCoordinator.java:293)
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:430)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:449)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:365)
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:508)
at org.apache.kafka.clients.consumer.KafkaConsumer.updateAssignmentMetadataIfNeeded(KafkaConsumer.java:1257)
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1226)
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1206)
at org.apache.kafka.connect.runtime.WorkerSinkTask.pollConsumer(WorkerSinkTask.java:458)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:325)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:232)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:201)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:191)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:240)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.ClassNotFoundException: com.ctc.wstx.io.InputBootstrapper
at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:476)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:594)
at org.apache.kafka.connect.runtime.isolation.PluginClassLoader.loadClass(PluginClassLoader.java:104)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:527)
... 28 more
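To check whether the class from the trace is visible in a given environment, a small probe like the one below may help. It is only a sketch: `WoodstoxProbe` and `isLoadable` are hypothetical names, not part of Iceberg or Kafka Connect, and the result reflects the classpath the probe itself runs with, which can differ from what Kafka Connect's `PluginClassLoader` sees for an isolated plugin.

```java
// Hypothetical diagnostic, not part of Iceberg or Kafka Connect.
class WoodstoxProbe {

    // Returns true if the named class can be located through the given loader.
    // The class is not initialized (second argument is false), so no static
    // initializers run as a side effect of the check.
    static boolean isLoadable(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError covers NoClassDefFoundError, as seen in the trace above.
            return false;
        }
    }

    public static void main(String[] args) {
        // Class name taken from the stack trace in this issue.
        String className = "com.ctc.wstx.io.InputBootstrapper";
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        System.out.println(className + " loadable: " + isLoadable(className, loader));
    }
}
```

Running it with the connector's `lib` directory on the classpath (for example `java -cp "lib/*:." WoodstoxProbe`, assuming the unpacked distribution layout) should show whether the jar is missing from the artifact itself.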
This doesn't happen in the integration tests (and most likely not in other environments such as Confluent Cloud either) because the confluentinc/cp-kafka-connect Docker image already includes this dependency.
In the Hive variant of the distribution artifact there's an older version of the dependency (woodstox-core-5.4.0.jar) but I don't think using this variant should be the solution, as it is meant for Iceberg installations using a Hive catalog.
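To compare what each distribution variant actually bundles, the zip can be scanned for matching entries. The following is a minimal sketch; `DistributionJarScan` and `entriesContaining` are hypothetical names introduced for illustration, and the path to the zip is passed in as an argument rather than assumed.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper, not part of the Iceberg build.
class DistributionJarScan {

    // Lists the names of all zip entries whose path contains the given substring.
    static List<String> entriesContaining(Path zipPath, String needle) throws IOException {
        try (ZipFile zip = new ZipFile(zipPath.toFile())) {
            return zip.stream()
                .map(ZipEntry::getName)
                .filter(name -> name.contains(needle))
                .sorted()
                .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: java DistributionJarScan <path-to-distribution.zip>");
            return;
        }
        List<String> hits = entriesContaining(Paths.get(args[0]), "woodstox-core");
        if (hits.isEmpty()) {
            System.out.println("no woodstox-core jar found in " + args[0]);
        } else {
            hits.forEach(System.out::println);
        }
    }
}
```

Running this against both the default and the Hive variant of the zip should make the difference described above visible without deploying the connector.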
Willingness to contribute
[ ] I can contribute a fix for this bug independently
[X] I would be willing to contribute a fix for this bug with guidance from the Iceberg community
[ ] I cannot contribute a fix for this bug at this time
Apache Iceberg version
main (development)
Query engine
Kafka Connect
Catalog
Glue