apache / hudi

Upserts, Deletes And Incremental Processing on Big Data.
https://hudi.apache.org/
Apache License 2.0

[SUPPORT] - Cannot Ingest Protobuf records using Hudi Streamer #12301

Open remeajayi2022 opened 1 day ago

remeajayi2022 commented 1 day ago

I’m trying to ingest from a ProtoKafka source using Hudi Streamer but am running into an issue.

Exception in thread "main" org.apache.hudi.utilities.ingestion.HoodieIngestionException: Ingestion service was shut down with exception.
        at ...
Error reading source schema from registry. Please check hoodie.streamer.schemaprovider.registry.url is configured correctly. Truncated URL: https://....ons/latest
     at org.apache.hudi.utilities.schema.SchemaRegistryProvider.parseSchemaFromRegistry(SchemaRegistryProvider.java:111)
        at org.apache.hudi.utilities.schema.SchemaRegistryProvider.getSourceSchema(SchemaRegistryProvider.java:204)
        ... 10 more
...
Caused by: org.apache.hudi.internal.schema.HoodieSchemaException: Failed to parse schema from registry: syntax = "proto3";
package datagen;
...
Caused by: java.lang.NoSuchMethodException: org.apache.hudi.utilities.schema.converter.ProtoSchemaToAvroSchemaConverter.<init>()
        at java.lang.Class.getConstructor0(Class.java:3082)
        at java.lang.Class.newInstance(Class.java:412)
        ... 13 more 

The stack trace points to a misconfigured schema registry URL. However, the same URL works for Hudi Streamer jobs ingesting from AvroKafka sources, and when I query the schema registry URL with curl it correctly returns the schema.
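For what it's worth, the innermost "Caused by" above does not look URL-related to me: it is reflective instantiation of the schema converter failing because no no-arg constructor is found. A minimal, standalone Java sketch of that failure mode (nothing Hudi-specific; OnlyConfigCtor is just a hypothetical stand-in for a converter whose only constructor takes configuration):

// Minimal sketch, not Hudi code: reflective no-arg instantiation fails with
// NoSuchMethodException when the target class only exposes a constructor that
// takes configuration, mirroring the last "Caused by" in the stack trace above.
public class NoArgCtorRepro {

  // Hypothetical stand-in for a converter that requires config at construction.
  static class OnlyConfigCtor {
    OnlyConfigCtor(java.util.Properties config) {
    }
  }

  public static void main(String[] args) throws Exception {
    // Throws java.lang.NoSuchMethodException: ...OnlyConfigCtor.<init>()
    Object instance = OnlyConfigCtor.class.getDeclaredConstructor().newInstance();
    System.out.println(instance);
  }
}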

Additional Context

  1. I've verified the Protobuf schema is valid; it is a sample proto schema from Confluent’s Datagen connector.
  2. I've confirmed the schema registry URL is configured correctly; the same URL works fine with a similar AvroKafka Spark job, and a direct request to the subject's latest version returns the schema (see the sketch after this list).
  3. I added hoodie.streamer.schemaprovider.proto.class.name and hoodie.streamer.source.kafka.proto.value.deserializer.class=org.apache.kafka.common.serialization.ByteArrayDeserializer. I don't think these are required, and their presence or absence did not change the error.
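For completeness, here is the manual registry check from point 2 written out as a small Java sketch (equivalent to the curl call; every angle-bracketed value is a placeholder for the real key, secret, host, and topic):

import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Scanner;

// Sketch of the manual schema registry check, in Java rather than curl.
// All angle-bracketed values are placeholders.
public class RegistryCheck {
  public static void main(String[] args) throws Exception {
    String userInfo = "<schema-registry-key>:<schema-registry-secret>";
    URL url = new URL(
        "https://<schema-registry-url>/subjects/<topic-name>-value/versions/latest");

    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestProperty("Authorization", "Basic "
        + Base64.getEncoder().encodeToString(userInfo.getBytes(StandardCharsets.UTF_8)));

    // A healthy Protobuf subject returns HTTP 200 with a JSON body containing
    // the schema type and the proto schema text.
    System.out.println("HTTP " + conn.getResponseCode());
    try (Scanner scanner = new Scanner(conn.getInputStream(), StandardCharsets.UTF_8.name())) {
      System.out.println(scanner.useDelimiter("\\A").next());
    }
  }
}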

Environment Details

Hudi version: v0.15.0
Spark version: 3.1.3
Scala version: 2.12
Google Dataproc version: 2.0.125-debian10

Spark Submit Command and Protobuf Configuration

gcloud dataproc jobs submit spark --cluster <GCP-CLUSTER> \
  --region us-central1 \
  --class org.apache.hudi.utilities.streamer.HoodieStreamer \
  --project <GCP-PROJECT> \
  --jars <jar-base-url>/jars/hudi-gcp-bundle-0.15.0.jar,<jar-base-url>/jars/spark-avro_2.12-3.1.1.jar,<jar-base-url>/jars/hudi-utilities-bundle-raw_2.12-0.15.0.jar,<jar-base-url>/jars/kafka-protobuf-provider-5.5.0.jar \
  --schemaprovider-class org.apache.hudi.utilities.schema.SchemaRegistryProvider \
  --source-class org.apache.hudi.utilities.sources.ProtoKafkaSource \
  --hoodie-conf sasl.jaas.config="org.apache.kafka.common.security.plain.PlainLoginModule required username='<username>' password='<password>';" \
  --hoodie-conf hoodie.streamer.schemaprovider.proto.class.name=<topic-name> \
  --hoodie-conf basic.auth.credentials.source=USER_INFO \
  --hoodie-conf schema.registry.basic.auth.user.info=<schema-registry-key>:<schema-registry-secret> \
  --hoodie-conf hoodie.streamer.schemaprovider.registry.url=https://<schema-registry-key>:<schema-registry-secret>@<schema-registry-url>/subjects/<topic-name>-value/versions/latest \
  --hoodie-conf hoodie.streamer.source.kafka.topic=<topic-name> \
  --hoodie-conf hoodie.streamer.source.kafka.value.deserializer.class=io.confluent.kafka.serializers.protobuf.KafkaProtobufDeserializer \
  --hoodie-conf hoodie.streamer.schemaprovider.registry.schemaconverter=org.apache.hudi.utilities.schema.converter.ProtoSchemaToAvroSchemaConverter \

Steps to Reproduce

  1. Build a Hudi 0.15.0 JAR with Spark 3.1 and Scala 2.12.
  2. Use a Protobuf schema on an accessible schema registry, preferably an authenticated one.
  3. Configure a Hudi Streamer job with the spark-submit command above.
  4. Run the Spark job.

I’d appreciate any insights into resolving this issue. Is there an alternative or a workaround for configuring the Protobuf schema? Am I missing any configuration settings? Thank you for your help!

remeajayi2022 commented 9 hours ago

It seems this was a known issue (#11598) in v0.15.0, with PR fixes (#11373, #11660) by @the-other-tim-brown that have since been merged to master. However, I have tested with a JAR compiled from the master branch and I am still running into issues:

24/11/21 21:30:36 ERROR HoodieAsyncService: Service shutdown with error
java.util.concurrent.ExecutionException: java.lang.AbstractMethodError: Receiver class io.confluent.kafka.schemaregistry.protobuf.ProtobufSchemaProvider does not define or inherit an implementation of the resolved method 'abstract java.util.Optional parseSchema(java.lang.String, java.util.List)' of interface io.confluent.kafka.schemaregistry.SchemaProvider.
        at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395) ~[?:?]
        at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999) ~[?:?]
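In case it helps with diagnosis: my unverified guess is a classpath version mismatch between kafka-protobuf-provider-5.5.0.jar and the schema-registry client classes bundled in the utilities JAR built from master. A small diagnostic sketch (assuming it is run with the same jars as the Spark driver) that prints where each class is loaded from and which parseSchema overloads it actually declares:

import java.lang.reflect.Method;
import java.security.CodeSource;

// Diagnostic sketch: check for mismatched Confluent schema-registry versions on
// the classpath by listing the jar each class comes from and its parseSchema
// overloads. Assumes the same jars as the Spark driver are on the classpath.
public class SchemaProviderMethodCheck {
  public static void main(String[] args) throws Exception {
    String[] names = {
        "io.confluent.kafka.schemaregistry.SchemaProvider",
        "io.confluent.kafka.schemaregistry.protobuf.ProtobufSchemaProvider"
    };
    for (String name : names) {
      Class<?> clazz = Class.forName(name);
      CodeSource src = clazz.getProtectionDomain().getCodeSource();
      System.out.println(name + " loaded from " + (src == null ? "?" : src.getLocation()));
      for (Method m : clazz.getMethods()) {
        if (m.getName().equals("parseSchema")) {
          System.out.println("  " + m);
        }
      }
    }
  }
}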

Do you have any suggestions on how to move past this? Thank you. cc: @the-other-tim-brown