confluentinc / schema-registry

Confluent Schema Registry for Kafka
https://docs.confluent.io/current/schema-registry/docs/index.html

Better management and documentation of protobuf field numbers in Kafka Connect converter / source #2551

Open: james-johnston-thumbtack opened this issue 1 year ago

james-johnston-thumbtack commented 1 year ago

Protobuf is unique among Schema Registry formats in that fields are numbered as well as named, and the choice of field number is critical for protobuf compatibility.

However, it is not at all clear to me how Kafka Connect sources manage these protobuf field numbers, and the documentation says very little about it:

In reality, it looks like a Kafka Connect source would pick field numbers in an auto-incrementing fashion: https://github.com/confluentinc/schema-registry/blob/a12d763bf6813791065a3b0036f4a8eec28f71ed/protobuf-converter/src/main/java/io/confluent/connect/protobuf/ProtobufData.java#L729
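To make the compatibility risk concrete, here is a simplified Java model. It assumes field numbers are handed out positionally, starting at 1, in schema field order (a reading of the linked code, not a copy of it). The class `PositionalTags` and the column names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified model (an assumption, not the converter's actual code): each field
// gets the next number in the order it appears in the Connect schema.
final class PositionalTags {
    static Map<String, Integer> assign(List<String> fieldNames) {
        Map<String, Integer> tags = new LinkedHashMap<>();
        int next = 1;
        for (String name : fieldNames) {
            tags.put(name, next++);
        }
        return tags;
    }

    public static void main(String[] args) {
        // Version 1 of a hypothetical source table.
        Map<String, Integer> v1 = assign(List.of("id", "email", "name"));
        // Version 2: the "email" column was dropped upstream.
        Map<String, Integer> v2 = assign(List.of("id", "name"));
        System.out.println("v1: " + v1); // v1: {id=1, email=2, name=3}
        System.out.println("v2: " + v2); // v2: {id=1, name=2}
    }
}
```

Because protobuf identifies fields on the wire by number rather than by name, a reader still on v1 would decode tag 2 in v2 data as `email`. Reusing a field number for a different field is exactly what protobuf's own guidelines warn against.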

This seems to have significant consequences for compatibility:

How I imagine this could work in source converters:

But to start with, what I think is missing most is documentation of what the protobuf converter does when it dynamically creates new schemas in a Kafka Connect source, especially regarding field numbers.

rayokota commented 1 year ago

@james-johnston-thumbtack, until we improve the Protobuf converter, possibly as you suggested, you can use an SMT to add the tags to the Connect schema manually. For each field in the Connect schema, add a parameter with the key "io.confluent.connect.protobuf.Tag" whose value is the desired tag number.
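A minimal sketch of the workaround above, assuming tags are pinned in a hand-maintained name-to-number map (`StableTags` and `PINNED` are hypothetical names). It only builds the per-field parameter maps; a real SMT would attach each one while rebuilding the field's schema, for example with Kafka Connect's `SchemaBuilder.parameter(key, value)`. The Kafka classes are omitted so the example needs only the JDK.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of the per-field parameters a custom SMT could attach. Connect schema
// parameters are String-to-String maps, so they are modeled that way here.
final class StableTags {
    // Parameter key quoted in the comment above.
    static final String TAG_KEY = "io.confluent.connect.protobuf.Tag";

    // Hypothetical pinned mapping: a field keeps its number for good, and the
    // numbers of removed fields are never handed out again.
    static final Map<String, Integer> PINNED = Map.of("id", 1, "email", 2, "name", 3);

    static Map<String, Map<String, String>> parametersFor(List<String> fieldNames) {
        Map<String, Map<String, String>> result = new LinkedHashMap<>();
        for (String name : fieldNames) {
            Integer tag = PINNED.get(name);
            if (tag == null) {
                // Fail fast instead of silently picking the next free number.
                throw new IllegalArgumentException("no pinned tag for field: " + name);
            }
            // Parameter values are strings, so the tag is stored as text.
            result.put(name, Map.of(TAG_KEY, Integer.toString(tag)));
        }
        return result;
    }

    public static void main(String[] args) {
        // Even after "email" is dropped, "name" keeps tag 3.
        System.out.println(parametersFor(List.of("id", "name")));
    }
}
```

Throwing on an unmapped field is a deliberate choice: falling back to the next free number would reintroduce the positional behavior the workaround is meant to avoid.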

dmariassy commented 7 months ago

Hi @rayokota, is there any progress on better support for the idiomatic use of protobuf field IDs?