Open · yvesk opened this issue 1 year ago

Hi there, having a Protobuf schema like the following generates overly verbose documents in OpenSearch. A message containing a `values` map with each type appearing once will produce the following document:
The Kafka Connect schema system does not natively support the "oneof" semantics that Protobuf does, and it appears that whatever performed the translation (your value.converter) is doing the next best thing: treating each element of the oneof as a field of the containing message and leaving the unset fields null.
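For illustration only (this is a made-up schema, not the one from this issue): a `oneof value { string string_value = 1; int64 int_value = 2; }` would typically come out of such a translation as a Connect struct with one optional field per branch, so the unset branch survives as an explicit null:

```java
import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.data.SchemaBuilder;
import org.apache.kafka.connect.data.Struct;

public class OneofExample {
    public static void main(String[] args) {
        // Hypothetical Connect schema a protobuf converter might produce for
        // `oneof value { string string_value = 1; int64 int_value = 2; }`:
        // each branch becomes an optional field in the containing struct.
        Schema schema = SchemaBuilder.struct().name("Value")
                .field("string_value", Schema.OPTIONAL_STRING_SCHEMA)
                .field("int_value", Schema.OPTIONAL_INT64_SCHEMA)
                .build();

        // Setting only one branch leaves the other null; the JsonConverter
        // then serializes that unset branch as an explicit JSON null.
        Struct value = new Struct(schema).put("string_value", "hello");
        System.out.println(value);
    }
}
```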
Looking at the connector's code, it uses the JsonConverter to re-serialize the data, which will preserve these explicit nulls in the output JSON sent to OpenSearch. I don't see any logic that would filter out the unset fields, either from the schema or from the value. I also don't think such logic belongs in the connector, as this is a protobuf-specific problem and could appear in other connectors in the same fashion.
@yvesk If you wish to remove these unset fields from your output, I think you will need a custom Transformation (SMT) that drops every null-valued field from both the schema and the value. This means every message reaching the connector would have a different schema, containing a definition for only the one field of the oneof that is set. Such an SMT may be a useful addition to https://github.com/aiven/transforms-for-apache-kafka-connect , so if you or anyone else would like to contribute it, we can discuss it further there.
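A rough sketch of what such an SMT might look like (the class and package names are mine, and it only handles top-level fields of a Struct value; nested structs, maps, and arrays would need recursion):

```java
package com.example;

import java.util.Map;

import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.connect.connector.ConnectRecord;
import org.apache.kafka.connect.data.Field;
import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.data.SchemaBuilder;
import org.apache.kafka.connect.data.Struct;
import org.apache.kafka.connect.transforms.Transformation;

/**
 * Sketch of an SMT that rebuilds each record's value schema, keeping only
 * the fields whose value is non-null. Not production code.
 */
public class DropNullFields<R extends ConnectRecord<R>> implements Transformation<R> {

    @Override
    public R apply(final R record) {
        if (!(record.value() instanceof Struct)) {
            return record; // only structured values are handled in this sketch
        }
        final Struct value = (Struct) record.value();

        // Build a new schema containing only the fields that are set.
        final SchemaBuilder builder = SchemaBuilder.struct();
        if (value.schema().name() != null) {
            builder.name(value.schema().name());
        }
        for (final Field field : value.schema().fields()) {
            if (value.get(field) != null) {
                builder.field(field.name(), field.schema());
            }
        }
        final Schema newSchema = builder.build();

        // Copy the non-null values into a Struct of the new schema.
        final Struct newValue = new Struct(newSchema);
        for (final Field field : newSchema.fields()) {
            newValue.put(field, value.get(field.name()));
        }

        return record.newRecord(record.topic(), record.kafkaPartition(),
                record.keySchema(), record.key(),
                newSchema, newValue, record.timestamp());
    }

    @Override
    public ConfigDef config() {
        return new ConfigDef(); // no configuration options in this sketch
    }

    @Override
    public void configure(final Map<String, ?> configs) {
        // nothing to configure
    }

    @Override
    public void close() {
        // nothing to clean up
    }
}
```

Wiring it in would be the usual SMT configuration, e.g. `transforms=dropNulls` and `transforms.dropNulls.type=com.example.DropNullFields` (both names hypothetical). One trade-off worth noting: emitting a distinct schema per record can reduce the effectiveness of converter-side schema caches (e.g. the JsonConverter's `schemas.cache.size`).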
Alternatively, the deserializing converter could be adjusted to produce varying schemas that only define the field which is set, essentially performing the SMT's operation inside the converter. What value.converter are you using for this example, and how is the translation from protobuf to a Connect schema being performed (Karapace, Schema Registry, etc.)?