confluentinc / kafka-tutorials

Tutorials and Recipes for Apache Kafka
https://developer.confluent.io/tutorials
Apache License 2.0

Schema Registry multiple events per topic #624

Closed bbejeck closed 2 years ago

bbejeck commented 3 years ago

Schema Registry 5.5.0 supports having multiple event types per topic. This tutorial would cover what's needed to enable this functionality and when to use it. This feature builds off schema references, so #623 should be done first.
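For context on what "builds off schema references" could look like in practice, here is a minimal sketch of the request body for registering a top-level Avro union whose members are references to separately registered schemas. The subject names (`customer`, `address`) and the version numbers are assumptions made for illustration; the body shape follows the Schema Registry `POST /subjects/<subject>/versions` endpoint. Nothing is sent over the network here.

```python
import json


def build_registration_payload(schema, references):
    """Build the JSON body for POST /subjects/<subject>/versions."""
    return json.dumps({
        "schemaType": "AVRO",
        # Schema Registry expects the schema itself as a JSON-encoded string.
        "schema": json.dumps(schema),
        "references": references,
    })


# A top-level union that only names the member types; the definitions
# live in the referenced subjects.
all_types_schema = [
    "io.confluent.examples.avro.Customer",
    "io.confluent.examples.avro.Address",
]

# Hypothetical subjects, each assumed to be registered already at version 1.
references = [
    {"name": "io.confluent.examples.avro.Customer", "subject": "customer", "version": 1},
    {"name": "io.confluent.examples.avro.Address", "subject": "address", "version": 1},
]

payload = build_registration_payload(all_types_schema, references)
print(payload)
```

The referenced schemas have to be registered before the union that points at them, which is one reason #623 is a natural prerequisite.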

ybyzek commented 3 years ago

If we wanted to tie this in with stream processing, then for ksqlDB:

ksqlDB does not have much support for topics with multiple schema types right now: https://github.com/confluentinc/ksql/issues/1267.

Potential workaround from @mikebin

That said, I think if you have a union/oneOf schema defined as in the blog post, ksqlDB will infer a “superset” schema containing all the types in the union. For example, with a schema like this:

{
  "fields": [
    {
      "name": "oneof_type",
      "type": [
        {
          "fields": [
            {
              "name": "fname",
              "type": "string"
            },
            {
              "name": "lname",
              "type": "string"
            }
          ],
          "name": "Customer",
          "type": "record"
        },
        {
          "fields": [
            {
              "name": "city",
              "type": "string"
            },
            {
              "name": "state",
              "type": "string"
            }
          ],
          "name": "Address",
          "type": "record"
        }
      ]
    }
  ],
  "name": "AllTypes",
  "namespace": "io.confluent.examples.avro",
  "type": "record"
}

ksqlDB would infer a schema that looks like this:

 Field      | Type
-----------------------------------------------------------------------------------------------------------------------------
 ONEOF_TYPE | STRUCT<CUSTOMER STRUCT<FNAME VARCHAR(STRING), LNAME VARCHAR(STRING)>, ADDRESS STRUCT<CITY VARCHAR(STRING), STATE VARCHAR(STRING)>>

So you could possibly write queries with CASE expressions to check which of the types are populated (i.e., non-null) for each record. I haven’t gone through an actual implementation of this, so perhaps someone else on the channel will have better insight and suggestions.
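The "check which type is populated" idea can be sketched outside ksqlDB as well. The following Python models a row as a dict shaped like the inferred ONEOF_TYPE struct and returns the name of the non-null branch, which is roughly what a CASE expression would do. The sample rows are made up for illustration, and this is not ksqlDB code.

```python
def event_type(row):
    """Return the name of the populated union branch, or "Unknown"."""
    oneof = row.get("ONEOF_TYPE") or {}
    if oneof.get("CUSTOMER") is not None:
        return "Customer"
    if oneof.get("ADDRESS") is not None:
        return "Address"
    return "Unknown"


# Made-up rows: exactly one branch of the struct is non-null per record.
rows = [
    {"ONEOF_TYPE": {"CUSTOMER": {"FNAME": "Ada", "LNAME": "Lovelace"}, "ADDRESS": None}},
    {"ONEOF_TYPE": {"CUSTOMER": None, "ADDRESS": {"CITY": "Austin", "STATE": "TX"}}},
]

for row in rows:
    print(event_type(row))
```

Each new type added to the union would need another branch in the check, so this approach scales with the number of event types in the topic.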

ybyzek commented 3 years ago

Related blog post: https://www.confluent.io/blog/multiple-event-types-in-the-same-kafka-topic/

How to configure the consumer:

pbarsotti-glovo commented 2 years ago

I think all the links that you shared on how to configure the consumer are related to TopicRecordNameStrategy, as you can see here. To use the approach described in Yokota's post, TopicNameStrategy must be used, with the wrapper event schema defined in the topic (the one containing the UNION).
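To make the difference between the strategies concrete, here is a small sketch of how each subject name strategy derives the Schema Registry subject. The topic name `all-types` is a hypothetical example. In a real client the strategy is chosen through the `value.subject.name.strategy` (or `key.subject.name.strategy`) setting rather than computed by hand.

```python
def subject_name(strategy, topic, record_fullname, is_key=False):
    """Derive the Schema Registry subject for a given naming strategy."""
    if strategy == "TopicNameStrategy":
        # One subject per topic, so every message value shares one schema
        # (for example a wrapper or union schema).
        return f"{topic}-{'key' if is_key else 'value'}"
    if strategy == "RecordNameStrategy":
        # One subject per record type, independent of the topic.
        return record_fullname
    if strategy == "TopicRecordNameStrategy":
        # One subject per (topic, record type) pair.
        return f"{topic}-{record_fullname}"
    raise ValueError(f"unknown strategy: {strategy}")


topic = "all-types"  # hypothetical topic name
record = "io.confluent.examples.avro.Customer"

for strategy in ("TopicNameStrategy", "RecordNameStrategy", "TopicRecordNameStrategy"):
    print(strategy, "->", subject_name(strategy, topic, record))
```

With TopicNameStrategy the union schema lives under a single `<topic>-value` subject, whereas TopicRecordNameStrategy registers each event type under its own subject.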

On the other hand, can you confirm whether this alternative works for ksqlDB? I'm at an early stage of deciding whether to go with this approach or with TopicRecordNameStrategy, and the only thing making me hesitate about the latter is ksqlDB. The mechanism using references does not allow the use of specific records, so I have to do a lot of inference and transformation to map a GenericRecord to one of the possible SpecificRecords contained inside. Schema evolution is also harder, because you need to change the child schema and then update its version on the parent. I don't want to invest a lot of time in it if ksqlDB does not work well with this...
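The generic-to-specific mapping described here amounts to dispatching on the schema's full name. A rough Python analogue of that pattern is sketched below, with dataclasses standing in for SpecificRecord classes and a plain dict standing in for a GenericRecord. The `CONVERTERS` table and `to_typed` helper are hypothetical names, not part of any Kafka or Avro library.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    fname: str
    lname: str


@dataclass
class Address:
    city: str
    state: str


# Hypothetical lookup table: schema full name -> function building the typed object.
CONVERTERS = {
    "io.confluent.examples.avro.Customer": lambda f: Customer(f["fname"], f["lname"]),
    "io.confluent.examples.avro.Address": lambda f: Address(f["city"], f["state"]),
}


def to_typed(schema_fullname, fields):
    """Convert a generic field mapping into the typed object for its schema."""
    try:
        convert = CONVERTERS[schema_fullname]
    except KeyError:
        raise ValueError(f"no converter registered for {schema_fullname}") from None
    return convert(fields)


event = to_typed("io.confluent.examples.avro.Address", {"city": "Austin", "state": "TX"})
print(event)
```

Every new event type means another entry in the table, which is part of the maintenance cost mentioned above.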

ybyzek commented 2 years ago

@bbejeck can we close this GH issue, or is there anything else to add to the KT?

bbejeck commented 2 years ago

@ybyzek I think so. I'll close it and if something comes up we can always create a new issue.

bbejeck commented 2 years ago

Resolved via #983