confluentinc / kafka-connect-jdbc

Kafka Connect connector for JDBC-compatible databases

[QUESTION] Use Json Schema with Kafka connect #819

Open klalafaryan opened 4 years ago

klalafaryan commented 4 years ago

Hello,

Context: We are trying to build the following architecture with Kafka Connect and Kafka Streams:

MYSQL -> KAFKA-CONNECT-JDBC (SOURCE connector) -> KAFKA -> KAFKA-STREAMS (doing some normalizations) -> KAFKA -> KAFKA-CONNECT-JDBC (SINK connector) -> POSTGRES
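
Roughly, the two JDBC connector configurations for this pipeline look like this (connection URLs, credentials, table and topic names are placeholders, not our real settings):

```properties
# Source: MySQL -> Kafka (placeholder values)
name=mysql-source
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
connection.url=jdbc:mysql://mysql:3306/inventory
connection.user=user
connection.password=secret
mode=incrementing
incrementing.column.name=id
topic.prefix=mysql-

# Sink: Kafka -> Postgres (placeholder values)
name=postgres-sink
connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
connection.url=jdbc:postgresql://postgres:5432/warehouse
connection.user=user
connection.password=secret
topics=normalized-topic
auto.create=true
```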

So I have the following questions:

  1. Is it possible to use JSON Schema (https://json-schema.org/) with kafka-connect-jdbc instead of the Kafka Connect schema?

  2. To be able to sink the data into Postgres, the Kafka Connect JDBC sink requires a schema, so we have to produce the schema together with the payload from Kafka Streams.

So we create a Java POJO (or JsonNode) and a schema separately in the Kafka Streams application (a sketch of the envelope we would have to produce is below).
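
As far as I understand, with the plain JsonConverter (schemas.enable=true) each record value has to be an envelope like the following, which is what we are trying to produce from Kafka Streams (the field names are just an illustration):

```json
{
  "schema": {
    "type": "struct",
    "name": "customer",
    "fields": [
      { "field": "id",   "type": "int64",  "optional": false },
      { "field": "name", "type": "string", "optional": true }
    ]
  },
  "payload": {
    "id": 42,
    "name": "Alice"
  }
}
```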

Thanks a lot for your input.

gharris1727 commented 4 years ago

Is it possible to use JSON Schema (https://json-schema.org/) with kafka-connect-jdbc instead of the Kafka Connect schema?

The JDBC connectors always use Connect Schema objects, so if you want to store data in Kafka described by JSON Schema, you will need to write your own Converter implementation that serializes and deserializes that format.
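
A bare-bones sketch of what such a Converter would have to implement is below; the class name and the mapping steps in the comments are illustrative, not an existing implementation:

```java
import java.util.Map;

import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.data.SchemaAndValue;
import org.apache.kafka.connect.storage.Converter;

/**
 * Sketch of a Converter that would map between Connect's Schema/Struct model
 * and a json-schema.org representation. Connect only requires these three methods.
 */
public class JsonSchemaOrgConverter implements Converter {

    @Override
    public void configure(Map<String, ?> configs, boolean isKey) {
        // Read any converter-specific settings, e.g. where schemas live.
    }

    @Override
    public byte[] fromConnectData(String topic, Schema schema, Object value) {
        // 1. Translate the Connect Schema into a JSON Schema document.
        // 2. Serialize the value (Struct/Map/primitive) as JSON.
        // 3. Return the bytes to write to Kafka.
        throw new UnsupportedOperationException("not implemented in this sketch");
    }

    @Override
    public SchemaAndValue toConnectData(String topic, byte[] value) {
        // 1. Parse the JSON bytes and the associated JSON Schema.
        // 2. Build a Connect Schema (SchemaBuilder) and a matching Struct.
        // 3. Return both so downstream connectors (e.g. the JDBC sink) can use them.
        throw new UnsupportedOperationException("not implemented in this sketch");
    }
}
```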

Is it possible to create the schema dynamically from a POJO?

Yes, this is possible, and we generally call this technique schema inferencing. Doing it in general is a bit involved, and it's up to you to decide whether investing in dynamic schema generation or updating the schema yourself is more cost effective.

I found a related guide that statically builds the schema for the streams application: https://kafka-tutorials.confluent.io/changing-serialization-format/kstreams.html
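
As a concrete illustration of the static approach, building a Connect schema and struct by hand for a small, made-up Customer POJO looks roughly like this:

```java
import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.data.SchemaBuilder;
import org.apache.kafka.connect.data.Struct;

public class CustomerSchema {

    // Hand-written Connect schema mirroring a hypothetical Customer POJO.
    public static final Schema CUSTOMER_SCHEMA = SchemaBuilder.struct()
            .name("com.example.Customer")
            .field("id", Schema.INT64_SCHEMA)
            .field("name", Schema.OPTIONAL_STRING_SCHEMA)
            .field("email", Schema.OPTIONAL_STRING_SCHEMA)
            .build();

    // Copy a POJO's fields into a Struct that the JDBC sink understands.
    public static Struct toStruct(long id, String name, String email) {
        return new Struct(CUSTOMER_SCHEMA)
                .put("id", id)
                .put("name", name)
                .put("email", email);
    }
}
```

Schema inferencing would do the same thing at runtime, for example by reflecting over the POJO's fields or walking a JsonNode instead of writing the builder calls by hand.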

Is it possible to validate the POJO with the schema before producing?

I'd assume that any reasonable schema library gives you the tools to validate objects, but that's up to the specific implementation. Your Converter/Serde implementation would need to provide that behavior, since Kafka/Connect won't be able to recognize a data/schema mismatch on its own.
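
For example, if your values are Jackson JsonNodes, one option (my assumption, not something Connect or this connector provides) is to run a JSON Schema validation library such as networknt/json-schema-validator before handing the record to the producer:

```java
import java.util.Set;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.networknt.schema.JsonSchema;
import com.networknt.schema.JsonSchemaFactory;
import com.networknt.schema.SpecVersion;
import com.networknt.schema.ValidationMessage;

public class PreProduceValidation {

    private static final ObjectMapper MAPPER = new ObjectMapper();

    public static void main(String[] args) throws Exception {
        // A JSON Schema for the record value (illustrative only).
        String schemaJson = "{\"type\":\"object\",\"required\":[\"id\"],"
                + "\"properties\":{\"id\":{\"type\":\"integer\"},\"name\":{\"type\":\"string\"}}}";

        JsonSchema schema = JsonSchemaFactory
                .getInstance(SpecVersion.VersionFlag.V7)
                .getSchema(schemaJson);

        // The value about to be produced, e.g. the POJO serialized to a JsonNode.
        JsonNode record = MAPPER.readTree("{\"id\": 42, \"name\": \"Alice\"}");

        Set<ValidationMessage> errors = schema.validate(record);
        if (!errors.isEmpty()) {
            // Fail fast instead of producing a record the JDBC sink can't handle.
            throw new IllegalStateException("Record does not match schema: " + errors);
        }
    }
}
```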

Overall, I think that unless you have existing infrastructure using JSON Schema and are willing to implement and maintain custom Converters/Serdes, you're better off choosing an off-the-shelf serialization format that's already supported, such as JSON with schemas (JsonConverter) or Avro (AvroConverter).
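
For reference, the converter-related settings for those two options look roughly like this (the Schema Registry URL is a placeholder):

```properties
# Option A: JSON with embedded schemas (JsonConverter)
value.converter=org.apache.kafka.connect.json.JsonConverter
value.converter.schemas.enable=true

# Option B: Avro with Schema Registry (AvroConverter)
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://schema-registry:8081
```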

OneCricketeer commented 3 years ago

JSON Schema converters are now included in Schema Registry and Confluent Platform.
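
For anyone finding this now, the converter class is io.confluent.connect.json.JsonSchemaConverter, configured roughly like this (the Schema Registry URL is a placeholder):

```properties
# JSON Schema converter from Confluent Platform / Schema Registry
value.converter=io.confluent.connect.json.JsonSchemaConverter
value.converter.schema.registry.url=http://schema-registry:8081
key.converter=io.confluent.connect.json.JsonSchemaConverter
key.converter.schema.registry.url=http://schema-registry:8081
```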