confluentinc / schema-registry

Confluent Schema Registry for Kafka
https://docs.confluent.io/current/schema-registry/docs/index.html

Documentation is missing important parts #1725

Open benkeil opened 3 years ago

benkeil commented 3 years ago

After two weeks it is still not clear to me what the best practices are for using the schema registry as a developer.

How do I use the @Schema and @SchemaReference annotations correctly?

Is it best to write the events that your service produces as classes that then get auto-registered by the client, and to load the events that you consume via the Maven plugin?

RobinGoussey commented 3 years ago

I didn't know those existed, and since I haven't used them I can't really answer:

How do I use the @Schema and @SchemaReference annotations correctly?

However, I assume it has something to do with this: https://docs.confluent.io/platform/current/schema-registry/serdes-develop/serdes-json.html#multiple-event-types-in-the-same-topic

I think @Schema is used to hardcode a schema onto a Java POJO, and @SchemaReference is meant to annotate properties.
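If that's right, usage would look something like the example in the linked docs. A minimal sketch, assuming the annotations live in io.confluent.kafka.schemaregistry.annotations (the event class, schema body, and subject name are hypothetical):

```java
import io.confluent.kafka.schemaregistry.annotations.Schema;
import io.confluent.kafka.schemaregistry.annotations.SchemaReference;

// Hypothetical event class. @Schema pins the exact JSON Schema the serializer
// should register/use for this POJO instead of deriving one by reflection.
@Schema(value = "{"
    + "\"$schema\": \"http://json-schema.org/draft-07/schema#\","
    + "\"title\": \"OrderPlaced\", \"type\": \"object\","
    + "\"properties\": {\"customer\": {\"$ref\": \"customer.json\"}}"
    + "}",
    refs = {
      // Resolves the $ref above against a schema that is already
      // registered under the subject "customer".
      @SchemaReference(name = "customer.json", subject = "customer")
    })
public class OrderPlaced {
  public Customer customer; // Customer assumed to be another annotated POJO
}
```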

Is it best to write the events that your service produces as classes that then get auto-registered by the client, and to load the events that you consume via the Maven plugin?

This statement is a bit confusing to me. Do you mean you just send your Java POJO and let the schema auto-register, and then let the consumers generate POJOs from the schemas?

But why would you let the POJOs be generated if you already have them in the other application? (Separate the POJOs into another Maven module.) I don't have the most experience, but I would make sure you are consistent: either the producer and consumer use the same POJOs, or both generate the POJOs from the schema.

I've personally disabled auto schema registration, because in production I want a schema update to be deliberate and not the byproduct of some refactoring. My schemas are currently in a git repo, and I've made a small tool similar to kafka-gitops (https://github.com/devshawn/kafka-gitops), but for schemas.

That way the schemas are in a git repo and we control the format. That being said, I do send JsonNodes instead of POJOs because of our unique use case. But do you need @Schema and @SchemaReference when using JSON? Can't you just send the POJO with auto-registration?
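For reference, disabling auto-registration is just serializer configuration. A sketch of the relevant producer properties, assuming the JSON Schema serializer (the registry URL is a placeholder):

```properties
value.serializer=io.confluent.kafka.serializers.json.KafkaJsonSchemaSerializer
schema.registry.url=http://localhost:8081
# Make schema updates deliberate: fail instead of registering from the producer.
auto.register.schemas=false
# Serialize against the latest version already registered for the subject.
use.latest.version=true
```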

All that being said, I can't find anything in documentation about the annotations.

benkeil commented 3 years ago

You use two kinds of objects in your application: events that are created only by your own application, and all the others.

I played around a bit, and it looks like the easiest way is to create a Java class for the events I want to produce and auto-register them with the SchemaRegistryClient (because of the compatibility mode this sounds relatively safe to me). The schema gets registered the first time you send that kind of event.
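A minimal sketch of that flow with the JSON Schema serializer, assuming the hypothetical OrderPlaced POJO from above and placeholder broker/registry URLs (auto.register.schemas defaults to true):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;
import io.confluent.kafka.serializers.json.KafkaJsonSchemaSerializer;

public class OrderProducer {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, KafkaJsonSchemaSerializer.class);
    props.put("schema.registry.url", "http://localhost:8081");           // placeholder

    try (KafkaProducer<String, OrderPlaced> producer = new KafkaProducer<>(props)) {
      // On the first send, the serializer derives a JSON Schema from OrderPlaced
      // (or uses its @Schema annotation) and registers it under "orders-value".
      producer.send(new ProducerRecord<>("orders", "order-1", new OrderPlaced()));
    }
  }
}
```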

But how should you use the schema registry in a Java application for events that you only consume? You need the classes in the code, so I guess it's not possible to use this method. So for everything else it makes sense to use the Maven plugin: download the schemas you need and generate Java classes from them.
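Something like the download goal of the kafka-schema-registry-maven-plugin would cover the first half of that. A pom.xml sketch (version, URL, and subject pattern are placeholders):

```xml
<plugin>
  <groupId>io.confluent</groupId>
  <artifactId>kafka-schema-registry-maven-plugin</artifactId>
  <version>7.5.0</version> <!-- placeholder: match your platform version -->
  <configuration>
    <schemaRegistryUrls>
      <param>http://localhost:8081</param> <!-- placeholder -->
    </schemaRegistryUrls>
    <outputDirectory>src/main/schemas</outputDirectory>
    <subjectPatterns>
      <param>^orders-value$</param> <!-- placeholder: the subjects you consume -->
    </subjectPatterns>
  </configuration>
</plugin>
```

Running `mvn schema-registry:download` then writes the schema files into outputDirectory, and a separate code generator (e.g. jsonschema2pojo for JSON Schema, or the avro-maven-plugin for Avro) can turn them into classes.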

And I still don't get how to use these annotations and what they are for...

Does anyone else have some best practices?

RobinGoussey commented 3 years ago

So the @Schema annotation is apparently for hardcoding which schema gets registered. E.g.: https://github.com/confluentinc/schema-registry/issues/1728#issuecomment-755765581

You don't "need" the classes in the code; you can deserialize to a JsonNode too.
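A sketch of that, assuming the JSON Schema deserializer falls back to a Jackson JsonNode when no target type (json.value.type) is configured; topic, group id, and URLs are placeholders:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import com.fasterxml.jackson.databind.JsonNode;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;
import io.confluent.kafka.serializers.json.KafkaJsonSchemaDeserializer;

public class GenericOrderConsumer {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
    props.put(ConsumerConfig.GROUP_ID_CONFIG, "order-audit");             // placeholder
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, KafkaJsonSchemaDeserializer.class);
    props.put("schema.registry.url", "http://localhost:8081");            // placeholder

    try (KafkaConsumer<String, JsonNode> consumer = new KafkaConsumer<>(props)) {
      consumer.subscribe(List.of("orders"));
      for (ConsumerRecord<String, JsonNode> record : consumer.poll(Duration.ofSeconds(1))) {
        // No generated class needed: navigate the payload as a Jackson tree.
        System.out.println(record.value().get("customer"));
      }
    }
  }
}
```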

But if you control all the applications that produce and consume, why not put all the DTOs in a separate module and import them wherever you need them? Bumping the schema is then just a matter of upgrading the dependency.

On the other hand, if you don't have/own the original Java classes, then yes, use the Maven JSON/Avro/etc. plugin.