Closed. Oduig closed this issue 3 years ago.
I'd commented on your Stackoverflow post, so would the solution of not creating case classes work for you? e.g. download and generate instead?
More specifically, this repo currently has no examples outside of Java code, and if we add Scala examples, what about every other JVM language?
What's preventing your Scala project from using generated Java classes (can be done using Maven if not using sbt plugin above)?
I feel like the issue that should be addressed is the Avro4s model definition not matching the schema you've stored (can you give an example, and have you opened an issue with the Avro4s project about this?)
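The "download and generate" route suggested above can be sketched in sbt. This is a hypothetical sketch only: the thread names no specific plugin, so the task and setting keys below (`avroSourceDirectory`, `avroGenerate`) are placeholders for whatever Avro code-generation plugin is actually used.

```scala
// build.sbt -- hypothetical sketch of schema-first generation.
// 1. Check the .avsc files fetched from Schema Registry into src/main/avro.
// 2. Have the plugin compile them to Java classes before scalac runs.
// The keys below are placeholders, not a specific plugin's API.
Compile / avroSourceDirectory := baseDirectory.value / "src" / "main" / "avro"
Compile / avroGenerate / target := (Compile / sourceManaged).value / "avro"
```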
Thank you for your time, I appreciate it! You are correct that generating these classes would work, though it feels like a complicated solution for a relatively simple problem. There are three small reasons why I think having a serializer for regular classes is preferable.
We would have to alter `build.sbt` depending on an environment variable, and have every developer who wants to generate classes set an extra environment variable for the build process. To summarize, we can use SBT code generation, but it feels like a bit of a workaround compared to the following straightforward solution: a serializer that takes a plain case class together with a schema version from the registry, and checks at serialization time that the fields match.
I think this would make things a lot simpler and easier to implement on both sides, right? Is there a reason why we would not do it this way?
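The env-var concern described above can be made concrete. A minimal sketch, assuming a hypothetical `generateAvroClasses` task provided by whatever code-generation plugin is in use (the task name and the `GENERATE_AVRO` variable are both illustrative):

```scala
// build.sbt -- sketch of gating code generation on an environment variable.
// `generateAvroClasses` is a placeholder for the task your Avro
// code-generation plugin actually provides.
lazy val avroGenEnabled: Boolean =
  sys.env.get("GENERATE_AVRO").contains("true")

Compile / sourceGenerators ++= {
  if (avroGenEnabled) Seq((Compile / generateAvroClasses).taskValue)
  else Seq.empty
}
```

Every developer (and the CI pipeline) would then have to export `GENERATE_AVRO=true` before building, which is exactly the friction described above.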
You seem to be missing the fact that only IndexedRecord subclasses can be serialized to Avro, not standalone case classes / POJOs.
Secondly, if you define the schema first, anyway, then it's possible your manual class definition will diverge, and therefore generation should be preferred instead of depending on runtime errors after you've already built the project, published its artifacts, and deployed them. In some environments, that feedback loop can be several days long.
You can store env vars in CI/CD systems and refer to them in sbt during the build; the generated classes then get pushed as versioned code dependencies, an extension of the versioned schemas in the registry. In Maven, we use profiles to opt in to these features, not only env vars.
Generated classes shouldn't be output into your src/main/scala, for clear separation, and they would include a javadoc comment stating they are generated and not meant to be modified. Any file in the generated folder should be ignored by VCS, so that in CI/CD classes are generated before compilation and therefore cannot have been modified anyway.
Finally, without ReflectData no schema is derived automatically; but as you've stated, you'd like to follow schema-first design.
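The layout recommended above (generated sources outside src/main/scala, ignored by VCS, regenerated in CI) maps naturally onto sbt's `sourceManaged`, which lives under target/ and is therefore typically gitignored already. A sketch, where `generateFromSchemas` is a placeholder for the real generator (a plugin task or an avro-tools invocation) assumed to return the generated source files:

```scala
// build.sbt -- sketch of emitting generated classes under sourceManaged
// rather than src/main/scala. `generateFromSchemas` is a placeholder and
// is assumed to return the Seq[File] of generated sources.
Compile / sourceGenerators += Def.task {
  val schemas = baseDirectory.value / "src" / "main" / "avro"
  val out     = (Compile / sourceManaged).value / "avro"
  generateFromSchemas(schemas, out) // returns Seq[File]
}.taskValue
```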
That sounds like a good setup, and clearly the recommended one at this time, so I'll close this. Still, the mechanism you describe is a lot more complicated to set up. For future reference, perhaps the proposed setup is worth taking into consideration.
Serialization to JSON or XML is possible out of the box with a POJO or case class, and Avro has a clearly defined schema on top of this. I don't yet see why having a dedicated IndexedRecord is technically required.
Secondly, divergence of a manual class is not a problem if we specify the version of the schema in a config file, and check during serialization that the fields match.
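The check proposed here can be sketched in plain Scala (2.13+), where `Product.productElementNames` yields a case class's field names. The `Payment` class and the hard-coded field list are illustrative; in practice the list would be fetched once from Schema Registry for the version named in the config file.

```scala
// Sketch of the proposed safety check: compare a case class's field names
// against the field names of the pinned registry schema. The schema fields
// are a hard-coded stand-in here for a Schema Registry lookup.
case class Payment(id: String, amount: Double)

val registryFields: List[String] = List("id", "amount") // stand-in for the fetched schema

def fieldsMatch(record: Product, expected: List[String]): Boolean =
  record.productElementNames.toList == expected
```

`fieldsMatch(Payment("p1", 9.99), registryFields)` evaluates to true; a schema with different field names would make the check fail before anything is sent to Kafka.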
Serialization to JSON or XML is possible out of the box with a POJO or case class
As mentioned, that would require using reflection. Notice that the Widget class is a simple POJO - https://github.com/confluentinc/schema-registry/blob/master/avro-serializer/src/test/java/io/confluent/kafka/serializers/KafkaAvroSerializerTest.java#L729
don't yet see why having a dedicated IndexedRecord is technically required
An exception is thrown if reflection is not used and the class is not a subclass of IndexedRecord; classes produced by code generation meet this requirement.
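A simplified illustration of that check (the trait here is a stand-in for `org.apache.avro.generic.IndexedRecord`; this is not the real Confluent serializer code, just the shape of the dispatch):

```scala
// Hypothetical simplification of the serializer's type check: without
// reflection-based serialization, only IndexedRecord implementations
// (i.e. generated or generic records) are accepted.
trait IndexedRecord // stand-in for org.apache.avro.generic.IndexedRecord

case class Order(id: Int)                  // plain case class: rejected
class GeneratedOrder extends IndexedRecord // generated-style class: accepted

def serializable(datum: Any, useReflection: Boolean): Boolean =
  useReflection || datum.isInstanceOf[IndexedRecord]
```

With `useReflection = false`, `serializable(Order(1), false)` is false (the real serializer throws here), while a `GeneratedOrder` passes.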
I would like to send a generic case class to Kafka using KafkaProducer, with Avro serialization via the Confluent Schema Registry. The documented approach assumes that one wants to use reflection and avro4s to generate an Avro schema, which is then submitted to the registry. In our situation, we have a registry which already contains the appropriate schema. Although the fields are the same, this schema is not identical in its metadata (name, namespace, etc.). How can we serialize our model objects using schemas from the Schema Registry, and send them over Kafka?
Code samples here: https://stackoverflow.com/questions/67004284/sending-pojo-to-kafka-with-pre-defined-avro-schema-in-schema-registry
Since this seems like a straightforward approach to using Schema Registry, I am posting it here as an issue. Documenting this would likely help many developers to get up and running with SR.