RobinGoussey opened this issue 3 years ago
I really like this idea and will definitely take it on, though I'm not sure exactly when I'll have the time. I could see it happening within the next couple of months.
I'm very much interested in this too, and I'm ready to help.
Best, Jerome
I love this idea and this would be an awesome feature to have. I think it fits in great with the other features of kafka-gitops. I've got some availability coming up and would be able to help out as well.
@jrevillard I'd be happy to have your help as well! With a feature this big, I'd like to do a bit of planning and outlining before we get started on the code: make a few examples of how to structure the YAML and discuss them.
Hello,
I just saw that you have a first implementation, @Twb3! https://github.com/Twb3/kafka-gitops/commit/50fa5ccb54ec77ee618d4152436a66804783fa9c
What's the status? Do you need help?
Best, Jerome
Looking forward to having this feature @Twb3!
Hey guys, sorry I didn't see this earlier. I did a quick POC for myself to see what's possible, and I think I've got it mostly nailed down. I hope to propose a file structure soon; I just need to write it up.
https://github.com/Twb3/kafka-gitops/commit/50fa5ccb54ec77ee618d4152436a66804783fa9c
```yaml
schemas:
  order-value:
    type: Avro
    file: order-schema.avsc
  order-2-value:
    type: Avro
    file: order-schema.avsc
  shipment-value:
    type: Avro
    file: shipment-schema.avsc
    references:
      - name: order-value
        subject: order-value
        version: 1
```
Each schema entry above is the name of the subject to be registered in the Schema Registry. I did this to keep it consistent with how topic and service entries name the resource to be created.
`type` is self-explanatory, although this POC is restricted to Avro only, because I don't know how to parse Protobuf yet (I couldn't find good examples in the Confluent Schema Registry client either).
`file` is a reference to the schema file located at `SCHEMA_DIRECTORY`.
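For concreteness, here's a hypothetical sketch of how one entry above (subject `order-value`, file `order-schema.avsc`) could map to a register call, assuming Confluent's Java Schema Registry client; the class and method names below are illustrative, not the POC's actual code:

```java
// Illustrative sketch (not actual kafka-gitops code): registering one
// state-file entry with Confluent's Java Schema Registry client.
import io.confluent.kafka.schemaregistry.ParsedSchema;
import io.confluent.kafka.schemaregistry.avro.AvroSchema;
import io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient;
import io.confluent.kafka.schemaregistry.client.SchemaRegistryClient;
import java.nio.file.Files;
import java.nio.file.Path;

public class SchemaRegistrar {
    public static int registerFromEntry(String registryUrl, String subject,
                                        String schemaDirectory, String file) throws Exception {
        SchemaRegistryClient client = new CachedSchemaRegistryClient(registryUrl, 100);
        String schemaString = Files.readString(Path.of(schemaDirectory, file));
        ParsedSchema schema = new AvroSchema(schemaString); // throws if the Avro is invalid
        return client.register(subject, schema);            // id is assigned server-side
    }
}
```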
Config is handled via environment variables:

- `SCHEMA_REGISTRY_SASL_JAAS_USERNAME`
- `SCHEMA_REGISTRY_SASL_JAAS_PASSWORD`
- `SCHEMA_REGISTRY_URL` (default: `http://localhost:8081`)
- `SCHEMA_DIRECTORY`, the absolute path to the directory of schema files (default: `System.getProperty("user.dir")`)
The login module is currently hardcoded to `org.apache.kafka.common.security.plain.PlainLoginModule`.
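A rough sketch of how that environment-based configuration could be resolved, using the defaults described above (the class and the JAAS helper are hypothetical, not the POC's code):

```java
// Hypothetical helper: resolves the POC's environment variables with their
// documented defaults and builds a JAAS string for the hardcoded PlainLoginModule.
import java.util.Optional;

public final class SchemaRegistryEnvConfig {
    public static String url() {
        return Optional.ofNullable(System.getenv("SCHEMA_REGISTRY_URL"))
                .orElse("http://localhost:8081");
    }

    public static String schemaDirectory() {
        return Optional.ofNullable(System.getenv("SCHEMA_DIRECTORY"))
                .orElse(System.getProperty("user.dir"));
    }

    public static String jaasConfig() {
        return String.format(
                "org.apache.kafka.common.security.plain.PlainLoginModule required "
                        + "username=\"%s\" password=\"%s\";",
                System.getenv("SCHEMA_REGISTRY_SASL_JAAS_USERNAME"),
                System.getenv("SCHEMA_REGISTRY_SASL_JAAS_PASSWORD"));
    }
}
```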
To know whether a schema needs to be updated, I parse the schema file and generate a diff using zjsonpatch. I chose zjsonpatch because it reports differences by JSON node rather than for the entire file: the content of your schema could be identical even though you rearranged the order of nodes. The more I think about it as I write this, the less necessary it seems. Ultimately, this part still needs work.
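A minimal sketch of that diffing idea, assuming zjsonpatch and Jackson: parse the local and remote schemas into JSON trees and compare them node by node. Reordered object keys do not register as a change, although ordered arrays (such as an Avro record's `fields`) still do.

```java
// Sketch: node-level schema comparison with zjsonpatch.
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.flipkart.zjsonpatch.JsonDiff;

public class SchemaDiff {
    private static final ObjectMapper MAPPER = new ObjectMapper();

    // Returns true when the local schema differs from what is registered remotely.
    public static boolean hasChanged(String localSchema, String remoteSchema) throws Exception {
        JsonNode local = MAPPER.readTree(localSchema);
        JsonNode remote = MAPPER.readTree(remoteSchema);
        JsonNode patch = JsonDiff.asJson(remote, local); // JSON Patch ops turning remote into local
        return patch.size() > 0;
    }
}
```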
Schema Registry allows us to soft-delete and permanently delete. I think we would want to always permanently delete since we want our state file to represent exactly what is deployed. This is how my code currently works. I believe this also deletes all versions.
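For reference, a minimal sketch of that permanent-delete flow, assuming a recent schema-registry-client (the registry requires a soft delete of the subject before a hard delete; both calls return the removed versions):

```java
// Sketch: soft delete, then permanently delete, all versions of a subject.
import io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient;
import io.confluent.kafka.schemaregistry.client.SchemaRegistryClient;
import java.util.List;

public class SchemaDeleter {
    public static List<Integer> deletePermanently(String registryUrl, String subject) throws Exception {
        SchemaRegistryClient client = new CachedSchemaRegistryClient(registryUrl, 100);
        client.deleteSubject(subject);              // soft delete (all versions)
        return client.deleteSubject(subject, true); // permanent (hard) delete
    }
}
```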
For validating schemas I did more than just check that the YAML is valid: I check that the schema file exists at `SCHEMA_DIRECTORY`, and I use methods from Confluent's Schema Registry Client to verify that the Avro schemas can be parsed. As a result, when you validate a schema with references, it will actually make a call to your schema registry to check that the schema you want to reference exists. This may go too far for validation; I just need some input.
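A minimal sketch of those local checks, assuming Confluent's `AvroSchemaProvider` is what does the parsing (the class below is illustrative, not the POC's code):

```java
// Sketch: verify the schema file exists, then verify it parses as Avro.
import io.confluent.kafka.schemaregistry.avro.AvroSchemaProvider;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;

public class SchemaValidator {
    public static void validate(String schemaDirectory, String fileName) throws Exception {
        Path schemaPath = Path.of(schemaDirectory, fileName);
        // 1. The schema file must exist under SCHEMA_DIRECTORY.
        if (!Files.exists(schemaPath)) {
            throw new IllegalStateException("Schema file not found: " + schemaPath);
        }
        // 2. The file must parse as a valid Avro schema; parseSchema returns an
        //    empty Optional when parsing fails.
        String schemaString = Files.readString(schemaPath);
        new AvroSchemaProvider()
                .parseSchema(schemaString, Collections.emptyList())
                .orElseThrow(() -> new IllegalStateException("Invalid Avro schema: " + schemaPath));
    }
}
```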
Nice. Avro is a very good start. Just make sure this will also work against the Schema Registry in Confluent Cloud. When can we start testing? :)
Dear @Twb3,
This seems really promising, thanks!
Yes, Avro is a good start, and the final goal would be to support Thrift, Protocol Buffers, and JSON Schema. You say that you use Confluent's Schema Registry Client to validate the Avro schemas, so I'd think that library would be capable of validating the other types too, wouldn't it?
Concerning config, I could contribute Kerberos support, as I will need it :-)
Best, Jerome
@Twb3 How is this feature going? Is it stable enough to start using? I am very eager to have this as soon as possible.
Hi @Twb3 @HSA72 @devshawn ,
I don't know if you were aware of this: https://github.com/domnikl/schema-registry-gitops
There is one thing that is complicated for me to answer: how to deal with schema IDs and versions? Those IDs are generated server-side and are used by Kafka clients to identify the right schema. This means there is no way to guarantee that a schema will get a particular ID/version, so kafka-gitops cannot be the source of truth for them, can it?
As promised, you can find more than a POC implementation in #76! Please comment, improve, etc.
@HSA72 I apologize for not following up on this sooner. I have not had the opportunity recently to dedicate time to this feature.
@jrevillard Thanks for posting that link to the schema registry gitops implementation! Looks promising.
> As promised, you can find more than a POC implementation in #76! Please comment, improve, etc.
Nice, I will take a look!
@jrevillard I wasn't aware of that project - pretty nice. I still like our approach of putting it into this tool. Maybe its maintainer would like to contribute here as well?
I'll let you and @Twb3 take the lead on this and then give suggestions and take a look at the POC shortly.
Hi,
Right now there is nothing like this for schema management. It might be useful to also allow declaring which topics/subjects use which schemas; this would let the topic state and schema state be managed by one tool/file.

```
java -jar kafka-schema-gitops-1.0-SNAPSHOT-jar-with-dependencies.jar -i validate # or execute
```