Add new Confluent Schema Registry server. Example:

Closed: databius closed this issue 1 year ago.
Thank you for creating this issue. Could you please provide an example?
Sure, let me share the context first.
I'm learning about Data Contracts. The concept is still quite new, and there is no universal tool for implementing it in production, so some authors suggest handling data contracts with familiar tools. In my case, we define schemas using Avro and manage them in the Schema Registry. In addition, dbt is used to check data quality.
IMO a data contract should be a single source of truth, so it should be stored in a centralized repository, and I am trying to find a common format for our contracts.
Last week I tried the Open Data Contract Standard. It's a great template, but I believe the Data Contract Specification is the best data contract format so far.
Back to the issue. The Schema Registry does not store data; it only stores metadata, which contains the most important part of the contract: the schema. One thing the Schema Registry does well is schema evolution, and I think we can reuse that. In practice, I still have to register the schemas (from the data contract) with the Schema Registry because a lot of services depend on it.
In the above example, I think we should define the schema_registry in the servers section. But I just realized that a schema can be registered to multiple topics, and even to multiple Schema Registry servers. If we repeated the schema (which might be thousands of lines) for each server, the contract would become very long.
So I propose a new example:
schema:
  type: avro
  specification: |-
    {
      "type": "record",
      "name": "SomethingShared",
      "namespace": "com.databius.shared",
      "fields": [
        {
          "name": "greeting",
          "type": "string"
        }
      ]
    }
  registry:
    - name: dev
      type: confluent_schema_registry
      host: http://localhost:8081
      subjects:
        - name: com.databius.shared.SomethingShared1-value
          compatibility: FORWARD_TRANSITIVE
    - name: prod
      type: confluent_schema_registry
      host: http://localhost:8081
      subjects:
        - name: com.databius.shared.SomethingShared2-value
          compatibility: FORWARD_TRANSITIVE
Note: I only have experience with Confluent Schema Registry.
The example is inspired by schema-registry-gitops.
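For illustration, here is a rough sketch of what registering the schema from the example above could look like against the Confluent Schema Registry REST API (PUT /config/{subject} for the compatibility level, POST /subjects/{subject}/versions for the schema). The contract file name and the layout it reads are just the proposal above, so treat this as a sketch rather than a finished implementation:

import json
import pathlib

import requests  # pip install requests
import yaml      # pip install pyyaml

# "datacontract.yaml" is a placeholder; the layout follows the proposed example above.
contract = yaml.safe_load(pathlib.Path("datacontract.yaml").read_text())
schema_block = contract["schema"]
avro_schema = schema_block["specification"]  # the Avro schema as a string

HEADERS = {"Content-Type": "application/vnd.schemaregistry.v1+json"}

for registry in schema_block["registry"]:
    base = registry["host"].rstrip("/")
    for subject in registry["subjects"]:
        name = subject["name"]
        # Set the compatibility level for this subject.
        resp = requests.put(
            f"{base}/config/{name}",
            headers=HEADERS,
            data=json.dumps({"compatibility": subject["compatibility"]}),
        )
        resp.raise_for_status()
        # Register the schema under this subject (a no-op if an identical version exists).
        resp = requests.post(
            f"{base}/subjects/{name}/versions",
            headers=HEADERS,
            data=json.dumps({"schema": avro_schema, "schemaType": "AVRO"}),
        )
        resp.raise_for_status()
        print(f"{registry['name']}: registered {name}, schema id {resp.json().get('id')}")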
I tried to implement a simple script that extracts the schema from the contract and registers it with the Schema Registry using schema-registry-gitops. The results are quite promising.
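To make the idea concrete, a minimal sketch of such a script could look like the following. It writes the Avro schema to an .avsc file and emits one schema-registry-gitops state file per registry entry in the contract. The state layout (a subjects list with name, file and compatibility) is based on my reading of the schema-registry-gitops README and may need adjusting, and all file paths are placeholders:

import json
import pathlib

import yaml  # pip install pyyaml

# Placeholder path; the contract layout follows the proposed example above.
contract = yaml.safe_load(pathlib.Path("datacontract.yaml").read_text())
schema_block = contract["schema"]

out_dir = pathlib.Path("state")
out_dir.mkdir(exist_ok=True)

# Write the Avro schema once; every registry's state file references the same .avsc.
avro = json.loads(schema_block["specification"])
schema_file = out_dir / f"{avro['namespace']}.{avro['name']}.avsc"
schema_file.write_text(schema_block["specification"])

# Emit one state file per registry entry defined in the contract.
for registry in schema_block["registry"]:
    state = {
        "subjects": [
            {
                "name": subject["name"],
                "file": schema_file.name,
                "compatibility": subject["compatibility"],
            }
            for subject in registry["subjects"]
        ]
    }
    state_file = out_dir / f"state-{registry['name']}.yaml"
    state_file.write_text(yaml.safe_dump(state, sort_keys=False))
    # Each state file can then be applied with schema-registry-gitops
    # against the corresponding registry host.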
Thanks for the clarification. I understand the flow and the use case, which sound nice.
For simplicity, I would vote to keep the registry information as a custom field and not include it in the general Data Contract Specification, as the use case is quite limited to Confluent/Kafka...
If there are more votes to add registries, we can discuss reopening the issue.