Closed hariso closed 3 months ago
Our team discussed this today. Here are the solutions we discussed and the conclusion.
TL;DR
Conduit will make sure all schema names are unique by adding the connector ID and/or the pipeline ID to the name that a connector developer provides (which is going to be the collection name in most cases).
Long version:
Possible solutions
Discussion:
1. Make every name unique
1a. Connector developers provide a name. Conduit "makes it unique" by adding a prefix/suffix.
Pros: makes the connector code a bit clearer (by showing what a schema refers to).
Cons: the actual name is different; the original name is no longer valid.
1b. Connector developers don't provide a name. Conduit generates a random/unique name.
Pros: simple implementation.
Cons: the schema registry is not well organized internally. Organization can be done in a limited way by using structured names (e.g. pipeline ID + connector ID + schema name). The actual name is different; the original name is no longer valid.
2. Use "namespaces" (each connector gets one). Confluent's Schema Registry has schema contexts, which work more or less like a prefix.
Pros: intuitive way to organize schemas, easier cleanup.
Cons: the franz-go client doesn't support contexts as "first-class citizens". What can currently be done is to change the base URL, but that would mean one client per connector. We might also want to change the client to support schema contexts.
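To illustrate the "change the base URL" workaround mentioned for the namespaces option, here is a rough sketch. It assumes the Confluent convention of reaching a context through a `/contexts/.<name>` path prefix on the registry URL, which should be checked against the Confluent docs before relying on it. `ContextBaseURL` is a hypothetical helper, not part of Conduit or franz-go, and using the connector ID as the context name is also an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// ContextBaseURL is a hypothetical helper that builds a per-connector
// base URL for a schema registry client.
// Assumption: contexts are addressed via a "/contexts/.<name>" path prefix.
func ContextBaseURL(registryURL, connectorID string) string {
	// Trim trailing slashes so we don't end up with "//contexts".
	return strings.TrimRight(registryURL, "/") + "/contexts/." + connectorID
}

func main() {
	// Each connector would need its own client pointed at its own URL.
	fmt.Println(ContextBaseURL("http://localhost:8081/", "source-a"))
	fmt.Println(ContextBaseURL("http://localhost:8081", "source-b"))
}
```

This makes the downside visible: since the context lives in the URL, every connector needs a separately configured client.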
Conclusion
We're choosing 1a for the following reasons:
While it does require some care on a connector developer's behalf, because the actual schema name is different, it's still not a big problem, because the parameter name and docs will call it out.
The chosen solution relies on a connector being able to identify itself (the combination of the pipeline/connector ID and the name that a developer provided guarantees schema subject uniqueness). Tokens can be used for that. Lovro wrote down some thoughts on how to do that: https://github.com/ConduitIO/conduit/pull/1701#discussion_r1679780355-
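A minimal sketch of how option 1a could combine the IDs with the developer-provided name. Everything here is an assumption for illustration: `QualifiedSubject` is a hypothetical name, and the `:` separator and the ordering of the parts were not decided in this discussion.

```go
package main

import "fmt"

// QualifiedSubject is a hypothetical helper that prefixes the name a
// connector developer provided with the pipeline ID and connector ID.
// Assumption: ":" is the separator and does not occur in the IDs.
func QualifiedSubject(pipelineID, connectorID, name string) string {
	return pipelineID + ":" + connectorID + ":" + name
}

func main() {
	// Two connectors in the same pipeline use the same schema name...
	a := QualifiedSubject("pipeline-1", "source-a", "users")
	b := QualifiedSubject("pipeline-1", "source-b", "users")

	// ...but end up with different subjects in the shared registry.
	fmt.Println(a)
	fmt.Println(b)
	fmt.Println(a != b)
}
```

If the IDs themselves can contain the separator, two different combinations could still collide, so a real implementation would need to escape the parts or pick a separator that is not allowed in IDs.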
@lovromazgon and I were discussing the implementation of this. There are a few points:
context instead of prefix, since we plan to organize schemas into contexts in the future.
Part of #1560.
Currently, our API allows connector developers to specify a schema name to be used. Given that Conduit's schema registry is shared by multiple connectors and pipelines, we need to handle name conflicts.
In other words: different connectors should be allowed to use the same schema names, but that shouldn't have any side effects (such as one connector modifying another connector's schema).
This might be useful: Use Schema Contexts in Confluent Platform.
Pull requests:
1718