balexander closed this 5 years ago.
Why would you not want to post the schema to the registry? If you push a schema that is already in the registry, the registry just sends back its id without registering the schema under a new version.
In which scenario would you prefer to throw an error instead of registering a new schema/version? If the requirement is something very specific, such as your producer only producing a type of message once someone else has already done so, then maybe that should be checked in the producer's own code.
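If the check does live in producer code, one option is to look the schema up and refuse to continue rather than register it. The Python sketch below is for illustration only (it is not the library's API, and the function names are hypothetical). It assumes the standard Confluent REST endpoint POST /subjects/<subject>, which looks up an already-registered schema and answers 404 when the subject or schema is missing:

```python
import json
import urllib.error
import urllib.request

CONTENT_TYPE = "application/vnd.schemaregistry.v1+json"


def build_lookup_request(base_url: str, subject: str, schema: dict) -> urllib.request.Request:
    """POST /subjects/<subject>: ask whether this exact schema is registered under the subject."""
    # The registry expects the Avro schema as a JSON string inside a JSON object.
    body = json.dumps({"schema": json.dumps(schema)}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/subjects/{subject}",
        data=body,
        headers={"Content-Type": CONTENT_TYPE},
        method="POST",
    )


def require_registered(base_url: str, subject: str, schema: dict) -> int:
    """Return the existing schema id, or raise instead of registering a new version."""
    try:
        with urllib.request.urlopen(build_lookup_request(base_url, subject, schema)) as resp:
            return json.load(resp)["id"]
    except urllib.error.HTTPError as err:
        if err.code == 404:  # subject or schema not found
            raise LookupError(
                f"schema is not registered under {subject!r}; refusing to produce"
            ) from err
        raise
```

Because the lookup endpoint never creates anything, a producer using require_registered can only use schemas that were registered elsewhere.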
I'm just trying to understand the use case; I'm not 100% against going forward with this.
Agree with @bencebalogh. The way I see it, and taking into account the name of this project, it was meant for interacting with Confluent Schema Registry and using the schemas stored there. If I didn't want to store the schema in the registry, what would be the point of using the library? I would just use an Avro package (avsc).
The big picture is that you might want your schemas to go through a review process before they are registered, especially when the schemas are going to be shared throughout an organization. For example:
Scenario: You work at a company with many engineering teams and an analytics team, and you want the analytics team to review new schemas before they are registered. Tagging an analyst on a large PR in a codebase he or she has never worked on makes for slow reviews. The team's solution is a dedicated schema registry repo where new schemas can be reviewed on their own. Once approved, they are registered in the schema registry automatically by a git action.
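The final, automated step of that workflow could look roughly like the Python sketch below. The repo layout (one reviewed <subject>.avsc file per subject) and all function names are assumptions for illustration. The endpoint and content type are the same ones used in the curl examples in this thread:

```python
import json
import pathlib
import urllib.request

CONTENT_TYPE = "application/vnd.schemaregistry.v1+json"


def load_approved_schemas(schema_dir: str) -> dict:
    """Assumed repo layout: one reviewed file per subject, e.g. schemas/users-value.avsc."""
    return {
        path.stem: json.loads(path.read_text(encoding="utf-8"))
        for path in sorted(pathlib.Path(schema_dir).glob("*.avsc"))
    }


def registration_request(base_url: str, subject: str, schema: dict) -> urllib.request.Request:
    """POST /subjects/<subject>/versions, the same call the curl examples make."""
    body = json.dumps({"schema": json.dumps(schema)}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/subjects/{subject}/versions",
        data=body,
        headers={"Content-Type": CONTENT_TYPE},
        method="POST",
    )


def register_approved(base_url: str, schema_dir: str) -> dict:
    """Meant to run in CI after a schema PR is merged; returns {subject: schema id}."""
    ids = {}
    for subject, schema in load_approved_schemas(schema_dir).items():
        with urllib.request.urlopen(registration_request(base_url, subject, schema)) as resp:
            ids[subject] = json.load(resp)["id"]
    return ids
```

In this setup, only the CI job would need write access to the registry; applications would only read from it.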
The ability to restrict the creation of new schemas is available in other libraries similar to this one (e.g., https://github.com/flix-tech/schema-registry-php-client), so it isn't an idea out of left field.
I think the confusion is caused by pushing an already existing schema into the registry: if the schema is already there, the registry won't do anything except return the stored schema's id.
Take the example from your scenario, with the following schema:
{
"type": "record",
"name": "User",
"namespace": "com.example.avro",
"fields": [
{
"name": "id",
"type": "int"
},
{
"name": "username",
"type": "string"
}
]
}
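The --data payloads in the curl commands are this schema serialized to a string and then wrapped in a JSON object under a "schema" key, which is why every quote is escaped. A small Python sketch (the helper name is hypothetical) that produces that payload:

```python
import json

USER_SCHEMA = {
    "type": "record",
    "name": "User",
    "namespace": "com.example.avro",
    "fields": [
        {"name": "id", "type": "int"},
        {"name": "username", "type": "string"},
    ],
}


def registry_payload(schema: dict) -> str:
    # Double encoding: the Avro schema becomes a JSON *string* inside a JSON object.
    return json.dumps({"schema": json.dumps(schema, separators=(",", ":"))})


print(registry_payload(USER_SCHEMA))
```

The printed string is identical to the --data argument passed to curl in the transcripts.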
Using a landoop/fast-data-dev container, register the schema for subject users-value:
root@fast-data-dev / $ curl -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" \
> --data '{"schema": "{\"type\":\"record\",\"name\":\"User\",\"namespace\":\"com.example.avro\",\"fields\":[{\"name\":\"id\",\"type\":\"int\"},{\"name\":\"username\",\"type\":\"string\"}]}"}' \
> http://localhost:8081/subjects/users-value/versions
{"id":1}
Posting the same schema to users-value again returns the same id:
root@fast-data-dev / $ curl -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" \
> --data '{"schema": "{\"type\":\"record\",\"name\":\"User\",\"namespace\":\"com.example.avro\",\"fields\":[{\"name\":\"id\",\"type\":\"int\"},{\"name\":\"username\",\"type\":\"string\"}]}"}' \
> http://localhost:8081/subjects/users-value/versions
{"id":1}
Now use the same schema for the registrations topic. The library will perform a POST for subject registrations-value this time, but you can see that the registry recognizes it as the same schema and returns the same id; no new schema id is created:
root@fast-data-dev / $ curl -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" --data '{"schema": "{\"type\":\"record\",\"name\":\"User\",\"namespace\":\"com.example.avro\",\"fields\":[{\"name\":\"id\",\"type\":\"int\"},{\"name\":\"username\",\"type\":\"string\"}]}"}' http://localhost:8081/subjects/registrations-value/versions
{"id":1}
Given the above, I don't see why we shouldn't POST a schema to the registry the first time it is used after startup: if the schema is already there, it's effectively a no-op. Do you agree, or is there a case where it would be preferable not to perform the POST?
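A toy in-memory model of what the transcript shows, in Python. It is a simplification, not the real registry: here "same schema" means equal JSON after sorting keys, while the real registry uses its own canonical form. As far as we understand the real registry, a new subject still gets its own version 1 entry pointing at the existing id, and the model mirrors that:

```python
import json


class ToyRegistry:
    """Simplified in-memory model of subject/version/id bookkeeping (not the real registry)."""

    def __init__(self) -> None:
        self._ids = {}       # canonical schema string -> global schema id
        self._subjects = {}  # subject -> list of canonical schemas; index + 1 == version

    @staticmethod
    def _canonical(schema_str: str) -> str:
        # Assumption: "same schema" means equal JSON once keys are sorted.
        return json.dumps(json.loads(schema_str), sort_keys=True, separators=(",", ":"))

    def register(self, subject: str, schema_str: str) -> int:
        canon = self._canonical(schema_str)
        # Reuse the global id if this schema was seen under any subject before.
        schema_id = self._ids.setdefault(canon, len(self._ids) + 1)
        versions = self._subjects.setdefault(subject, [])
        if canon not in versions:
            versions.append(canon)  # first time under this subject: new version
        return schema_id

    def version_count(self, subject: str) -> int:
        return len(self._subjects.get(subject, []))
```

Feeding it the three POSTs from the transcript returns id 1 each time and leaves users-value with a single version.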
Closing this until further discussion.
@bencebalogh Apologies for the late reply.
You are correct about some of the confusion I had. I see how the caching works and understand that a new schema won't be posted, as you outlined above. However, I think I slightly misstated our concerns.
We want tight control over what makes it into the schema registry. We want it to be a source of truth for which schemas are actually used or have been used. In the past, using a different registry, we ended up with a bunch of schemas that were registered, not used, and not removed.
Also, if someone is going to add a schema, we want to make sure they go through the correct channels. We'd like to avoid a scenario where a new, unapproved schema gets registered, which could cause confusion.
Ah okay, that makes sense.
Wouldn't something like setting up ACLs be better? If someone forgets to set this flag to true, the app will still publish a new version even though it isn't meant to.
This adds an options object to schemas(), allowing the user to decide whether new schemas may be pushed by default.
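A minimal sketch of how such an option might gate registration, in Python, with lookup and registration passed in as plain functions so the logic can be tested without a registry. The option name allow_new_schemas and every other name here are hypothetical, not the ones used in this PR:

```python
from dataclasses import dataclass
from typing import Callable, Optional

Lookup = Callable[[str, str], Optional[int]]  # (subject, schema) -> id, or None if absent
Register = Callable[[str, str], int]          # (subject, schema) -> id


@dataclass(frozen=True)
class SchemaOptions:
    # Hypothetical option name. True keeps the current auto-register behaviour.
    allow_new_schemas: bool = True


class SchemaNotRegisteredError(Exception):
    """Raised when a schema is missing and registering new ones is disabled."""


def resolve_schema_id(
    subject: str,
    schema: str,
    lookup: Lookup,
    register: Register,
    options: Optional[SchemaOptions] = None,
) -> int:
    if options is None:
        options = SchemaOptions()
    existing = lookup(subject, schema)
    if existing is not None:
        return existing  # already registered: never POST a new version
    if not options.allow_new_schemas:
        raise SchemaNotRegisteredError(
            f"{subject}: schema is not registered and allow_new_schemas is False"
        )
    return register(subject, schema)
```

Defaulting allow_new_schemas to True keeps today's auto-registration behaviour. Flipping the default to False would address the forgotten-flag concern raised above, at the cost of changing behaviour for existing users.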