Add option to push new schemas and test

balexander commented 5 years ago

This adds an options object to schemas() allowing the user to decide if they want to allow new schemas to be pushed by default or not.

bencebalogh commented 5 years ago

Why would you not want to post the schema to the registry? If you push a schema which is already in the registry then the registry just sends back an id without registering the schema under a new version

In which scenario would you prefer to throw an error instead of registering a new schema/version? If it's something very specific that your producer should only produce a type of message if someone has already done it once then maybe it's something that should be checked in that code.

I am just trying to understand the use case, not 100% against going forward with this.

Lroca88 commented 5 years ago

Agree with @bencebalogh. The way I see it and taking in count the name of this project, it was meant for interacting with Confluent Schema Registry and using the schemas stored there. If I wouldn't like to store the schema in the registry what's the point of using the library, I would just use an Avro package (avsc)

balexander commented 5 years ago

The big picture is that you might want your schemas to go through a review process before they are registered, especially when the schemas are going to be shared throughout an organization. For example:

Scenario: You are working at a company with many engineering teams and an analytics team. You want the analytics team to review new schemas before they are registered. Tagging an analyst in a large PR on some codebase he or she has never worked on makes for less timely reviews. The team's solution is to create a schema registry repo where new schemas can be reviewed by themselves. If approved, they are automatically sent to the schema registry for creation by a git action.

The ability to restrict the creation of new schemas is available in other libraries similar to this one (ie, https://github.com/flix-tech/schema-registry-php-client) so it isn't an idea that is totally from left field.

bencebalogh commented 5 years ago

In your scenario:

you open a PR into the schema repo, it gets reviewed and merged
your git hook publishes the schema to the registry
you start using the schema in your app with this library
the library will perfrom a POST with the schema, the registry checks that the schema is already stored there and return its id - the registry won't save a new schema anywhere, the library caches the id and will use that from now on

I think the confusion is caused by pushing an already existing schema into registry: if a schema is already there the registry won't do anything just return the already stored schema's id.

Your scenario's example with the following schema:

{
  "type": "record",
  "name": "User",
  "namespace": "com.example.avro",
  "fields": [
    {
      "name": "id",
      "type": "int"
    },
    {
      "name": "username",
      "type": "string"
    }
  ]
}

I've started a landoop/fast-data-dev container

The git hook registers the new schema under subject users-value:

root@fast-data-dev / $ curl -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" \
> --data '{"schema": "{\"type\":\"record\",\"name\":\"User\",\"namespace\":\"com.example.avro\",\"fields\":[{\"name\":\"id\",\"type\":\"int\"},{\"name\":\"username\",\"type\":\"string\"}]}"}' \
> http://localhost:8081/subjects/users-value/versions
{"id":1}

Your app, using this library, produces its first message after startup, performing a POST request, for subject users-value:

root@fast-data-dev / $ curl -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" \
> --data '{"schema": "{\"type\":\"record\",\"name\":\"User\",\"namespace\":\"com.example.avro\",\"fields\":[{\"name\":\"id\",\"type\":\"int\"},{\"name\":\"username\",\"type\":\"string\"}]}"}' \
> http://localhost:8081/subjects/users-value/versions
{"id":1}

Let's say you've got another app, using the same schema with a different topic, called registrations. The library will perform a POST for subject registrations-value this time, but you can see the registry sees it's the same schema and returns the same schema, still does not register a new one:

root@fast-data-dev / $ curl -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" --data '{"schema": "{\"type\":\"record\",\"name\":\"User\",\"namespace\":\"com.example.avro\",\"fields\":[{\"name\":\"id\",\"type\":\"int\"},{\"name\":\"username\",\"type\":\"string\"}]}"}' http://localhost:8081/subjects/registrations-value/versions
{"id":1}

With the above I don't see why we'd not POST a schema to the registry for the first time after start as if the schema is there it'll be a noop. Do you agree or is there a case when it'd be desired to not perform a POST?

bencebalogh commented 5 years ago

closing this until further discussion

balexander commented 5 years ago

@bencebalogh Apologies for the late reply.

You are correct about some confusion I had. I see how the caching works and understand that a new schema won't be posted as your outlined above. However, I think I have slightly misstated our concerns.

We want tight control over what makes it into the schema registry. We want it to be a source of truth for which schemas are actually used or have been used. In the past, using a different registry, we ended up with a bunch of schemas that were registered, not used, and not removed.

Also, if someone is going to add a schema we want to make sure they go through the correct channels. We'd like to avoid a scenario where some new, un-approved, schema is registered. That could cause confusion.

bencebalogh commented 5 years ago

Ah okay, that makes sense. Wouldn't something like setting up ACLs be better? If someone forgets to set true for this flag it'll still publish a new version, if the app is not meant to.

https://docs.confluent.io/current/confluent-security-plugins/schema-registry/authorization/sracl_authorizer.html

bencebalogh / avro-schema-registry

Add option to push new schemas and test #21