domnikl / schema-registry-gitops

Manage Confluent Schema Registry subjects through Infrastructure as code
Apache License 2.0

Automatically setting new version when updating dependencies #206

Open fzmoment opened 9 months ago

fzmoment commented 9 months ago

Hi, thanks for your work on this tool! It's great.

I wanted to check if there's a way to automatically set the version of a reference to latest? For example, let's say I have schemas A and B where A depends on B and I've made a change in B that I want to upload. It seems like I need to do this in two steps:

  1. apply the change to B, so that it has a new version number
  2. make a new change where I bump the reference version number that A has on B

Instead, it would be nice if apply automatically uploaded the new B schema and then updated A to point to the latest B in one go. Is that currently possible?

I've tried setting the version to latest, which seems like it should be possible based on this error when I leave it blank:

2024-01-18 11:18:46.005 ERROR Could not parse Protobuf schema
java.lang.RuntimeException: io.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException: The specified version '0' is not a valid version id. Allowed values are between [1, 2^31-1] and the string "latest"; error code: 42202

but then I get a different error:

2024-01-18 11:19:07.440 ERROR com.fasterxml.jackson.databind.exc.InvalidFormatException: Cannot deserialize value of type `int` from String "latest": not a valid `int` value

domnikl commented 9 months ago

Hi @fzmoment, glad that you find it great!

Right now it's not possible: the wrapper around the REST API only accepts integers for the version, not strings, so we cannot just send latest and let the server handle it.

Fetching the latest version and substituting it when parsing the YAML would work, but it's not the same thing: the reference is only updated when you manually run the tool, so changes from outside, like schemas automatically created or updated by a producer, are not picked up in between. The latter is a bad pattern IMHO, but those cases exist as well.

We can certainly implement the first case, but to implement the second case, we'd need to implement our own API wrapper which I would rather avoid right now.
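
For illustration, the first case (resolving latest while parsing the YAML) could look roughly like the Python sketch below. This is not the tool's actual code: `resolve_reference_version` and the `fetch_latest_version` callback are hypothetical names, and the callback stands in for whatever client call returns the newest version number of a subject. The valid range is taken from the registry error message quoted above.

```python
from typing import Callable, Union

# Upper bound quoted in the registry's error message: [1, 2^31-1]
MAX_VERSION = 2**31 - 1


def resolve_reference_version(
    subject: str,
    version: Union[int, str],
    fetch_latest_version: Callable[[str], int],  # hypothetical lookup, injected by the caller
) -> int:
    """Turn a reference version from the YAML into the integer the client wrapper accepts."""
    if version == "latest":
        # Resolved once, at parse time. Versions registered outside the tool
        # after this point are not picked up until the next run.
        return fetch_latest_version(subject)
    if isinstance(version, int) and not isinstance(version, bool) and 1 <= version <= MAX_VERSION:
        return version
    raise ValueError(
        f"invalid version {version!r} for reference to {subject!r}: "
        f'expected an integer in [1, {MAX_VERSION}] or "latest"'
    )
```

Because the lookup happens on the client, the server never sees the string latest, which is why this sidesteps the integer-only wrapper but cannot track changes made between runs.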

fzmoment commented 8 months ago

Got it, appreciate the response! We'll work around it :)

PSanetra commented 6 months ago

Hi @domnikl,

I am new to the schema-registry-gitops tool, but so far it looks great! I was wondering about exactly this issue, as I would expect the gitops tooling to also handle schema references without the need to specify a version explicitly (if the referenced schema is managed by the same git repository, of course), especially as the version is calculated by the schema registry and there is no way to define it declaratively. Previously I was using the confluent terraform provider, but it has a similar issue: https://github.com/confluentinc/terraform-provider-confluent/issues/207

I think it would be nice for the gitops tool to consider the state of the git repository to be the source of truth for the desired state of all contained schemas.

Therefore I think it should be possible to build a dependency graph of all schemas in the repository, then update all leaf schemas first and/or fetch their versions (even if they were not updated, or were even reverted). After that, update all dependent schemas so that each one references exactly the schema version that is checked into the Git repository.

The advantage of this approach is that it should also make it possible to generate libraries for different languages from the same Git repository commit. (At least I hope this works; I still need to build a PoC.) If I have to update schema versions across multiple commits, I fear generating libraries with SpecificRecord types that reference schemas which are actually incompatible. The schema in the Confluent Schema Registry might reference a legacy schema version, but the generated SpecificRecord type always references the version that was checked into the repository.

I am thinking about contributing such a dependency-graph update strategy. What do you think about this approach, and would you be open to such a contribution?
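
To make the idea concrete, here is a minimal Python sketch of the leaf-first ordering described above, using the standard library's `graphlib`. It is only an illustration under stated assumptions, not a proposal for the actual implementation: `apply_in_dependency_order` and the `register` callback are hypothetical names, `register` is assumed to upload a schema (or look it up if unchanged) and return the version the registry assigned, and every referenced subject is assumed to live in the same repository.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List


def apply_in_dependency_order(
    references: Dict[str, List[str]],  # subject -> subjects it references
    register: Callable[[str, Dict[str, int]], int],  # hypothetical: returns the assigned version
) -> Dict[str, int]:
    """Register leaf schemas first, then dependents pinned to the versions just obtained."""
    versions: Dict[str, int] = {}
    # static_order() yields a subject only after everything it references.
    # It raises graphlib.CycleError if the references are circular.
    for subject in TopologicalSorter(references).static_order():
        pinned = {dep: versions[dep] for dep in references.get(subject, [])}
        versions[subject] = register(subject, pinned)
    return versions
```

With A depending on B, calling this with `{"A": ["B"], "B": []}` registers B first and then passes B's resulting version to the call for A, so both schemas can be applied from a single commit.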

domnikl commented 6 months ago

Hi @PSanetra and sorry for the late reply, issue kinda got lost as it was already closed.

I totally agree with you that the state of the git repository should be the source of truth not only for schemas but also for their referenced dependencies. I just don't have the time to build it myself currently. But I'd be happy to review and merge a dependency graph update strategy if you would like to work on it.