Strech / avrora

A convenient Elixir library to work with Avro messages, schemas and Confluent® Schema Registry
https://hexdocs.pm/avrora
MIT License

schema evolution support #103

Closed sudeepdino008 closed 1 year ago

sudeepdino008 commented 1 year ago

Hi, this is a question around schema evolution support in avrora. I want to basically:

I tried the first one in erlavro, and it didn't seem to work there. Perhaps schema evolution isn't supported in avrora either, given that erlavro doesn't have it? Is there a way to get around it?

I've added the issue in erlavro too: https://github.com/klarna/erlavro/issues/117

Strech commented 1 year ago

Hey @sudeepdino008, thanks for the question. Off the top of my head it should work, but I might need to check it. I think if you introduce a new schema and mark it as BACKWARD or BACKWARD_TRANSITIVE it should work, because you will have to update consumers first and they will read the latest schema from the registry, I guess.

Could you try to emulate it in this order, to check whether updating the consumer really doesn't help:

  1. Register full version of the schema
  2. Use writer to write a message
  3. Use reader to read the message (should use the full schema)
  4. Register new version of the schema with fields removed
  5. Update the reader (restart it, or drop its cache)
  6. Use writer to write a message (with new schema)
  7. Use reader from step 5 and read new message (should use new schema)

I think that's the way it should work. Of course, you will have to configure the schema registry compatibility for the schema before registering the new version (or, if the default is already correct, you are good).

And if it doesn't work, let's dig deeper.

sudeepdino008 commented 1 year ago

Hi @Strech,

I've created this test - https://github.com/sudeepdino008/avrora/blob/master/test/avrora/schema/evolution_test.exs Let me know if I'm not thinking about it right. This should be possible to do, right?

  1) test from_json/2 schema evolution (Avrora.Schema.EvolutionTest)
     test/avrora/schema/evolution_test.exs:10
     ** (MatchError) no match of right hand side value: {:error, :schema_mismatch}
     code: {:ok, dpayload} = Avrora.Codec.Plain.decode(epayload, schema: newschema)
     stacktrace:
       test/avrora/schema/evolution_test.exs:27: (test)

sudeepdino008 commented 1 year ago

If it helps, my use case is that I'm updating the schema on the reader side (with backwards compatibility ensured -- so only changes like deleting a field, or adding new fields with a default value), and I expect the reader to be able to read data that was Avro-encoded with an older schema.

sudeepdino008 commented 1 year ago

Ok, so I came to know about schema registries. Can you tell if I'm understanding this correctly?

The reader must know the schema with which the writer encoded the data. You can't expect the reader to use the latest schema, even one that evolved in a backwards-compatible way, to decode data that was encoded by the writer using some earlier schema version.

Avro needs schema registries to support evolution decoding. The reader and writer both have access to such a registry, and "register" new schema versions while also maintaining a local cache. The encoding also adds a "schema version identifier", which is used on the reader side to figure out the schema version, and fetch it from cache/schema registry.

However, this means that the read can happen off the older schema version. If the evolved schema had a field deleted OR a field added with default value (since evolution is backwards compatible), this modification has to be done and maintained separately from the decoding process.
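
The separate modification step described above could be sketched roughly as follows. This is only an illustration under stated assumptions: `EvolutionSketch` and `project/2` are hypothetical names (not part of avrora or erlavro), the decoded record is assumed to be a map with string keys, and the reader's fields are assumed to be the "fields" list of the newer record schema parsed from JSON. It handles only top-level fields and does not attempt type promotion or nested records.

```elixir
defmodule EvolutionSketch do
  # Hypothetical helper, not part of avrora or erlavro.
  #
  # `decoded` is a record that was decoded with the writer's (older) schema.
  # `reader_fields` is the "fields" list of the reader's (newer) record schema.
  def project(decoded, reader_fields) when is_map(decoded) and is_list(reader_fields) do
    Enum.reduce_while(reader_fields, {:ok, %{}}, fn field, {:ok, acc} ->
      name = field["name"]

      cond do
        # Field exists in both schemas: keep the decoded value.
        Map.has_key?(decoded, name) ->
          {:cont, {:ok, Map.put(acc, name, Map.fetch!(decoded, name))}}

        # Field was added in the newer schema: fall back to its default.
        Map.has_key?(field, "default") ->
          {:cont, {:ok, Map.put(acc, name, field["default"])}}

        # Added without a default: the change was not backwards compatible.
        true ->
          {:halt, {:error, {:missing_field, name}}}
      end
    end)
  end
end
```

Fields that were deleted in the newer schema are dropped simply because they are never copied into the result.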

Strech commented 1 year ago

Yes, the writer will register the new schema in the schema registry (or you will register it separately), and it will be given an ID. That ID is then stored in the binary message the writer generates or obtains.

When the reader tries to decode such a message, it will check for the ID if the schema registry is enabled.

But you can emulate evolution with the scenario I presented to you: bust the local cache so that the latest version of the schema is loaded from the local file.
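
To make the ID lookup more concrete, here is a minimal sketch of the Confluent wire format, in which a message starts with a zero "magic" byte followed by a 4-byte big-endian schema ID and then the Avro-encoded body. The module and function names are hypothetical and are not avrora's API; this only illustrates how a reader could extract the ID before fetching the schema from its cache or the registry.

```elixir
defmodule WireFormatSketch do
  # Hypothetical helper module, not part of avrora.

  # Prefix an already Avro-encoded body with the magic byte (0) and the schema ID.
  def wrap(schema_id, body) when is_integer(schema_id) and is_binary(body) do
    <<0, schema_id::unsigned-big-integer-size(32), body::binary>>
  end

  # Split a message into the schema ID and the Avro-encoded body.
  def unwrap(<<0, schema_id::unsigned-big-integer-size(32), body::binary>>) do
    {:ok, schema_id, body}
  end

  # Anything else (no magic byte, or too short) is not in this format.
  def unwrap(_other), do: {:error, :unknown_format}
end
```

For example, `WireFormatSketch.unwrap(WireFormatSketch.wrap(42, <<1, 2, 3>>))` gives back `{:ok, 42, <<1, 2, 3>>}`, and the reader would then resolve schema 42 before decoding the body.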

Strech commented 1 year ago

You are welcome

sudeepdino008 commented 1 year ago

Thank you for the time :) I'm experimenting with the OCF format + hooks in avrora now to skip certain fields in the latest schema (which is linked to the original use case)