edx / edx-arch-experiments

A plugin to include applications under development by the architecture team at edx
GNU Affero General Public License v3.0
0 stars 3 forks source link

[Evolution] Discovery: Schema Evolution #53

Open robrap opened 2 years ago

robrap commented 2 years ago

Important: It is possible we should implement https://github.com/edx/edx-arch-experiments/issues/53#issuecomment-1382159212 to protect against unexpected schema evolution, before completing this larger effort to ensure it can be done safely.

Note: Although this work is important, it is being deferred until we have the problem of having events to evolve. Our use of a schema registry enables this work, but this ticket is for getting into the nuts and bolts of proper configuration, processes, testing, docs, etc. to make this all work.

The following are a list of questions around best practices for Schema Evolution.

A/C

Open Questions:

To be confirmed: Confluent’s Schema Registry will not necessarily tell you or warn if you change the event schema for a topic. It will just evolve the schema and move on. This opens the possibility of error if someone accidentally updates a schema on the producing side without a corresponding update on the consumer side.

Note: This ticket was copied/moved from original private ticket: https://2u-internal.atlassian.net/browse/ARCHBOM-2013.

robrap commented 1 year ago
  1. There is this ADR: https://github.com/openedx/openedx-events/blob/main/docs/decisions/0006-event-schema-serialization-and-evolution.rst. It selects “Forward” or “Forward Transitive”, but is outdated, because it was written before we had optional fields. It should probably be updated, and I think we may switch to "Full Transitive", which is documented as a future desire.
  2. How will we transition configs?
robrap commented 1 year ago

When trying to remove an optional field from a data class, we started getting an error. See this discussion for more details: https://github.com/openedx/openedx-events/pull/131#discussion_r1000525070. We'll need some sort of workaround for this.

FYI: @rgraber: Feel free to provide any additional details if you wish. Thanks.

robrap commented 1 year ago

[idea] Add snapshot unit test where we are testing Avro processing for all events. The idea would be to have a script (or something) that would run the schema generation for all event definitions, and would produce a datastructure with the schema as string for each event. Then, when testing the Avro serialization/deserialization process, we'd also compare against this snapshot. If the snapshot is changing in a known way for a known reason, then the snapshot could simply be updated. If not, we caught some undesired change, which may or may not be related to an unexpected schema evolution.

Note that the comments for that test should also direct users back to this ticket while discovery still has not been completed.