agile-lab-dev / darwin

Avro Schema Evolution made easy
Apache License 2.0

Add documentation about default fields/fingerprint issue #111

Closed duhizjame closed 8 months ago

duhizjame commented 1 year ago

The default attribute of a field is omitted from the canonical form of the schema, so incompatible schemas can end up with the same fingerprint. We should document a debugging tip for users in case this breaks compatibility.

duhizjame commented 1 year ago

I cannot push to a branch, so I am copying the markdown here to be appended to the docs:

Notes on Avro Single-Object Encoding Specification

Canonical form ignores default fields

As specified in the background section, Darwin leverages the Avro Single-Object Encoding specification so that the schema fingerprint can be stored alongside the Avro data. To compute the fingerprint, the schema is first converted into its Parsing Canonical Form, which strips every attribute that is not needed for reading the data, such as doc and aliases. It also strips the default attribute of fields, so two schemas that differ only in their defaults have the same fingerprint even though they are semantically different. Because defaults matter for schema resolution and compatibility, keep this in mind when debugging broken compatibility on a subject.
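To make the collision concrete, here is a minimal, standard-library-only Python sketch. It is not Darwin's code and not the Avro library: `canonicalize` is a hypothetical, simplified stand-in for the Parsing Canonical Form (it only drops non-canonical attributes and orders the remaining ones; it does not resolve namespaces or collapse `{"type": "int"}` to `"int"`), and `fingerprint64` follows the CRC-64-AVRO algorithm as described in the Avro specification. The `User` schemas are made up for illustration.

```python
import json

# Attributes kept by the Parsing Canonical Form, in the order the Avro spec lists them.
# Everything else (doc, aliases, default, ...) is dropped.
CANONICAL_KEYS = ["name", "type", "fields", "symbols", "items", "values", "size"]

# Seed ("empty" fingerprint) of CRC-64-AVRO as given in the Avro spec.
EMPTY = 0xC15D213AA4D7A795


def canonicalize(node):
    """Simplified canonicalization: strip non-canonical attributes, fix key order."""
    if isinstance(node, dict):
        return {k: canonicalize(node[k]) for k in CANONICAL_KEYS if k in node}
    if isinstance(node, list):
        return [canonicalize(item) for item in node]
    return node


def canonical_form(schema_json: str) -> str:
    """Return the (simplified) canonical form as compact JSON text."""
    return json.dumps(canonicalize(json.loads(schema_json)), separators=(",", ":"))


def _build_table():
    # Precompute the 256-entry lookup table for the CRC.
    table = []
    for i in range(256):
        fp = i
        for _ in range(8):
            fp = (fp >> 1) ^ (EMPTY if fp & 1 else 0)
        table.append(fp)
    return table


_TABLE = _build_table()


def fingerprint64(data: bytes) -> int:
    """64-bit Rabin fingerprint (CRC-64-AVRO) of the given bytes."""
    fp = EMPTY
    for byte in data:
        fp = (fp >> 8) ^ _TABLE[(fp ^ byte) & 0xFF]
    return fp


# Two hypothetical schema versions: same field, but only v1 declares a default.
v1 = '{"type":"record","name":"User","fields":[{"name":"age","type":"int","default":0}]}'
v2 = '{"type":"record","name":"User","doc":"no default","fields":[{"name":"age","type":"int"}]}'

fp1 = fingerprint64(canonical_form(v1).encode("utf-8"))
fp2 = fingerprint64(canonical_form(v2).encode("utf-8"))

print(canonical_form(v1))
print(canonical_form(v2))
print(fp1 == fp2)
```

Both schemas reduce to the same canonical text, so their fingerprints match even though only v1 lets a reader fill in a missing `age`. If compatibility on a subject breaks unexpectedly, comparing the full schema JSON (not just the fingerprints) for differences in default is a reasonable first check.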

Sources: