apache / fury

A blazingly fast multi-language serialization framework powered by JIT and zero-copy.
https://fury.apache.org/
Apache License 2.0

[Python] Schema evolution support for type backward/forward compatibility #1938

Open chaokunyang opened 2 weeks ago

chaokunyang commented 2 weeks ago

Feature Request

If schema evolution mode is enabled globally when creating Fury and is also enabled for the current type, type meta will be written using one of the following modes. The mode to use is configured when creating Fury.

Both the normal mode and the meta share mode forbid streaming writes, because the writer must look back and update the start offset after the whole object graph has been written and meta collection has finished. Only in this way can we ensure that a deserialization failure in meta share mode doesn't lose the shared meta.
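The look-back step can be illustrated with a minimal sketch. The layout here is an assumption made for illustration only: a 4-byte little-endian offset placeholder at the start of the buffer, with the collected meta appended after the object data. The function names `write_with_meta_offset` and `read_meta_first` are hypothetical and are not part of Fury's API.

```python
import struct


def write_with_meta_offset(payload: bytes, shared_meta: bytes) -> bytes:
    """Write a placeholder offset, the data, the meta, then patch the offset."""
    buf = bytearray()
    # Reserve 4 bytes for the meta start offset (assumed width); the real
    # value is unknown until the object graph has been fully written.
    offset_pos = len(buf)
    buf += b"\x00\x00\x00\x00"
    # `payload` stands in for the serialized object graph.
    buf += payload
    meta_start = len(buf)
    # `shared_meta` stands in for the meta collected while writing the graph.
    buf += shared_meta
    # Look back and patch the placeholder. A pure streaming writer cannot do
    # this, because the earlier bytes would already have been flushed.
    struct.pack_into("<I", buf, offset_pos, meta_start)
    return bytes(buf)


def read_meta_first(data: bytes):
    """Read the meta before touching the object data."""
    (meta_start,) = struct.unpack_from("<I", data, 0)
    meta = data[meta_start:]
    payload = data[4:meta_start]
    return meta, payload
```

Because the reader can jump to the meta before decoding any object, the shared meta is already captured if decoding the object data fails later.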

Type Def

Here we mainly describe the meta layout for schema evolution mode:

|      8 bytes meta header      |   variable bytes   |  variable bytes   | variable bytes |
+-------------------------------+--------------------+-------------------+----------------+
| 7 bytes hash + 1 byte header  |  current type meta |  parent type meta |      ...       |

Type meta is encoded from parent type to leaf type; only types with serializable fields are encoded.

Meta header

The meta header is a 64-bit number encoded in little-endian order.
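A minimal sketch of packing such a header follows. The bit assignment is an assumption, since the text above does not specify it: the 7-byte hash is placed in the upper 56 bits and the 1-byte header in the lowest 8 bits. `pack_meta_header` and `unpack_meta_header` are hypothetical helper names.

```python
import struct

HASH_BITS = 56  # 7 bytes of hash


def pack_meta_header(type_hash: int, header_byte: int) -> bytes:
    """Pack hash and header byte into one 64-bit little-endian value."""
    if not 0 <= type_hash < (1 << HASH_BITS):
        raise ValueError("hash must fit in 7 bytes")
    if not 0 <= header_byte <= 0xFF:
        raise ValueError("header must fit in 1 byte")
    # Assumed layout: hash in the high 56 bits, header in the low 8 bits.
    value = (type_hash << 8) | header_byte
    return struct.pack("<Q", value)


def unpack_meta_header(data: bytes):
    """Inverse of pack_meta_header: returns (type_hash, header_byte)."""
    (value,) = struct.unpack_from("<Q", data, 0)
    return value >> 8, value & 0xFF
```

With this assumed layout the header byte is the first byte on the wire, because little-endian order writes the least significant byte first.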

Single layer type meta
| unsigned varint | var uint |  field info: variable bytes   | variable bytes  | ... |
+-----------------+----------+-------------------------------+-----------------+-----+
|   num_fields    | type id  | header + type id + field name | next field info | ... |

Field order is left as an implementation detail and is not exposed in the specification; the deserializer needs to re-sort fields using the Fury field comparator. This lets Fury compute statistics over field names or types and use a more compact encoding.
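The single-layer layout could be sketched in Python as below. Several details are assumptions made only to keep the example runnable: the per-field header is taken to be one byte holding the UTF-8 length of the field name, and sorting by `(type id, name)` is a hypothetical stand-in for the Fury field comparator, whose actual rules are not given here. `write_varuint` and `encode_layer` are illustrative names.

```python
from typing import List, Tuple


def write_varuint(buf: bytearray, value: int) -> None:
    """Append an unsigned varint: 7 data bits per byte, high bit = continue."""
    if value < 0:
        raise ValueError("varuint cannot encode negative values")
    while value >= 0x80:
        buf.append((value & 0x7F) | 0x80)
        value >>= 7
    buf.append(value)


def encode_layer(type_id: int, fields: List[Tuple[str, int]]) -> bytes:
    """Encode one layer: num_fields, type id, then one info block per field."""
    # Hypothetical stand-in for the Fury field comparator. Any deterministic
    # order shared by writer and reader would serve the same purpose.
    ordered = sorted(fields, key=lambda f: (f[1], f[0]))
    buf = bytearray()
    write_varuint(buf, len(ordered))
    write_varuint(buf, type_id)
    for name, field_type_id in ordered:
        name_bytes = name.encode("utf-8")
        if len(name_bytes) > 0xFF:
            raise ValueError("name too long for the assumed 1-byte header")
        # Assumed field header: a single byte carrying the name length.
        buf.append(len(name_bytes))
        write_varuint(buf, field_type_id)
        buf += name_bytes
    return bytes(buf)
```

Because the fields are sorted before writing, two peers that declare the same fields in different orders produce identical bytes for the layer.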

Other layers type meta

Same encoding algorithm as the previous layer.

Additional context

1556