amazon-ion / ion-docs

Source for the GitHub Pages for Ion.
https://amazon-ion.github.io/ion-docs/
Apache License 2.0

Make it possible to switch from SID field names to FlexSym field names part way through a struct #292

Closed: popematt closed 2 months ago

popematt commented 7 months ago

It would make binary writer implementations a lot simpler if we could start a struct using SID field names, and switch over to FlexSym field names if and only if an inline field name is required.

It is awkward to start a struct in SID mode only to have a higher-level caller add a field that needs a FlexSym name later on. In this case, the writer must (a) switch the struct mode in the middle of the struct, (b) refuse to do anything and throw/panic/return Err back to the user, or (c) rewrite the entire struct in FlexSym mode. Option (b) is a bad user experience and will be the cause of bugs in other systems, and option (c) is potentially very expensive, since it could require rewriting several kilobytes or megabytes of data.

I propose that the reserved opcode C1, when used in the value position of a struct, causes the previous SID field name to be discarded, and causes the remainder of that struct to use FlexSym field names.
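
To make the proposal concrete, here is a minimal Python sketch of how a writer might emit the switch. It is an illustration under stated assumptions, not the ion-rust or ion-java implementation: `StructFieldWriter`, `THROWAWAY_SID`, and the helper names are hypothetical, values are passed in as opaque pre-encoded bytes, and the struct's own opcode and length prefix are omitted. The FlexUInt/FlexInt helpers follow my reading of the Ion 1.1 draft and only aim to be right for small values; the FlexSym escape for `$0` and empty text is left out.

```python
SWITCH_OPCODE = 0xC1  # reserved opcode the proposal repurposes
THROWAWAY_SID = 0     # hypothetical choice; the proposal discards whatever SID precedes C1


def flex_uint(value: int) -> bytes:
    """Encode a non-negative int as a FlexUInt (little-endian, length in the low bits)."""
    if value < 0:
        raise ValueError("FlexUInt cannot be negative")
    n = 1
    while value >= 1 << (7 * n):
        n += 1
    return ((value << n) | (1 << (n - 1))).to_bytes(n, "little")


def flex_int(value: int) -> bytes:
    """Encode a signed int as a FlexInt (two's complement payload, same length scheme)."""
    n = 1
    while not -(1 << (7 * n - 1)) <= value < (1 << (7 * n - 1)):
        n += 1
    payload = value & ((1 << (7 * n)) - 1)
    return ((payload << n) | (1 << (n - 1))).to_bytes(n, "little")


def flex_sym(name) -> bytes:
    """Encode a field name as a FlexSym: positive SID, or negative length plus UTF-8 text."""
    if isinstance(name, str):
        text = name.encode("utf-8")
        if not text:
            raise NotImplementedError("empty text needs the FlexSym escape (not sketched)")
        return flex_int(-len(text)) + text
    if name <= 0:
        raise NotImplementedError("$0 needs the FlexSym escape (not sketched)")
    return flex_int(name)


class StructFieldWriter:
    """Writes the field (name, value) pairs of one struct body, starting in SID mode."""

    def __init__(self) -> None:
        self.buf = bytearray()
        self.flexsym_mode = False

    def write_field(self, name, value_bytes: bytes) -> None:
        if isinstance(name, str) and not self.flexsym_mode:
            # First inline name: emit a throwaway SID, then C1 in the value position.
            self.buf += flex_uint(THROWAWAY_SID)
            self.buf.append(SWITCH_OPCODE)
            self.flexsym_mode = True
        self.buf += flex_sym(name) if self.flexsym_mode else flex_uint(name)
        self.buf += value_bytes


VALUE = b"\x60"  # opaque stand-in for one already-encoded value
writer = StructFieldWriter()
writer.write_field(10, VALUE)     # SID mode
writer.write_field("foo", VALUE)  # triggers the two-byte switch
writer.write_field(11, VALUE)     # SID written as a FlexSym from here on
print(writer.buf.hex(" "))
```

The point of the sketch is that the writer never has to go back: everything written before the switch stays valid, and the cost is the two bytes `01 c1` at the point where the first inline name appears.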

In order to streamline things a bit more, I propose that delimited structs also start with SIDs and can switch to FlexSyms in the same way.

If we do this, we may as well get rid of the FlexSym-specific opcodes, because if you're using inline symbol text, the overhead of switching from SID field names to FlexSym field names is relatively small. This would free up opcodes D0-DF and FD to be used for other purposes.

popematt commented 4 months ago

Alternative solution

The original proposal requires 2 bytes to switch to FlexSym mode: one for a throwaway field name and another for the opcode that triggers the switch. This alternative uses only one byte to switch to FlexSym mode.

A FlexInt 0 (literal 0x01) in the field name position indicates that the struct is switching to FlexSym mode. As a consequence, the SID $0 can only be used as a field name once the struct is in FlexSym mode.
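
From the reader's side, the alternative could look roughly like the Python sketch below. This is only an illustration under assumptions: `read_flex`, `read_field_names`, and the `skip_value` callback are hypothetical names, the body is assumed to be a length-prefixed struct body with the prefix already consumed, and the FlexSym escape (needed for `$0` and empty text) is not handled. The flex decoding follows my reading of the Ion 1.1 draft and is limited to encodings of at most 8 bytes.

```python
def read_flex(data: bytes, pos: int, signed: bool):
    """Decode one FlexUInt (signed=False) or FlexInt (signed=True) starting at pos."""
    first = data[pos]
    if first == 0:
        raise NotImplementedError("encodings longer than 8 bytes are out of scope")
    n = (first & -first).bit_length()  # trailing zero bits + 1 = encoded length
    raw = int.from_bytes(data[pos:pos + n], "little") >> n
    if signed and raw >= 1 << (7 * n - 1):
        raw -= 1 << (7 * n)            # undo two's complement
    return raw, pos + n


def read_field_names(body: bytes, skip_value):
    """Return the field names of a struct body, switching modes on a 0x01 field name.

    skip_value(body, pos) is a caller-supplied function that returns the
    position just past the value starting at pos.
    """
    names = []
    pos = 0
    flexsym_mode = False
    while pos < len(body):
        if not flexsym_mode:
            sid, pos = read_flex(body, pos, signed=False)
            if sid == 0:
                # One-byte switch marker: no value follows it.
                flexsym_mode = True
                continue
            names.append(sid)
        else:
            n, pos = read_flex(body, pos, signed=True)
            if n > 0:
                names.append(n)  # symbol ID
            elif n < 0:
                end = pos - n    # -n bytes of inline UTF-8 text
                names.append(body[pos:end].decode("utf-8"))
                pos = end
            else:
                raise NotImplementedError("FlexSym escape (e.g. for $0) not sketched")
        pos = skip_value(body, pos)
    return names


# Toy input: every value is the single opaque byte 0x60.
body = bytes.fromhex("1560" "01" "fb666f6f60" "1760")
names = read_field_names(body, lambda data, pos: pos + 1)
print(names)
```

Here the switch costs the single byte `01`, and the reader needs no lookahead: a zero in the field name position can only mean "switch" while in SID mode, which is why `$0` stops being usable as a field name there.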