FIXTradingCommunity / fix-simple-binary-encoding

A FIX standard for binary message encoding
Other
261 stars 69 forks source link

How extension mechanism is supposed to work when dimension type is unknown? #147

Open OleksandrKvl opened 2 years ago

OleksandrKvl commented 2 years ago

Extension mechanism is based on required blockLength and optional numGroups and numVarDataFields. The latter two tell how many groups/var-data fields we have but they tell nothing about their dimension types so I don't understand how such extension can be handled. Consider the first version of the message:

<field name="field1" id="1" type="uint32"/>
<group name="group1" id="10" dimensionType="groupSizeEncoding">
    <field name="field1" id="11" type="uint32"/>
</group>

And the second version which adds another group:

<field name="field1" id="1" type="uint32"/>
<group name="group1" id="2" dimensionType="groupSizeEncoding">
    <field name="field1" id="3" type="uint32"/>
</group>
<!-- new group with different dimensionType -->
<group name="group2" id="4" dimensionType="groupSizeEncoding2">
    <field name="field1" id="5" type="uint32"/>
</group>

The reader of the new version will see numGroups set to 2 but that group has different dimension type and it's not possible to correctly handle it without its definition. The same happens with var-data fields. New ones can have different type which means different length type so it's not possible to correctly handle them.

For groups I can imagine potential "fix" in form of saying that since group dimension is unknown it should be treated as default groupSizeEncoding so only groups with this dimensionType can be safely added to the message. But the standard says nothing about default type for var-data encoding. Can you please clarify how it's supposed to work?

donmendelson commented 2 years ago

A decoder generated from an earlier version of a message schema cannot decode the new group anyway. The downside is that it cannot correctly determine the message length by walking the blocks. For that purpose, a framing header such as Simple Open Framing Header or WebSocket Protocol is recommended.

The extension mechanism section already states the constraint: "Message header encoding cannot change." This should be amended to say "Message header encoding and group dimension of existing repeating groups cannot change."

OleksandrKvl commented 2 years ago

@donmendelson This section says:

Correct me if I'm wrong, the above is only partially true:

A decoder generated from an earlier version of a message schema cannot decode the new group anyway.

Well, if group header and var-data length field types are known (for example defaulted) and message/group header contains numGroups and numVarDataFields, it should be possible because you would have all the information to calculate the size of each message level. I thought that was the original intent... Then what's the purpose of those numGroups and numVarDataFields on the wire?

OleksandrKvl commented 2 years ago

Actually, here's the direct quote from here:

Number of repeating groups and variable data. Message headers and repeating group dimensions carry a count of the number of repeating groups and a count of variable-length data fields on the wire. This supports a walk by a decoder of all the elements of a message, even when the decoder was built with an older version of a schema. As for added fixed-length fields, new repeating groups cannot be interpreted by the decoder, but it still can process the ones it knows, and it can correctly reach the end of a message.

donmendelson commented 2 years ago

@OleksandrKvl yes, it was the intent of the extension mechanism to be able to walk the blocks of a message even if the design of the blocks is unknown to the decoder. Previous proposals for version 2.0 did not anticipate that the repeating group dimension and length of variable-length fields could change for new blocks.

You proposed the possibility of a default dimension to be used for new blocks.

Another possibility would be that new blocks use the same dimension previous groups within a message. However, that would not cover the case of adding the first repeating group or variable-length field to a message. In that case there would have to some kind of default anyway.

I open to any other proposal that anyone would like to make.

OleksandrKvl commented 2 years ago

@donmendelson I thought that SBE V1 is kinda "final" and complete but it contains exactly the same extension logic which seems not possible to implement (I'm actually writing SBE parser, that's why I want to know the details).

Another possibility would be that new blocks use the same dimension previous groups within a message.

This also doesn't cover a (rare) case where groups have different dimension types. There's already messageSchema.headerType, possible solution might be to add messageSchema.groupSizeEncoding and messageSchema.varDataEncoding with the same defaults as their names. Similar attributes can be added to the <message> element to allow per-message customization. Then the only limitation to extension would be that new groups/var-data fields should use default types and of course numGroups/numVarDataFields must exist.

donmendelson commented 2 years ago

At high level, the intention of the extension mechanism was to support simple addition of new fields to a block without breaking old parsers. The proposal to allow addition of new repeating groups was initially rejected by the working group, and it was not included in SBE v1.0. The thought was that if a message is to be radically changed, then you should just issue a new template.

Since it was requested again by some users, the proposal was revived as SBE 2.x release candidate. I want to emphasize that a release candidate is provisional and subject to improvement. I would encourage @OleksandrKvl or any other interested parties to make proposals to resolve ambiguities in the spec. Any proposal entered as PR will be considered.

OleksandrKvl commented 2 years ago

@donmendelson before making such a proposal, I'd like to understand whether it's possible to change <data> attributes in a way I proposed in #148 (in particular providing just length type instead of composite) since proposal directly depends on it.