FIXTradingCommunity / fix-orchestra

Machine readable rules of engagement
Apache License 2.0

[repository schema] Indication of null value in mapped data type #215

Open mkudukin opened 1 month ago

mkudukin commented 1 month ago

The proposal is to add the ability to define the null value for a mapped data type, as discussed in https://github.com/FIXTradingCommunity/fix-orchestra/discussions/197.

Issue

Many binary protocols are fixed-length, meaning every field must be included in the transmission, even if it's optional and not used in that specific message. Those protocols define a special value for each datatype, usually called null value. This null value indicates that an optional field is not being used. When the encoder or decoder encounters this null value, it treats the field as if it is not set.
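
To make the behavior concrete, here is a minimal Python sketch of an encoder/decoder pair for an unsigned 8-bit field. The names `U8_NULL`, `encode_u8` and `decode_u8` are hypothetical, and reserving 0xFF as the sentinel (so that 255 cannot be sent as a real value) is an assumption of this sketch, not something any particular protocol mandates.

```python
import struct
from typing import Optional

# Assumed null sentinel for an unsigned 8-bit field (illustrative only).
U8_NULL = 0xFF


def encode_u8(value: Optional[int]) -> bytes:
    """Encode an optional u8; an unset field is written as the null sentinel."""
    if value is None:
        return struct.pack("B", U8_NULL)
    # Assumption: the sentinel itself is not a legal application value.
    if not 0 <= value < U8_NULL:
        raise ValueError(f"value {value} is out of range for a nullable u8")
    return struct.pack("B", value)


def decode_u8(buf: bytes, offset: int = 0) -> Optional[int]:
    """Decode one byte; the null sentinel is reported as 'field not set'."""
    (raw,) = struct.unpack_from("B", buf, offset)
    return None if raw == U8_NULL else raw
```

The field always occupies one byte on the wire; only the interpretation of the sentinel differs between a set and an unset field.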

The Orchestra standard currently provides no way to associate a datatype with its null value.

Proposal

We recommend adding a nullValue attribute to the mappedDatatype element.

Example

<datatype name="u8">
    <mappedDatatype standard="ISO11404" base="integer" minInclusive="0" maxInclusive="255" nullValue="0xFF"/>
    <annotation>
        <documentation>8-bit unsigned integer, 0xFF in case of an empty field</documentation>
    </annotation>
</datatype>

martinswanson commented 1 month ago

Why isn't this just part of the configuration for your encoder/decoder?

mkudukin commented 1 month ago

@martinswanson, the idea is to design a standard way to describe encoding-related properties of data types within the Orchestra Repository model.

martinswanson commented 1 month ago

If we take SBE as an example, null values and associated handling are already defined at the encoding layer. What is the reason we want to duplicate this configuration in Orchestra?

[Image: Screenshot 2024-07-29 at 15 09 03]

(from https://www.fixtrading.org/packages/simple-binary-encoding-draft-standard-v1-0/)

I think Orchestra should focus on capturing the semantics, i.e. the logical (encoding independent) name of a field, its presence (required/optional), and mapping from an abstract datatype to encoding-level data types. But the details of how the encoding level datatype is implemented should be handled outside of Orchestra, in your encoder/decoder configuration.

I would also ask whether null values actually mean anything from a semantic perspective (what would a blank value in a required field actually mean?).

mkudukin commented 1 month ago

> If we take SBE as an example, null values and associated handling are already defined at the encoding layer. What is the reason we want to duplicate this configuration in Orchestra?

How is the configuration duplicated? Could you please share some details on how you use SBE together with Orchestra? Do you mean when mapping Orchestra data types to SBE ones?

> But the details of how the encoding level datatype is implemented should be handled outside of Orchestra, in your encoder/decoder configuration.

While it's impossible to define every encoding detail within the Orchestra Repository, we can include common ones like null values, byte order, and padding associated with data types. This approach mirrors the structure of binary protocol specification documents, which often combine data type definitions with their corresponding encoding details. This proposal fulfills the goal of creating a machine-readable representation of these specifications.

By the way, @donmendelson mentioned this feature in another mappedDatatype improvement proposal.

> I would also ask whether null values actually mean anything from a semantic perspective (what would a blank value in a required field actually mean?)

The Orchestra standard could follow the same approach as SBE here: required and optional fields of the same data type have the same value space, regardless of whether nullValue is set. This would mean that if a required field carries the null value on the wire, it must be handled as the absence of that required field.
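
As a rough illustration of that rule, the Python sketch below treats the null value in an optional field as "not set" and in a required field as a missing required field. `read_u8_field`, `MissingRequiredField` and the 0xFF sentinel are hypothetical names and values chosen for the example; they are not taken from Orchestra or SBE.

```python
from typing import Optional

# Assumed null sentinel for an unsigned 8-bit field (illustrative only).
U8_NULL = 0xFF


class MissingRequiredField(ValueError):
    """Raised when a required field carries the null value on the wire."""


def read_u8_field(raw: int, *, required: bool) -> Optional[int]:
    """Interpret a raw u8 wire value according to the field's presence."""
    if raw != U8_NULL:
        return raw  # ordinary value, same value space for both presences
    if required:
        # Null in a required field is handled as absence of that field.
        raise MissingRequiredField("required field carries the null value")
    return None  # optional field that is simply not set
```

Whether the absence of a required field should be a hard error or be reported some other way would be up to the application; raising an exception is only one option.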

patricklucas commented 2 weeks ago

I would like to consider using extensions to specify properties like this as a workaround prior to codifying it into the Orchestra standard itself. This would serve to drive out any true gaps in the Orchestra standard as well as help us establish a convention for introducing new capabilities to Orchestra in a staged manner, enabling the community to take advantage of, evaluate, and iterate on features before they become "set in stone" in the standard.

We may even find that the extension approach is sufficient, and we can settle on a community convention for properties we find are applicable across many binary encoding systems, for instance.

Here are some ideas for how the null value for an Orchestra datatype for a particular binary encoding MyBinaryEncoding could be configured using an extension:

<fixr:mappedDatatype standard="MyBinaryEncoding" base="integer">
    <fixr:extension>
        <!-- Convention: the `binaryEncodingNullValue` extension takes a hex-encoded single byte value encoded as a string starting with `0x` -->
        <binaryEncodingNullValue>0xFF</binaryEncodingNullValue>
        <!-- Convention: the `binaryEncodingNullValue` extension could support different types -->
        <binaryEncodingNullValue hexValue="0xFF"/>
        <binaryEncodingNullValue intValue="255"/>
        <!-- Convention: the `orchestraBinaryEncoding` extension could hold various properties, agreed by the community -->
        <orchestraBinaryEncoding>
            <nullValue hexValue="0xFF"/>
            <!-- ... -->
        </orchestraBinaryEncoding>
    </fixr:extension>
</fixr:mappedDatatype>
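
For what it's worth, a tool could pick up such an extension with very little code. Below is a minimal Python sketch that reads the `orchestraBinaryEncoding`/`nullValue` variant from above. The helper `null_value_of` is hypothetical, the namespace URI bound to `fixr` is my assumption about the Orchestra repository namespace, and the extension element names are only the conventions floated in this comment, not part of any standard.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Assumption: namespace URI bound to the fixr prefix in Orchestra files.
FIXR_NS = "http://fixprotocol.io/2020/orchestra/repository"

SAMPLE = f"""
<fixr:mappedDatatype xmlns:fixr="{FIXR_NS}" standard="MyBinaryEncoding" base="integer">
    <fixr:extension>
        <orchestraBinaryEncoding>
            <nullValue hexValue="0xFF"/>
        </orchestraBinaryEncoding>
    </fixr:extension>
</fixr:mappedDatatype>
"""


def null_value_of(mapped: ET.Element) -> Optional[int]:
    """Return the null value from the hypothetical extension, if present."""
    path = f"{{{FIXR_NS}}}extension/orchestraBinaryEncoding/nullValue"
    node = mapped.find(path)
    if node is None:
        return None
    if "hexValue" in node.attrib:
        return int(node.attrib["hexValue"], 16)  # accepts the 0x prefix
    if "intValue" in node.attrib:
        return int(node.attrib["intValue"])
    return None


if __name__ == "__main__":
    print(null_value_of(ET.fromstring(SAMPLE)))
```

A mappedDatatype without the extension simply yields `None`, so existing Orchestra files would keep working with such a reader.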