FIXTradingCommunity / fix-simple-binary-encoding

A FIX standard for binary message encoding

handling of presence="optional" field when the type has presence="mandatory" #31

Closed 2pl closed 6 years ago

2pl commented 7 years ago

Hi,

The presence attribute is allowed at both the field level and the type level:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<sbe:messageSchema xmlns:sbe="http://fixprotocol.io/2016/sbe"
                   package="sbe.test"
                   id="1" version="1" semanticVersion="0.1"
                   description="Example schema"
                   byteOrder="littleEndian">
    <types>
        <type name="VehicleCode" primitiveType="char" length="6" characterEncoding="ASCII"/>
        <type name="SeatCount" primitiveType="uint8" presence="required" nullValue="255"/>
    </types>
    <sbe:message name="Car" id="1" description="Description of a basic Car">
        <field name="vehicleCode" id="1" type="VehicleCode"/>
        <field name="seats" id="2" type="SeatCount" presence="optional"/>
    </sbe:message>
    <sbe:message name="SuperCar" id="2" description="Description of a super Car">
        <field name="vehicleCode" id="1" type="VehicleCode"/>
        <field name="seats" id="2" type="SeatCount"/>
    </sbe:message>
</sbe:messageSchema>

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<sbe:messageSchema xmlns:sbe="http://fixprotocol.io/2016/sbe"
                   package="sbe.test"
                   id="1" version="1" semanticVersion="0.1"
                   description="Example schema"
                   byteOrder="littleEndian">
    <types>
        <type name="VehicleCode" primitiveType="char" length="6" characterEncoding="ASCII"/>
        <type name="SeatCount" primitiveType="uint8" presence="optional" nullValue="255"/>
    </types>
    <sbe:message name="Car" id="1" description="Description of a basic Car">
        <field name="vehicleCode" id="1" type="VehicleCode"/>
        <field name="seats" id="2" type="SeatCount" />
    </sbe:message>
    <sbe:message name="SuperCar" id="2" description="Description of a super Car">
        <field name="vehicleCode" id="1" type="VehicleCode"/>
        <field name="seats" id="2" type="SeatCount" presence="required"/>
    </sbe:message>
</sbe:messageSchema>

Thank you.

adkapur commented 7 years ago

In the first example, the type itself is not correct, since presence="required" is mutually exclusive with nullValue="255", never mind that the field itself is marked as presence="optional":

<type name="SeatCount" primitiveType="uint8" presence="required" nullValue="255"/>

<field name="seats" id="2" type="SeatCount" presence="optional"/>

The correct way would be to use either the encoding type or the field to indicate whether something is required, and not both:

<type name="SeatCount" primitiveType="uint8"/>

<field name="seats" id="2" type="SeatCount" presence="optional" nullValue="255"/>

Or

<type name="SeatCount" primitiveType="uint8" presence="optional" nullValue="255"/>

<field name="seats" id="2" type="SeatCount"/>

In the second example, the type is correct, but setting presence="required" on the field means that the decoder will not expect a null value there, and that might cause an application-level error:

<type name="SeatCount" primitiveType="uint8" presence="optional" nullValue="255"/>

<field name="seats" id="2" type="SeatCount" presence="required"/>

donmendelson commented 7 years ago

I agree with @adkapur that nullValue is incompatible with presence="required" and that you should use either the type or the field for presence but not both. The spec states that a mismatch of presence between a field and its type is an error.

If the attribute is specified on both a field and the encoding that it references, the values must be identical.

In other words, one does not override the other.
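The rule quoted above can be checked mechanically. The following is only a rough sketch in Python, not part of the spec or of any SBE tool: the function name `presence_mismatches` and the trimmed-down schema are assumptions made for illustration, and the check covers only top-level `<type>` elements referenced by message fields.

```python
import xml.etree.ElementTree as ET

SBE_NS = "{http://fixprotocol.io/2016/sbe}"

# Trimmed-down version of the second schema from the question (illustrative only).
SCHEMA = """
<sbe:messageSchema xmlns:sbe="http://fixprotocol.io/2016/sbe"
                   package="sbe.test" id="1" version="1">
    <types>
        <type name="SeatCount" primitiveType="uint8" presence="optional" nullValue="255"/>
    </types>
    <sbe:message name="Car" id="1">
        <field name="seats" id="2" type="SeatCount"/>
    </sbe:message>
    <sbe:message name="SuperCar" id="2">
        <field name="seats" id="2" type="SeatCount" presence="required"/>
    </sbe:message>
</sbe:messageSchema>
"""


def presence_mismatches(schema_xml):
    """Hypothetical helper: list fields whose presence differs from their type's.

    Only reports a problem when presence is specified on BOTH the field and
    the type, which is how the quoted sentence is worded.
    """
    root = ET.fromstring(schema_xml.strip())
    types = {t.get("name"): t for t in root.iter("type")}
    mismatches = []
    for message in root.iter(SBE_NS + "message"):
        for field in message.iter("field"):
            encoding = types.get(field.get("type"))
            if encoding is None:
                continue  # type not defined as a simple <type>; out of scope here
            field_presence = field.get("presence")
            type_presence = encoding.get("presence")
            if field_presence and type_presence and field_presence != type_presence:
                mismatches.append(
                    (message.get("name"), field.get("name"), field_presence, type_presence)
                )
    return mismatches


print(presence_mismatches(SCHEMA))
```

Under these assumptions only the SuperCar message is flagged, because its field says required while the type says optional; the Car field specifies no presence of its own and is not reported.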

It is not necessary to create two types for optional and required presence. Simply create one type without any presence attribute, and apply presence to each field that references the type.

mjpt777 commented 7 years ago

I think it makes more sense if presence is only on the field or when nested inside a composite. A type should define a type and not how it is used. This could make the spec tighter.

donmendelson commented 7 years ago

@mjpt777 , I agree.

(Historical reason: originally the thought was that optional types reserved a null value but required types could use the full range. Hence two types. However, this was changed to make all numerical types use the same range. So it no longer makes sense to have two types for optional and required.)

2pl commented 7 years ago

Thank you all for your answers.

@adkapur your suggestion below is invalid,

<field name="seats" id="2" type="SeatCount" presence="optional" nullValue="255"/>

throws an error with the latest xsd:

Error - Line 13, 90: org.xml.sax.SAXParseException; lineNumber: 13; columnNumber: 90; cvc-complex-type.3.2.2: Attribute 'nullValue' is not allowed to appear in element 'field'.

@mjpt777, I agree having presence allowed only at the field level would make more sense, with nullValue being an attribute of the type.

@mjpt777, I also noticed that the Real Logic sbe-tool ignores (with a warning) the nullValue attribute of a type unless the type also has presence="optional", which forces you to define a specific optional type if you need a custom nullValue:

WARNING: at nullValue set, but presence is not optional

@donmendelson, with the default presence being required, I find it somewhat inconsistent that this is correct:

<type name="SeatCount" primitiveType="uint8"  nullValue="42"/>

but that is not:

<type name="SeatCount" primitiveType="uint8" presence="required" nullValue="42"/>

but I understand the history behind that.

adkapur commented 7 years ago

So, just to clarify with an example: currently we tend to define two types, one required and one optional, such as:

<type name="Int32" description="int32" primitiveType="int32"/>
<type name="Int32NULL" presence="optional" nullValue="2147483647" primitiveType="int32"/>

And use these in different fields such as:

<field name="SecurityID" id="48" type="Int32" description="Security ID " offset="0" semanticType="int"/>
<field name="BestStopQty" id="20010" type="Int32NULL" offset="13" semanticType="Qty"/>

So going forward it seems that it is better to define a single type as:

<type name="Int32" description="int32" primitiveType="int32" nullValue="2147483647"/>

And use this in the same two fields as follows:

<field name="SecurityID" id="48" type="Int32" description="Security ID " offset="0" semanticType="int"/>
<field name="BestStopQty" id="20010" type="Int32" offset="13" semanticType="Qty" presence="optional"/>

Also, will minValue, maxValue, and nullValue continue to be associated with the type and not the field?

But it appears that the Real Logic implementation does not support this yet, since presence="optional" is not set explicitly on the type?

<type name="Int32" description="int32" primitiveType="int32" nullValue="2147483647"/>

mjpt777 commented 7 years ago

@adkapur If you think the Real Logic implementation violates the spec then please raise an issue on the repo and we will follow it up.

2pl commented 7 years ago

@donmendelson can you confirm one should be able to define a type with a custom nullValue, without explicitly setting the type as optional:

<type name="zeroIsNullInt32" description="int32 with zero as nullValue" primitiveType="int32" nullValue="0"/>

donmendelson commented 7 years ago

A table in the spec says that nullValue override is only valid if presence="optional".
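As a rough illustration of that rule, here is a minimal Python sketch. The function name `null_value_override_allowed` is hypothetical, and the default of "required" for a missing presence attribute is taken from the discussion earlier in this thread; it is not a reference to any tool's actual behaviour.

```python
def null_value_override_allowed(type_attrs):
    """Hypothetical check: a nullValue override needs presence="optional".

    `type_attrs` is a dict of the attributes found on a <type> element.
    A missing presence attribute is treated as "required" (assumed default).
    """
    if "nullValue" not in type_attrs:
        return True  # nothing is overridden, so nothing to check
    return type_attrs.get("presence", "required") == "optional"


# The type asked about above: custom nullValue, no explicit presence.
zero_is_null = {"name": "zeroIsNullInt32", "primitiveType": "int32", "nullValue": "0"}
print(null_value_override_allowed(zero_is_null))

# The same type with presence="optional" set explicitly.
print(null_value_override_allowed(dict(zero_is_null, presence="optional")))
```

Read this way, the zeroIsNullInt32 type from the question is rejected unless it is also declared optional.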

2pl commented 7 years ago

@donmendelson that was my point: the spec allows a nullValue override only for optional types. So your suggested approach above

Simply create one type without any presence attribute, and apply presence to each field that references the type.

only works as long as you are happy with the nullValue of the primitive type.

If you are not, you need to define your type as optional, and you can't override it as required at the field level. Your only choice is then the 'CME' approach, i.e. defining two different types for optional and required.

donmendelson commented 7 years ago

Note: this enhancement changes the XML schema and therefore is a breaking change with version 1.0.

mjpt777 commented 7 years ago

Why are the presence attributes available on the encoded basic data type but not composite, enum, or set?

donmendelson commented 7 years ago

I am about to update the XML schema for this issue and #39.

Martin, you said earlier:

I think it makes more sense if presence is only on the field or when nested inside a composite. A type should define a type and not how it is used. This could make the spec tighter.

Is everyone in agreement?

mjpt777 commented 7 years ago

If these attributes are to be supported on the type then the field should override.

I believe my previous statement still stands but it does raise a bigger issue. Are composites really types or some approximation of a macro? If really types then an extended type should only be able to further specialise. For example, the type cannot be widened and the min or max range should be within the range of the base type. The base should have core set of attributes which the extension/usage has as a minimum. What does a version attribute mean on a type within a composite?

Composition and consistency is weak with SBE. This can be seen in the types but also in how repeating groups should just be a recursive structure which is identical to a message. The more consistent this is then the cleaner the implementation and easier it will be to use and comprehend.

adkapur commented 7 years ago

Just to clarify, what was the outcome? Was presence removed from the data type, and is it now only on the field? Looking at https://github.com/FIXTradingCommunity/fix-simple-binary-encoding/pull/57/files it seems that presence was added to the composite data type.

donmendelson commented 7 years ago

The compositeDataType complex type does not contain presenceAttributes. However, when a simple type is contained by a composite, it does have presence. This is not a change from v1.0. The change is for a stand-alone simple type element, which does not have presence. As before, a field has presence.

adkapur commented 7 years ago

Okay, I see. So do you mean that something like this is no longer valid?

<type description="uInt64NULL" primitiveType="uint64" presence="optional" name="uInt64NULL" nullValue="18446744073709551615"/>

Should this instead be uInt64, with the particular field having presence="optional" and nullValue="18446744073709551615"?

<type description="uInt64" primitiveType="uint64" name="uInt64"/>

donmendelson commented 7 years ago

Correct, presence is an attribute of a field, not its wire format. One impact is that an optional and required field can share the same type.

RFrenkel commented 7 years ago

I am unclear on how we can still specify a constant element or an optional element of the composite. Let's say MaturityMonthYear is a required field in Security Definition template. However its elements Week and Day are optional. Same question for enumType or BitsetType.

I agree with Martin (@mjpt777) that when field-level presence is specified, it must either complement or override the presence of the binary type, even for simple types. Also, why would support for field-level presence exclude support for presence on the binary type? I think the standard should allow both, and the combination of the two.

donmendelson commented 7 years ago

Martin said:

I think it makes more sense if presence is only on the field or when nested inside a composite. A type should define a type and not how it is used. This could make the spec tighter.

That is what the current schema implements. (Not cast in stone yet, if someone has a better idea.)

The week and day members of composite MaturityMonthYear can be optional while month and year are required. Also, members of a composite can be constant.

adkapur commented 7 years ago

If we support presence on both the type and the field, it causes confusion, as in the question that started this thread. To keep things consistent and simple, we should attach presence only to the field. This also allows the use of custom null values, and whether something is optional or required should not be dictated by the type.

zpodlovics commented 6 years ago

Could somebody give me a definition of when one type is equivalent to another type in SBE? Does any model (or proof) of SBE correctness exist (that the same bitstream will always mean the same thing, and other properties)?

In my current model, a type is defined by its habitants (e.g., a finite set of values for primitive types) and its shape (meaning). A nullable type can be created as a discriminated union of the habitants of an existing non-nullable type and a new null habitant (the null habitant must be distinct from every existing habitant of the non-nullable type). Two types are equivalent when both have the same habitants and the same shape (each habitant mapped one-to-one to the same meaning).

A non-nullable type and a nullable type cannot be equivalent, as they have different habitants. A type with a custom null value cannot be equivalent to a non-nullable type either, as they have different habitants (one habitant removed from the existing habitants and a different one added) and a different shape (null represented by a different habitant).

Also, for nullable types null is a perfectly legal habitant, and null should also be available for optional fields. Types can also be refined by restricting their habitants, for example by defining min and max values (but they cannot be widened, so new habitants cannot be added).

Let's assume I have the following type and everything else (presence, nullValue, etc.) is specified in the fields. How are the fields supposed to share the same type?

<type description="uInt8" primitiveType="uint8" name="uInt8"/>

uInt8 type habitants: 0..254

<field name="Field1" id="1" type="uInt8" description="Field1" offset="0" semanticType="int"/>

the Field1 habitants will be: 0..254

<field name="Field2" presence="optional" id="2" type="uInt8" description="Field2" offset="1" semanticType="int" />

the Field2 habitants will be: 0..254 or 255 as null

<field name="Field3" presence="optional" nullValue="0" id="3" type="uInt8" description="Field3" offset="2" semanticType="int" />

the Field3 habitants will be: 1..254 or 0 as null

<field name="Field4" presence="optional" nullValue="1" id="4" type="uInt8" description="Field4" offset="3" semanticType="int" />

the Field4 habitants will be: 0,2..254 or 1 as null

<field name="Field5" minValue="0" maxValue="7" id="5" type="uInt8" description="Field5" offset="4" semanticType="int" />

the Field5 habitants will be: 0..7

<field name="Field6" presence="optional" minValue="0" maxValue="7" nullValue="255" id="6" type="uInt8" description="Field6" offset="5" semanticType="int" />

the Field6 habitants will be: 0..7 or 255 as null

<field name="Field7" presence="optional" minValue="1" maxValue="8" nullValue="0" id="7" type="uInt8" description="Field7" offset="6" semanticType="int" />

the Field7 habitants will be: 1..8 or 0 as null

Without a custom nullValue, minValue and maxValue act like a type refinement. semanticType also acts like a type refinement. However, a custom nullValue is not a type refinement (it is different from only restricting the habitants); instead it always creates a new type. I guess the version could also be modelled as a type refinement if the most recent schema version is used as the baseline and the earlier versions as refinements. A group could also be modelled as a refinement on length.
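The model above can be written down as a small Python sketch. Everything here is an assumption made for illustration (the names `field_habitants` and `is_refinement`, and the choice to represent a field as a pair of non-null values and null value); it only mirrors the uInt8 field examples listed above and is not an SBE API.

```python
def field_habitants(optional=False, null_value=255, min_value=0, max_value=254):
    """Return (non-null values, null value or None) for a uInt8 field.

    Defaults follow the examples above: 0..254 are ordinary values and
    255 is the null value when the field is optional.
    """
    values = frozenset(
        v for v in range(min_value, max_value + 1)
        if not (optional and v == null_value)
    )
    return values, (null_value if optional else None)


def is_refinement(candidate, base):
    """A refinement only restricts the values and keeps the same null habitant."""
    candidate_values, candidate_null = candidate
    base_values, base_null = base
    return candidate_values <= base_values and candidate_null == base_null


field2 = field_habitants(optional=True)                              # 0..254 or 255 as null
field3 = field_habitants(optional=True, null_value=0)                # 1..254 or 0 as null
field6 = field_habitants(optional=True, min_value=0, max_value=7)    # 0..7 or 255 as null

print(is_refinement(field6, field2))  # min/max only restrict the values
print(is_refinement(field3, field2))  # a custom null changes the null habitant
```

In this sketch Field6 is a refinement of Field2, while Field3 is not, because its null habitant differs even though its non-null values are a subset.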

This definition of type equivalence could also be used for enums and sets, and for composing types from existing types. For example, two composite types are equivalent when they have the same shape (meaning) and every type within them is equivalent (recursively).

Raw (opaque) data is not really an SBE uint8 type, because it has different habitants (0..255) than SBE uint8. For example, here is a simple fixed-length encoded UUID. How is optional supposed to work here?

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<messageSchema package="SBE tests"
               id="100000"
               semanticVersion="5.2"
               description="Unit Test"
               byteOrder="littleEndian">
    <types>
        <type name="uuid" primitiveType="uint8" length="16" description="RFC 4122 compliant UUID"/>
        <composite name="messageHeader" description="Message identifiers and length of message root">
            <type name="blockLength" primitiveType="uint16"/>
            <type name="templateId" primitiveType="uint16"/>
            <type name="schemaId" primitiveType="uint16"/>
            <type name="version" primitiveType="uint16"/>
        </composite>
    </types>
    <message name="TestMessage100001" id="100001" description="TestMessage" blockLength="16">
        <field name="Tag100001" id="100001" type="uuid" presence="optional"/>
    </message>
</messageSchema>

Null and optional are not mutually exclusive nor equivalent: https://developer.atlassian.com/blog/2015/08/optional-broken/

donmendelson commented 6 years ago

@zpodlovics, you have raised many issues, so I can't give a complete answer to your discussion, but here are a few points:

the same bitstream will always mean the same and other properties

SBE does not guarantee that identical bitstreams have the same meaning. SBE cannot be decoded without access to metadata (the message template and incorporated types) that is not sent on the wire. Even when representing the same primitive types, fields may have very different meanings at the application layer. We give only a hint of meaning with the semanticType attribute.

Agreed that optionality, presence, and nullness are not identical concepts. In SBE, a value is always present for every field to support deterministic message length (unlike FIX tag value encoding in which absence represents nullness). A special value is reserved to indicate that a value is undefined for a field. I find ISO 11404 General-Purpose Datatypes useful. It states:

Optional is a generator which effectively adds the "nil" value to the value space of a base datatype.

That is, an optional operation renders a nullable datatype from a base datatype.
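To make the "always present, special value means undefined" idea concrete, here is a minimal Python sketch for a single optional uint8 field. The function names are hypothetical, and 255 is used as the null value because that is what the uint8 examples in this thread use; this is not generated SBE codec code.

```python
import struct

UINT8_NULL = 255  # null value for uint8 as used in the examples in this thread


def encode_seats(seats):
    """Encode an optional uint8 field; None is written as the reserved null value."""
    if seats is None:
        value = UINT8_NULL
    elif 0 <= seats <= 254:
        value = seats
    else:
        raise ValueError("seats must be None or in the range 0..254")
    return struct.pack("<B", value)  # always exactly one byte on the wire


def decode_seats(buffer):
    """Decode the field; the reserved null value is mapped back to None."""
    (value,) = struct.unpack("<B", buffer)
    return None if value == UINT8_NULL else value


print(len(encode_seats(4)), len(encode_seats(None)))
print(decode_seats(encode_seats(4)), decode_seats(encode_seats(None)))
```

The encoded length is the same whether or not the value is defined, which is what keeps the message length deterministic; only the decoder's knowledge of the reserved value turns the byte back into "no value".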

Since the value space of an opaque datatype is unknown to an SBE decoder by definition of opaque, it is not possible for it to distinguish a null value from non-null. The values must be passed up to a layer at which the encoding is not opaque.