Closed donmendelson closed 6 years ago
Making enums required feels like a hack. All types should code and behave in a similar manner to respect the principle of least surprise. If they cannot be optional then they do not work with versioning.
We need to step back and look at all types with a lens of consistency. We also need to be clear in that are they really to behave like types, or are them macros, or maybe templates. Whatever the design decisions are best when clear and consistent.
If presence/nullValue in 2.0 RC1 are attributes of fields only (not types). It is not possible for an enum to inherit them.
I would also argue that enumerations are just dictionaries with a scalar index (0 means this, 1 means that, and so on) so leaving it with the application to define a "null index" (42 means unknown/absent) makes sense to me. There is no impact on message size as the underlying bytes are allocated anyway, and there is also no impact on performance as the null value (whatever it is), must still be populated.
Fields have a type. They need to work consistently. We have seen various issues with composites and var data not cleanly defining the concept of not present. 2.0 is an opportunity to fix some of the inconsistencies, without adding more.
If not adding an explicit marker for null on each field then the base primitive type needs to express the null value. This can be done by defaulting the NULL_VAL
in an enum, which can be overridden, and I would argue that on var data we use the length with the null value of the primitive type for the length to do similar. An empty string is not the same as the string not present.
With enums we need to consider versioning. Most people forget it until later. If an application level enum value represents not present then it need to be considered up front. Best to express a clear and unambiguous means of expressing presence for all fields, groups, and var data in a message by clear use of the underlying primitive types.
I worded my proposal as a recommendation to message designers, not a requirement. The null value feature would not be eliminated. The purpose is not a mechanical one, but rather semantic. What is the meaning of null, outside the finite list of valid values?
How does one code for null in a consistent way across all possible enum declarations? How does one code for a string not being present as different from empty? How does one code for not knowing the answer to something, i.e. being comfortable admitting the answer is not known verses expecting an answer that might be wrong?
I am still not clear on what you are suggesting.
Are you saying that even for an enumeration, the presence/nullValue should reside at the field level (consistent with the change you introduced for simple types)?
Example:
<enum name="t_enum_uint8" encodingType="uint8">
<validValue name="A8">0</validValue>
<validValue name="B8">1</validValue>
<validValue name="C8">2</validValue>
</enum>
...
<field name="f_enum_uint8_required" id="1" type="t_enum_uint8" />
<field name="f_enum_uint8_optional" id="2" type="t_enum_uint8" presence="optional" nullValue="255" />
<!--
<field name="this_would_be_illegal_due_to_collision" id="3" type="t_enum_uint8" presence="optional" nullValue="0" />
-->
Or are you saying that there should be a well-known null value as per the table here, and that we always use that value? Again, presence is defined at the field level.
<enum name="t_enum_uint8" encodingType="uint8">
<validValue name="A8">0</validValue>
<validValue name="B8">1</validValue>
<validValue name="C8">2</validValue>
<!--
<validValue name="THIS_WOULD_BE_ILLEGAL">255</validValue>
-->
</enum>
...
<field name="f_enum_uint8_required" id="1" type="t_enum_uint8" />
<field name="f_enum_uint8_optional" id="2" type="t_enum_uint8" presence="optional" />
Obviously in each case, some illegalities exist.
In each range a value is reserved to be used to signal null. For example the default behaviour would be the automatic generation like the following:
<enum name="t_enum_uint8" encodingType="uint8">
<validValue name="A8">0</validValue>
<validValue name="B8">1</validValue>
<validValue name="C8">2</validValue>
<!-- The following is automatically generated if not provided.
<validValue name="NULL_VAL">255</validValue>
-->
</enum>
Or to override it would be:
<enum name="t_enum_uint8" encodingType="uint8">
<validValue name="A8">0</validValue>
<validValue name="B8">1</validValue>
<validValue name="C8">2</validValue>
<validValue name="NULL_VAL">42</validValue>
</enum>
Presence applies on the field. If this was not automatically generated then if a new field is added in a later version of a message the code would not be backwards compatible due to the unknown null value declared later.
I'm fine with that approach. Just a couple of things to clarify for completeness:
What happens if an enum introduced in version X with no override (defaults to null value for the range) is redefined in version Y>X with a null value override? It might be a good idea to say that's a message design error (either override when first-introduced, or forever keep the default).
How do we define enum presence in composites? Currently:
<xs:complexType name="compositeDataType" mixed="true">
<xs:choice maxOccurs="unbounded">
<xs:element name="type" type="sbe:memberDataType"/>
<xs:element name="enum" type="sbe:enumType"/>
...
<xs:complexType name="enumType" mixed="true">
<xs:sequence>
<xs:element name="validValue" type="sbe:validValue" maxOccurs="unbounded"/>
</xs:sequence>
<xs:attribute name="name" type="sbe:symbolicName_t" use="required"/>
<xs:attribute name="encodingType" type="sbe:symbolicName_t" use="required"/>
<xs:attribute name="description" type="xs:string"/>
<xs:attributeGroup ref="sbe:offsetAttributes"/>
<xs:attributeGroup ref="sbe:versionAttributes"/>
</xs:complexType>
If an enum is inside a composite, there is no way to change the presence to optional:
<composite name="t_TopLevelComposite">
<enum name="I_WANT_THIS_TO_BE_OPTIONAL" encodingType="uint8">
<validValue name="A8">0</validValue>
<validValue name="B8">1</validValue>
<validValue name="NYLL_VALUE">255</validValue>
</enum>
Personally I don't like the fact that composite is so "powerful". I would make a list of fields and require that all complex types (composites/enums/sets) be top-level. A field seems better-suited anyway now that nullValue/presence/minValue/maxValue live there...
Above would be written as:
<enum name="MY_ENUM" encodingType="uint8">
<validValue name="A8">0</validValue>
<validValue name="B8">1</validValue>
<validValue name="NYLL_VALUE">255</validValue>
</enum>
<composite name="t_TopLevelComposite">
<field name="optional_field" type="MY_ENUM" presence="optional" />
Obviously making "presence" available on enumerations inside a composite would also work, but I feel like a composite is more naturally expressed using fields now that all attributes are no longer associated with the simple type.
EDIT: using fields there gets rid of the need for references/enums/sets/nested-composites... It would be similar to group/message. In fact: a composite is a repeating group with fixed count=1 (so the length need not be encoded).
Regarding (2): the null value and the presence are not equivalent concepts. We are saying that enumerations have a reserved null value (that can be overriden), but unless I declare a field as optional, that null value should never be used.
Assume an enumeration called "t_enum_uint8" that uses the default 255 for NULL_VALUE. Then for the two fields below:
<field name="f_enum_uint8_required" id="1" type="t_enum_uint8" />
<field name="f_enum_uint8_optional" id="2" type="t_enum_uint8" presence="optional" />
...sending 255 in field "f_enum_uint8_required" is illegal and applications should never do it (or refuse to interpret it) as it is a required field (the default presence). The value 255 is only legal for field "f_enum_uint8_optional".
So my question stands: if an enumeration is defined inline within a composite, there is no "presence" attribute for it (as opposed to when it is referenced from a message/group field). What is the default presence for enumerations in composites? Is that enumeration optional or required?
Currently this:
<composite name="t_TopLevelComposite">
<enum name="t_enum_in_composite_uint8" encodingType="uint8">
<validValue name="A8">0</validValue>
<validValue name="B8">1</validValue>
<validValue name="C8">2</validValue>
</enum>
<composite/>
Generates this in SBE tool:
public T_TopLevelCompositeEncoder t_enum_in_composite_uint8(final T_enum_in_composite_uint8 value)
{
buffer.putByte(offset + 2, (byte)value.value());
return this;
}
// ... where:
public enum T_enum_in_composite_uint8
{
A8((short)0),
B8((short)1),
C8((short)3),
/**
* To be used to represent not present or null.
*/
NULL_VAL((short)255);
Is it legal for an application to send T_enum_in_composite_uint8.NULL_VAL
in the composite because enumerations are IMPLIED to have presence="optional" in composites? Or is it illegal because enumerations are IMPLIED to have presence="required" in composites?
Maybe think of this another way. The enum in a composite is a shorthand for not defining the enum as a top level type. On a project that exists for a while people may realise that the enum is useful outside the composite and can be used in a field so they extract it. Now the field can be optional and a null value is required. How would that be addressed?
I'm currently dealing with some mess resulting from refValue
on constant fields to enums that is not clean because consistent usage has not been sufficiently considered. The original idea of create a type for everything in SBE rather than have clean types and compose them in usage scenarios has been the biggest source of confusion and bugs in the implementation.
First to answer the scenario question: based on what we discussed so far, the null value is defined either way: it is either the range's default (0 for char, 255 for uint8, 65535 for uint16, or (if a NULL_VAL override was already present) the value of that override (we agreed it would be wrong to change it). So when you move the enum outside the composite, you retain that same null value.
Moving it outside the composite means that it must now be used via a <field>
tag in a message/group, OR via a reference in the composite (i.e. <ref name="whatever" value="name_of_enum_type">
).
When used via a field the "presence" attribute of each field determines whether the null value is legal for that field.
When used via a reference in a composite however, you have the same problem: refType does NOT define a presence attribute, so one can not determine whether in this case the enumeration is optional or required:
<xs:complexType name="refType" mixed="true">
<xs:attribute name="name" type="sbe:symbolicName_t" use="required"/>
<xs:attribute name="type" type="sbe:symbolicName_t" use="required"/>
<xs:attribute name="description" type="xs:string" />
<xs:attributeGroup ref="sbe:offsetAttributes"/>
</xs:complexType>
It is the same case as my original question which was: what was the presence of the enum BEFORE moving it outside the composite?
Also, you previously alluded that:
A composite needs to be treated as a whole and thus if it is optional then all the members, or at least the first member, should be optional.
This does not answer my question: For example, is a composite whose first member is an inline enumeration or a reference to an enumeration optional or required?
From the discussion so far, it seems to me this has not been defined. The safe thing would be to say it defaults to "optional"
Yes I am saying that if the first member of a composite is an enum then it must allow it to be optional to have the composite be treated as optional. This way the underlying primitive can be tested for the null value to determine this.
Satisfactory. So to recap, your suggestion for SBE 2.0 is:
Enumerations always have a null value that is defined by the validValue with the special name NULL_VAL. If this is absent then the default value for the encoding type is used: 0 for char, or the max value for uint scalars.
When an enumeration is used as the first member of a composite (either through a reference or inline) then it defaults to optional presence and there is no override. (EDIT: by extension, that composite is optional).
Also see my update in #84.
Withdrawing proposal
The current specification allows fields with an enum encoding type to be optional. The null value for the field would be the null indicator of the underlying primitive type, e.g. int or char. However, the semantics of nullness for an enumeration may be unclear. I suggest that we make a recommendation that: