OpenCyphal / specification

The Cyphal specification documents are maintained here.
https://opencyphal.org/specification
Creative Commons Attribution 4.0 International
41 stars 13 forks source link

Beta PR 2 of 2: delimited serialization #97

Closed pavel-kirienko closed 3 years ago

pavel-kirienko commented 4 years ago

Fixes https://github.com/UAVCAN/specification/issues/90

Context, motivation, discussion: https://forum.uavcan.org/t/data-type-extensibility-and-composition/827

Implementation: https://github.com/UAVCAN/pydsdl/pull/45; its documentation: https://pydsdl--45.org.readthedocs.build/en/45/


This is the last PR separating us from v1.0-beta; you will notice that I have renamed the document into "Specification v1.0-beta" in this branch already.

The changeset is large because it required updating many independent sections of the document. I recommend reviewing it by simply reading chapter 3 DSDL, without any regard for the diffs. Afterward, the second pass can be done by looking at the diffs only.

There are some inconsequential changes in the script render_dsdl.py that are safe to ignore. This script is seriously getting out of hand and I wish someone could rewrite it using a proper template engine like Jinja. I never expected it to grow so complex.

I also removed the term "data schema" from the spec because it did not reflect the reality accurately and was used incorrectly in several places. Instead, we now use "data type". This change made it painfully obvious that service types should not even belong in the category of serializable types because they are not serializable, but I suppose we'll not be changing this detail now unless someone is willing to volunteer.

pavel-kirienko commented 3 years ago

all serializable types are now alignas(8) extent types are always alignas(8) including primitive types (right?)

Composite types are alignas(8). Primitives and voids are alignas(1) and an array of T is alignas(T).

All types are automatically 32-bits bigger when nested inside of other types because there is an implicit 32-bit delimiter appended to them.

Not all types but only non-final composite types. Otherwise yes.

So if you mark your type @final (btw later I thought about saying "sealed" instead of "final" because it better describes the effect), you can nest it all you want and it wouldn't incur the 32-bit overhead.

thirtytwobits commented 3 years ago

later I thought about saying "sealed" instead of "final" because it better describes the effect

Ooh. I like "sealed" better especially since final is a C++ keyword that has a very different effect.

thirtytwobits commented 3 years ago

Okay, so I think the one sticking point for me is the default extent. The 1.5 value seems arbitrary. How did we arrive at this? Is there a precedence for a default memory reserve in other data interchange technologies?

Regardless, adding in some of the discussion here: https://forum.uavcan.org/t/data-type-extensibility-and-composition/827/9 might help orient people about why we are wasting their precious memory.

Also needed is a discussion of the effect of removing a field from a datatype that uses this default extent. By shrinking the serializable size of the type I'd also be shrinking the extent which seems problematic, right?

pavel-kirienko commented 3 years ago

Okay, so I think the one sticking point for me is the default extent. The 1.5 value seems arbitrary. How did we arrive at this? Is there a precedence for a default memory reserve in other data interchange technologies?

I couldn't locate similar practices anywhere, not even in ASN.1. It would appear that most standards have no problem with dynamic memory, which seems natural. The multiplier is, indeed, chosen arbitrarily, but it seems like a low-risk guess that can be revised easily if necessary.

Regardless, adding in some of the discussion here: https://forum.uavcan.org/t/data-type-extensibility-and-composition/827/9 might help orient people about why we are wasting their precious memory.

Can you list any specific points that should be transferred? I was consulting with my own notes from that thread and I thought that I have carried over everything worthwhile but maybe I missed something.

Also needed is a discussion of the effect of removing a field from a datatype that uses this default extent. By shrinking the serializable size of the type I'd also be shrinking the extent which seems problematic, right?

Yes, but I think this is covered adequately in 3.8.3.2, in the beginning here:

image

Do you think this needs to be expanded?

thirtytwobits commented 3 years ago

Regardless, adding in some of the discussion here: https://forum.uavcan.org/t/data-type-extensibility-and-composition/827/9 might help orient people about why we are wasting their precious memory.

Can you list any specific points that should be transferred? I was consulting with my own notes from that thread and I thought that I have carried over everything worthwhile but maybe I missed something.

Let's add a non-normative section that explains the 50% is arbitrarily chosen so nobody thinks that 2/3 is some magic number. So something like:

/begin{remark} Because of UAVCAN's commitment to determinism memory buffer allocation can become an issue. When using a flat encoding format with the implicit truncation rule, it is clear that the last defined fields are to be truncated out shall the buffer be too small to accommodate a received message. In a delimited (orthogonal) encoding, this behavior can no longer be relied on. See the rightmost figure:

image

Here, the field Y.c is truncated out even though one would expect the entire Y to be removed.

Conventional protocols like DDS manage this simply by delaying the memory requirement identification until runtime, which is unacceptable to UAVCAN. The solution for UAVCAN is to require implementations to reserve more memory than required by the data type definition unless it is @sealed (or unless the implementation does use dynamic memory allocation). The multiplier chosen to define the /emph{default} amount of reserve memory (i.e. the amount if a specific extent value is not provided) was chosen arbitrarily and is considered a common-sense default that is not optimized for any specific property of a system. This approach allows for limited extensibility while enabling deterministic memory requirements. /end{remark}

Also needed is a discussion of the effect of removing a field from a datatype that uses this default extent. By shrinking the serializable size of the type I'd also be shrinking the extent which seems problematic, right?

Do you think this needs to be expanded?

Perhaps we remind the user that, if they update a datatype by adding a field, they should also consider fixing the extent at the previously defined value.

pavel-kirienko commented 3 years ago

I have accepted your suggestions but not as-is. I did not include the diagram from that forum thread because at the point where we introduce the extent, it is not known anything about delimited serialization, so we can't just show the delimiter header on a diagram without explaining what it is. Also, the diagram falls short of the quality bar set for the Specification. So, instead, I just added a link to an existing diagram at the end of the Serialization section, where I also added a few extra sentences about @extent.

thirtytwobits commented 3 years ago

I'm approving. I will say for the record this change gives me some pause. Once you reason through it the rational becomes clear and it's a novel approach to force pre-allocating memory for future expansion on the system that is quite rational for real-time designs. My pause is that it is rather complicated by default which seems to go against the "U" in UAVCAN. I'm wondering if we are going to regret not making things sealed by default in the future more than we would have regretted not forcing users to consider extensibility by default.

pavel-kirienko commented 3 years ago

Let's say, "U" is for "as simple as it can be, but not simpler"