eclipse-cyclonedds / cyclonedds-cxx

Other
94 stars 75 forks source link

Serialization of sequences in D_CDR2 #218

Closed ichernev closed 2 years ago

ichernev commented 2 years ago

Hello,

I have the following data types:

module HelloWorldData
{
  @appendable struct MsgIn {
    int32 a;
  };
  @appendable struct MsgOut {
    sequence<MsgIn> objects;
  };
};

And I'm sending an empty MsgOut over the wire:

0000   08 00 00 00 04 00 00 00 00 00 00 00               ............

This gets decoded fine on Cyclone, but RTI decodes it as 4 MsgIn objects in the sequence. On the other hand RTI sends the same empty MsgOut as:

0000   04 00 00 00 00 00 00 00                           ........

which crashes Cyclone (but is decoded fine by RTI): crash deserialization CHAN/TYPE failed (for reasons unknown)

I tried to follow the spec, and it looks like sequences are sent by first outputting number of elements, then the actual elements. It looks like Cyclone is also outputting the ID of the sequence field (as if doing the PL_CDR2 encoding which prefixes each field with it's id). Can you please investigate?

If I put a bunch of prints in the write function + streamer.position() (just before each call):

start_struct 0    // allocate 4 byte for the D header?
start_member 4  
start_conseq 4    // <---- I don't know what this does, but it uses 4 more bytes
write len 8       // the len is 4 bytes (it's zero)
finish_conseq 12 
finish_member 12
finish_struct 12
END 12
start_struct 0    // go back here to write the real value of the length header?
finish_struct 0
END 0

FYI:

template<typename T, std::enable_if_t<std::is_base_of<cdr_stream, T>::value, bool> = true >
bool write(T& streamer, const ::HelloWorldData::MsgOut& instance, entity_properties_t &props) {
  (void)instance;
  std::cerr << "start_struct " << streamer.position() << std::endl;
  if (!streamer.start_struct(props))
    return false;
  bool firstcall = true;
  while (auto &prop = streamer.next_entity(props, firstcall)) {
    switch (prop.m_id) {
      case 0:
      std::cerr << "start_member " << streamer.position() << std::endl;
      if (!streamer.start_member(prop))
        return false;
      std::cerr << "start_conseq " << streamer.position() << std::endl;
      if (!streamer.start_consecutive(false, false))
        return false;
      {
      uint32_t se_1 = uint32_t(instance.objects().size());
      std::cerr << "write len " << streamer.position() << std::endl;
      if (!write(streamer, se_1))
        return false;
      for (uint32_t i_1 = 0; i_1 < se_1; i_1++) {
      std::cerr << "write item " << streamer.position() << std::endl;
      if (!write(streamer, instance.objects()[i_1], prop))
        return false;
      }  //i_1
      }  //end sequence 1
      std::cerr << "finish_conseq " << streamer.position() << std::endl;
      if (!streamer.finish_consecutive())
        return false;
      std::cerr << "finish_member " << streamer.position() << std::endl;
      if (!streamer.finish_member(prop))
        return false;
      break;
    }
  }
  std::cerr << "finish_struct " << streamer.position() << std::endl;
  auto res = streamer.finish_struct(props);
  std::cerr << "END " << streamer.position() << std::endl;
  return res;
}

EDIT: It looks like you decided to prefix the whole array with a length header (similar to a struct). I don't think this is necessary. The array is either part of the fields the other party understands or is ignored (after the fold, sender appended, reader ignored). There is no benefit in D-HEADER-ing the array. The spec itself states put number of elements, then each element, it doesn't mention D-HEADER for the whole sequence.

from CDR formal 02-06-51 (I didn't find an "override" in the XTypes spec). Appendable just prefixes structs with DHEADER rest is PLAIN_CDRv2, which is like PLAIN_CDRv1 :)

15.3.2.5 Sequence
Sequences are encoded as an unsigned long value, followed by the elements of the
sequence. The initial unsigned long contains the number of elements in the sequence.
The elements of the sequence are encoded as specified for their type.
dpotman commented 2 years ago

The first 08 00 00 00 is the DHEADER for MsgOut, from rule 30 of 7.4.3.5.3 in the XTypes spec. The 04 00 00 00 is the DHEADER for the sequence, rule 12 (rule 11 is not used because of the non-primitive sequence element type). And finally 00 00 00 00 is the length of the sequence. So this seems to be the correct CDR (v2) according to the spec. I checked the Cyclone C serializer, and that generates the same CDR.

The crash on unexpected data (the CDR from RTI) should not happen of course, we'll look into that!

ichernev commented 2 years ago

The crash on unexpected data (the CDR from RTI) should not happen of course, we'll look into that!

It prints the error, the code doesn't crash. So it is actually OK.

I've raised the issue with RTI, lets see what they have to say.

Btw, the spec talks about sequences with certain extensibility, does this apply to the type inside the sequence, or the sequence itself?

# from comments after rule (13)
// Sequences with extensibility MUTABLE are not allowed. Treated as
// APPENDABLE.

Does that mean:

@mutable struct A { ... };
struct B { sequence<A> objects; };

is not allowed, because I have a sequence of mutable objects? Shouldn't this be reported in IDLC then (at least as a warning). I don't see how it's not allowed but at the same time coerced to appendable. So it is allowed, handling specified in the spec, just discouraged?

dpotman commented 2 years ago

Btw, the spec talks about sequences with certain extensibility, does this apply to the type inside the sequence, or the sequence itself?

# from comments after rule (13)
// Sequences with extensibility MUTABLE are not allowed. Treated as
// APPENDABLE.

Does that mean:

@mutable struct A { ... };
struct B { sequence<A> objects; };

is not allowed, because I have a sequence of mutable objects?

From figure 15 in the spec, I would expect that the extensibility mentioned in these comments applies to the collection itself, not to the element type. But according to that same figure 15 (and also table 12), the extensibility kind has no effect on collections (constraints {extensibility_kind = <Not Applicable>}). So I don't understand these comments in the serialization rules. See also this ticket: https://issues.omg.org/issues/DDSXTY14-25

@e-hndrks do you know why the extensibility of collections is mentioned in the comments in the serialization rules?

I don't think that collections with mutable element type are not allowed or discouraged, in the type object IDL this is also used (from https://www.omg.org/spec/DDS-XTypes/20190301/dds-xtypes_typeobject.idl)

    @extensibility(MUTABLE) @nested(FALSE)
    struct TypeInformation {
        @id(0x1001) TypeIdentifierWithDependencies minimal;
        @id(0x1002) TypeIdentifierWithDependencies complete;
    };
    typedef sequence<TypeInformation> TypeInformationSeq;
eboasson commented 2 years ago

I decided to go over the spec and the serialization machine as well — I never studied it anywhere near as well as @dpotman did so in a way it constitutes "a fresh pair of eyes". I come to the same conclusion as he and @reicheratwork did, assuming of course that the target is XCDR2.

That serializer machine description should've been an executable specification, not some pseudo-formal, ever-so-slightly-inconsistent piece of gobbledygook ...

7.2.4.2 Concept of Delimited Types Types with extensibility kind APPENDABLE are delimited if serialized with encoding version 2 (DELIMITED_CDR). See Clause 7.4.2. They are not delimited if serialized with encoding version 1.

7.4.2 Extended CDR Representation (encoding version 2) DELIMITED_CDR shall be used for types with extensibility kind APPENDABLE. It serializes a UINT32 delimiter header (DHEADER) before serializing the object using PLAIN_CDR2. The delimiter encodes the endianness and the length of the serialized object that follows.

@appendable struct MsgIn ...;
@appendable struct MsgOut {
  sequence<MsgIn> objects;
};

**Start:**

(1) XCDR << {O : TOP_LEVEL_TYPE} =
         XCDR
         << INIT( OFFSET=0, ORIGIN=0, CENDIAN=<E>, EVERSION=<eversion> )
         << { ENC_HEADER(<E>, <eversion>, O.type) : Byte[2] } << PUSH( EVERSION = <eversion> )
         << PUSH( MAXALIGN = MAXALIGN(<eversion>) )
         << PUSH( ORIGIN = 0 )
         << { <OPTIONS> : Byte[2] }
         << { O : AsNested(O.type) }

  where
  - <E> = little-endian, <eversion> = 2
  - therefore MAXALIGN = 4
  - <OPTIONS> assumed 0
  - 0
  - O is appendable => DELIMITED_CDR by table 35
  - EDIT: table 39 apparently incorrectly maps D_CDR_LE to 0x15 ...
          table 7.6.3.1.2 says 0x09 which apparently is correct ...

output generated:
- 00 09 00 00

**first applicable rule:**

note encountered while scanning for first applicable rule:

// Structures extensibility APPENDABLE handled by generic APPENDABLE rules:
// (29)-(30)

// Extensibility APPENDABLE (Collection or Aggregated types), version 2
// encoding
(30) XCDR[2] << {O : APPENDABLE_TYPE} =
             XCDR
             << { DHEADER(O) : UInt32 }
             << { O : AsFinal(O.type) }

output generated:
- DHEADER 1 -- for length of AsFinal(O.type)

**first applicable rule:**

// Structures with extensibility FINAL (version 1 and 2 encoding)
// FMMEBER can be NOPT_FMEMBER (18) or OPT_FMEMBER (19)
(17) XCDR << {O : FSTRUCT_TYPE} =
          XCDR
          << { O.member[i] : FMEMBER }*
  where
  - { O.member[i] : FMEMBER }* evaluates to
    O : sequence<MsgIn>

**first applicable rule:**

// Sequences (any extensibility) using version 2 encoding
(12) XCDR[2] << {O : SEQUENCE_TYPE} =
             XCDR
             << { DHEADER(O) : UINT32 }
             << { O.length : UINT32 }
             << { O[i] : O.element_type }*

there is a somewhat contradictory comment 

// Sequences with extensibility APPENDABLE use common APPENDABLE rules:
// (29)-(30)

but is placed later in the document and so irrelevant by section 7.4.3.5; and in any case
it would only result in a second DHEADER for the same object, which is so nonsensical that
it suggests an editing mistake.

output generated:
- DHEADER 2   -- for length of sequence<MsgIn>
- 00 00 00 00 -- for sequence length

that completes the serialization of the sequence
DHEADER 2 therefore becomes 04 00 00 00
that completes the serialization of the MsgOut
DHEADER 1 therefore becomes 08 00 00 00

so the result is
- encoding header: 00 09 00 00
- XCDR2 payload:   08 00 00 00 : 04 00 00 00 : 00 00 00 00
ichernev commented 2 years ago

@eboasson @dpotman thanks a lot for chiming in!

I also echo the sentiment about the executable state machine. The current section looks like it was written by a high-schooler on his lunch break.

I understand it's technically according to spec. Still, prepending every sequence with a DHEADER in v2 (and sometimes twice ;-)) seems a bit excessive, because I don't see what purpose it servers. I might have to raise this issue with the omg bug tracker. Apparently RTI decided to "simplify" this on their own, still awaiting their reply.

I haven't tried to cross-check with other implementations, if it is more of an unspoken standard would it make sense to add a flag to support it in Cyclone as well?

eboasson commented 2 years ago

Hi @ichernev, we have had the debate on the wisdom of prepending sequences with DHEADERs amongst ourselves a few times, and while it indeed does get done in cases where it makes no sense, it isn't as bad as it seems at first. Having a DHEADER for a sequence allows one to quickly skip over the sequence without having to inspect each element individually. That in turn is a useful feature if one wants to extract the value of a particular field from a serialised sample. I suspect that is why it was added; that's certainly what we use it for.

If you look at the rules with that in mind, we have the following four categories:

  1. Sequences of "primitive types" (int, float, bool, char, octet): these don't get one and obviously don't need it;
  2. Sequences of variable-sized types (e.g., strings, sequences, unions, @appendable structs, ...): these get one, and it is useful;
  3. Sequences of fixed-size-but-complex types: these get one, and it is arguably useful;
  4. Sequences of enum/bitmask: these also get one, but we haven't been able to find a use for it.

Case 3 is arguably useful because it avoids having to classify types as variable/fixed and so simplifies the scanning (especially in a dynamic serialiser), but that's not a fundamental change in the amount of work it takes to skip a sequence in CDR at run-time. In most cases I suspect sequences lengths are such that the overhead is not dominant.

Case 4 is probably a mistake in the spec (you only ever need to skip over the sequence to continue with the next field in the struct if you know the type, otherwise you'd be skipping to the end of the enclosing struct using that struct's DHEADER). Unfortunately it isn't the only mistake in the spec.

The reason XCDR2 with its DHEADER was introduced in the first place is that the serialization rules in the original XTypes spec made it impossible to deserialize extended nested structs. That more-or-less defeats the purpose of doing XTypes; clearly RTI has never been thinking straight when it comes to the details of XTypes.

So, generally speaking, the DHEADERs before an appendable struct are necessary. Within the possibilities of the spec having a DHEADER before the top-level struct is obviously of no value, there's never a need to skip past that sample and there is enough context to determine the size. It also wouldn't have been complicated to describe. Fortunately, the overhead is no more than 4 bytes.

Useless though it seems to me, I can think of one possible reason why RTI might've wanted the DHEADER to be present always rather than only for nested structs: their DATA_BATCH submessage https://github.com/wireshark/wireshark/blob/80cb8fbb1249c4230a266ea73229f4dabca9203d/epan/dissectors/packet-rtps.c#L11583. And yes, they're squatting in the OMG-reserved namespace for submessage identifiers instead of using the vendor-specific range: see https://github.com/boundary/wireshark/blob/07eade8124fd1d5386161591b52e177ee6ea849f/epan/dissectors/packet-rtps.h#L296 and section 9.4.5.1.1.

Come to think of it, I've never seen anything that suggests RTI cares at all about following the specifications, and yet they do most of the writing. There are not many reasons I can think of why a company would spend that much on writing specifications only to ignore them and none of them are pretty ...

They wrote the spec, they better stick to it.

eboasson commented 2 years ago

I did want to get independent confirmation, and I got it: a default-constructed sample of

  @appendable struct nested { long x; };
  @appendable @topic struct ou { sequence<nested> x; };

published by OpenDDS master yields the same as Cyclone DDS:

image

And those who know @mitza-oci know that he is very careful in reading the spec. So for me OpenDDS spitting out the same as Cyclone DDS removes the last shadow of doubt that remained.

mitza-oci commented 2 years ago

I haven't reviewed all of this, but we are interpreting the spec as having this change https://issues.omg.org/browse/DDSXTY14-32 (or https://issues.omg.org/issues/task-force/DDSXTY14#issue-46866 which doesn't require login) effectively as an erratum because the other option was to have the spec be self-contradictory and that's hard to implement.

ichernev commented 2 years ago

That in turn is a useful feature if one wants to extract the value of a particular field from a serialised sample

I also though about that, but when do you need to extract only a single field (or a sub-group) out of a message? When you're skipping a PL encoded message-field that you don't know about? Is there DDS interface that lets you do that? Isn't it better to not send it in the first place :)

The reason XCDR2 with its DHEADER was introduced in the first place is that the serialization rules in the original XTypes spec made it impossible to deserialize extended nested structs. That more-or-less defeats the purpose of doing XTypes; clearly RTI has never been thinking straight when it comes to the details of XTypes.

You mean for the external (topic) type, the size is evident from the UDP encosure, but for nested types, it has to be written somewhere (i.e in DHEADER). Yeah, that part is clear. How they missed it in the first iteration -- not so :)

RTI Rep:

After investigating this issue, we can confirm this is an issue on RTI Connext DDS that will be fixed in the next major release.

:tada:

Thank you all for the help, may the spec be with you ;-)