eclipse-cyclonedds / cyclonedds

Eclipse Cyclone DDS project
https://projects.eclipse.org/projects/iot.cyclonedds
Other
888 stars 363 forks source link

Deep copying of constructed types including sequences #2123

Open okellogg opened 3 weeks ago

okellogg commented 3 weeks ago

I notice that idlc generates __alloc() and _free() functions for constructed types but does not generate a function for deep copying. Deep copy functions would make life easier when programming in C but are particularly interesting when making language bindings e.g. to C++ or Ada, where the assignment operator is required to have deep copy semantics. For example,

// IDL
   typedef sequence<float, 3> float_seq_t;
   struct A {
      float_seq_t fo;
   };
   struct B {
      A a;
   };
   struct C {
      B b;
   };
// C++ app code
   C c1 = [ ... some initialization or init function ...] , c2;
   c2 = c1;

How is this handled in CycloneDDS?

eboasson commented 3 weeks ago

Good question ... I'd say there is no good reason we don't generate deep copy functions for C. There is a reasonable simple way to do it if you don't mind an extra copy: that's through the serializer. It is more of a hack than a good solution of course, but it goes like:

https://github.com/eclipse-cyclonedds/cyclonedds/blob/d8cef891b9a27cc7bc52477e1c7b2c3f8edf81ba/src/core/ddsc/tests/psmx.c#L1418-L1433

Caching the result of dds_cdrstream_desc_from_topic_desc would make sense if you need to do it often. If you want you could use dds_stream_getsize_sample to first get the size of the serialized form and use dds_ostream_from_buffer instead of dds_ostream_init to prevent any allocations during serialization. Almost by definition, there'll be allocations during a deep copy, but it would eliminate a potential out-of-memory case. (If the input is bogus, the serializer can reject it, so you still have to be careful.)

eboasson commented 3 weeks ago

There is one nasty detail: booleans. This serializes to IDL, which says a boolean is represented as a byte that is either 0 or 1, and so that's what dds_stream_write_sample will output. It fits with a C99 bool, so for C it is fine. However, I think Ada represents true as 0xff ...

It used to pass them through unchanged, I fixed that fairly recently 😳 So maybe the trick won't work for you after all ...

okellogg commented 2 weeks ago

issue2123-add-deepcopyfunc-wip-20241102.diff.txt

I started adding the _copy functions, here is my snapshot. It is "old school", I have not used the topic descriptor stuff. The build currently fails on compiling the file Space.c generated from src/core/ddsc/tests/Space.idl, support for @optional / @external is not yet complete and I have not yet touched emit_union. I am a newbie with CycloneDDS and appreciate any feedback you may have, even at this early stage.

eboasson commented 2 weeks ago

Hi @okellogg! That's wonderful ☺️

I think it is fine to do it "old school": serialising and deserialising is necessarily slower then a dedicated copy routine, adding yet another operation to dds_cdrstream.c is not the most appealing thing to do, and where all the operations currently handled in dds_cdrstream.c have a use in type-generic code that doesn't seem to be the case for a deep-copy function for C types.

So, in principle, I think the approach you are taking is sensible.

I noticed a few things in the diff that you might want to do differently: firstly, there's an sprintf in there somewhere that is enitrely correct given the input, but with all the compiler warnings it'll lead to a warning. It probably makes more sense to stick to snprintf instead (and even then, modern gcc really likes the output buffer large enough to hold any value of the type ...)

Secondly — but this, I think is something that's simply a result of sharing an early-stage version, because it is so obvious — sequences can't in general be memcpy'd. A memcpy is of course totally sensible if the type doesn't contain any indirections, although it might not be all that much faster than copying field-by-field these days.

While I think it'd be nice to use struct/union assignment and then patch any strings, sequence buffers, optionals and externals, I think it might be harder to do that than it would be to do a field-by-field copy, simply because patching up pointers in nested types would bring its own complications.

Finally, I checked and for externals, the deep copy function is supposed to make a copy, so "external" and "optional" can be treated the same. Sequence buffers can be owned by the middleware or not and this is indicated in the _release flag: if it is false, the middleware may not reallocate/free; if it is true, the middleware may reallocate and must free. I would say the copy should have the _release flag set, rather than copied from the source, and I would think it makes the most sense to allocate _length elements and set _maximum = _length in the copy.

Those are just some things that came to mind ... hopefully one or two of those turn out to be useful 🙂

okellogg commented 2 weeks ago

Many thanks for your comments. I integrated them in a new branch (issue-2123-add-deepcopy-func) on my fork.