eclipse-cyclonedds / cyclonedds-python

Other
66 stars 47 forks source link

Instance Registration and Disposal fails #269

Open FirebarSim opened 2 months ago

FirebarSim commented 2 months ago

When calling register_instance and passing in a sample of a type with defined keys the returned handle is a constant per run (i.e. run 1 always returns 123456789, run 2 always returns 987654321).

When calling dispose_instance_handle(instance_handle) a sample dispose is sent but for a different key than that of any of the registered instances.

When calling dispose_instance(sample) a sample dispose is sent but for a different key than that of the sample.

The samples being registered are OMG OARIS.

eboasson commented 1 month ago

Hi @FirebarSim, this:

When calling register_instance and passing in a sample of a type with defined keys the returned handle is a constant per run (i.e. run 1 always returns 123456789, run 2 always returns 987654321).

fits with how Cyclone does things. Registering the instance is the main point at which it generates an instance handle, and they are generated by running a 64-bit cipher in counter mode with a random key, so to a good approximation each and every time you need an instance handle in whatever context, you get a new unique number within a single process and that number moreover is highly likely to be unique over the lifetime of the DDS system.

These observations:

When calling dispose_instance_handle(instance_handle) a sample dispose is sent but for a different key than that of any of the registered instances.

When calling dispose_instance(sample) a sample dispose is sent but for a different key than that of the sample.

do worry me a bit. I tried a simple experiment using the following:

from dataclasses import dataclass
from cyclonedds.domain import DomainParticipant
from cyclonedds.pub import DataWriter
from cyclonedds.sub import DataReader, InvalidSample
from cyclonedds.topic import Topic
from cyclonedds.core import Qos, Policy
import cyclonedds.idl as idl
from cyclonedds.idl.annotations import key
import cyclonedds.idl.types as types

from time import sleep

@dataclass
class S(idl.IdlStruct, typename="S"):
    s: str
    key("s")
    v: int

dp = DomainParticipant(0)
tp = Topic(dp, "S", S, qos=Qos(Policy.History.KeepAll))
dw = DataWriter(dp, tp)
dr = DataReader(dp, tp)

ih0 = dw.register_instance(S("aap", 0))
ih1 = dw.register_instance(S("noot", 0))
print(f"ih0={ih0} ih1={ih1}")

b = 0
while True:
    if b % 2 == 0:
        dw.write(S("aap", b))
        dw.write(S("noot", b))
    else:
        dw.dispose(S("aap", b))
        dw.dispose_instance_handle(ih1)
    xs = dr.take(10)
    for x in xs:
        if isinstance(x, InvalidSample):
            print("Took invsample ", x.key_sample, x.sample_info.instance_handle)
        else:
            print("Took ", x, x.sample_info.instance_handle)
    sleep(1)
    b = b+1

it behaves like I expect: it prints two instances handles at the beginning, and then afterward those are the two instance handles that show up time and time again. The key value in the dispose_instance_handle case is always "noot", as expected.

Moreover, if I start a second copy with the writing/disposing commented out, I also get that result. The instance handles are of course different from the first copy (different process → different key → unique instance handles), but the contents are as expected and the instance handles in the sample infos line up with the output of register_instance in the second copy.

Perhaps there is something in the OARIS types that cause it to misbehave. Simply fiddling with the type of S the values should then trigger it. Perhaps you can have a look at what the key type is for the topics you are seeing the problem, and see if that makes my little program reproduce the misbehaviour?

FirebarSim commented 1 month ago

The OARIS Sensor Track is keyed on subsystem id and sensor track id, here's the python class for it (aren't open standards nice?) image

In this case I am just generating some tracks from some ADS-B data, here are a few of the reports that I am getting: image Generated by this code: image

On disposal the following is reported image By this code image

I believe that I am doing this correctly, but I could of course be completely cocking it up. Running your code above on my VM works as I would expect. Altering the class to take the Subsystem ID and Sensor Track ID from OARIS also works fine. Doing the same with a pair of OARIS tracks (with only the sensor track ID different) returns the same instance handle.

eboasson commented 1 month ago

Found it! It is a confusion inside the python binding between samples and key values. In the DDS API spec register_instance and dispose take a sample, but the only fields that matter are the key fields.

The Cyclone Python binding however serializes it as a normal sample[^1], then treats it as-if it is a serialized key value. In my first attempt at reproducing it, the key is at the beginning of the serialized data and it all seems fine. In the OARIS data type, however, it is somewhere in the middle and at the end ...

If I change my program to have the key as a the second field, e.g.:

from dataclasses import dataclass
from cyclonedds.domain import DomainParticipant
from cyclonedds.pub import DataWriter
from cyclonedds.sub import DataReader, InvalidSample
from cyclonedds.topic import Topic
from cyclonedds.core import Qos, Policy
import cyclonedds.idl as idl
from cyclonedds.idl.annotations import key
import cyclonedds.idl.types as types

from time import sleep

@dataclass
class S(idl.IdlStruct, typename="S"):
    v: int
    k: int
    key("k")

dp = DomainParticipant(0)
tp = Topic(dp, "S", S, qos=Qos(Policy.History.KeepAll))
dw = DataWriter(dp, tp)
dr = DataReader(dp, tp)

ih0 = dw.register_instance(S(0, 1))
ih1 = dw.register_instance(S(0, 2))
print(f"ih0={ih0} ih1={ih1}")

b = 0
while True:
    if b % 2 == 0:
        dw.write(S(b, 1))
        dw.write(S(b, 2))
    else:
        dw.dispose(S(b, 1))
        dw.dispose_instance_handle(ih1)
    xs = dr.take(10)
    for x in xs:
        if isinstance(x, InvalidSample):
            print("Took invsample ", x.key_sample, x.sample_info.instance_handle)
        else:
            print("Took ", x, x.sample_info.instance_handle)
    sleep(1)
    b = b+1

it misbehaves in the same way as your program.

While I know the root cause and have a simple proof-of-concept fix — a.k.a. "a hack" — that makes register_instance work, I need some more time to figure out the proper fix.

[^1]: Serialization is by far the easiest way to transition between Python and C.

FirebarSim commented 1 month ago

Well that is great news that there is a definite solution!

FirebarSim commented 1 month ago

While I know the root cause and have a simple proof-of-concept fix — a.k.a. "a hack" — that makes register_instance work, I need some more time to figure out the proper fix.

Hi @eboasson, just wondering if you might be able to share the hacky fix (understand if you are not comfortable to!)? I've got a couple of systems that are really calling out for dispose to happen properly.

eboasson commented 1 month ago

@FirebarSim, sure, here you go:

diff --git a/clayer/pysertype.c b/clayer/pysertype.c
index 1fcefe6..e396ee4 100644
--- a/clayer/pysertype.c
+++ b/clayer/pysertype.c
@@ -302,10 +302,26 @@ static ddsi_serdata_t *serdata_from_keyhash (const struct ddsi_sertype *topic, c

 static ddsi_serdata_t *serdata_from_sample (const ddsi_sertype_t *type, enum ddsi_serdata_kind kind, const void *sample)
 {
+#if 0
   ddspy_sample_container_t *container = (ddspy_sample_container_t *)sample;
   ddspy_serdata_t *d = ddspy_serdata_new (type, kind, container->usample_size);
   memcpy ((char *)d->data, container->usample, container->usample_size);
   return serdata_from_common (d, kind);
+#else
+  ddspy_sample_container_t *container = (ddspy_sample_container_t *)sample;
+  ddspy_serdata_t *d = ddspy_serdata_new (type, SDK_DATA, container->usample_size);
+  memcpy ((char *)d->data, container->usample, container->usample_size);
+  if (serdata_from_common (d, kind) == NULL)
+    return NULL;
+  if (kind == SDK_KEY)
+  {
+    ddsrt_free (d->data);
+    d->data = d->key;
+    d->data_size = d->key_size;
+    d->data_is_key = true;
+  }
+  return &d->c_data;
+#endif
 }

 static void serdata_to_ser (const ddsi_serdata_t *dcmn, size_t off, size_t sz, void *buf)

It doesn't fix all problems, but maybe it just solves the problem for you. If not, I know it's been a while, but I had just started thinking about a proper fix ☺️

eboasson commented 1 month ago

@FirebarSim if you could give #273 a try, that'd be great.