eclipse-cyclonedds / cyclonedds

Eclipse Cyclone DDS project
https://projects.eclipse.org/projects/iot.cyclonedds
Other
888 stars 363 forks source link

DataWriter appears to receive notification about new loopback DataReader before it is ready to receive data #2130

Open matthew-ivester opened 2 weeks ago

matthew-ivester commented 2 weeks ago

While trying to clean up and improve some unit tests in code using Cyclone, I came across what looks like buggy (or at least undesirable) behavior around matching of local (ie, loopback) DataReaders and DataWriters.

The sequence of events is:

This seems like a race condition, as usually it works fine and both readers get the sample. Occasionally a unit test will fail because the last DataReader that was created doesn't receive anything. I would expect that on_publication_matched() would only be triggered once a write call will successfully deliver to the new DataReader. I've only seen this happen when we're creating two or more readers, so there may be some difference in behavior with how the first reader+writer are matched compared to subsequent ones.

For the moment we've worked around this by putting the waiting thread back to sleep for a short time (~5ms) after it sees the matched subscriber count tick up to the threshold value. That seems to fix it. But it would be better if this worked reliably.

I can reproduce this with our product's stack, but I don't currently have a straightforward test that can do so on top of cyclonedds alone. I can try to create one if needed, or see if we can get it to occur with more detailed logging enabled in Cyclone.

eboasson commented 1 week ago

That's intriguing (a.k.a. it is not immediately obvious to me why that race condition would exist), and I certainly consider it wrong not to deliver data to readers for which you have already received a "publication matched" notification.

The scenario is very simple, so perhaps I should simply try it myself first. I have plenty of test code I can copy-and-enhance 🙂

eboasson commented 3 days ago

No luck yet in reproducing this ...

You can find my attempt at https://github.com/eboasson/cyclonedds/commit/dee288c428ca0791d1420f2926f93db9a60c4f8a, perhaps you can see if it seems like it should exhibit the same behaviour?