Open Pro opened 1 year ago
Hey @Pro, I have the same issue with 0.10.3, see #392 (I just closed it because I thought it was my stupidity).
Currently I use 0.10.2 with the fixes patched into it; I could not get it to work with 0.10.3 either.
@eboasson do you have an idea on this?
@trittsv THANK YOU :)
I can confirm that directly applying the fixes onto 0.10.2 works properly, but using release 0.10.3 does not work.
I also tested using CycloneDDS (C basis) with 0.10.2 and 0.10.3. That does not make a difference. But as soon as I use the CXX wrapper in version 0.10.3, it does not work. 0.10.2 with the patches works.
The problematic commit is this one: https://github.com/eclipse-cyclonedds/cyclonedds-cxx/commit/a49b3f0f5ad118edad7628fb5e4e3d7ece8467b5
If I remove it from 0.10.3, there is no issue with on_publication_matched.
It was part of #387 (ping @eboasson @reicheratwork )
Thank you so much @Pro and @trittsv!
Just wanted to let you know that this is pretty high up on our list of issues. I am happy that at least there is a workaround (reverting a49b3f0f5ad118edad7628fb5e4e3d7ece8467b5, but by itself that can't be a fix because that change went in for another good reason).
Hi @Pro and @trittsv , @eboasson has asked me to look into this issue and I wanted to give you a little update of where we are at right now. We were able to reproduce the issue successfully and quickly found the culprit.
The first publication_matched event is fired by the ddsi thread that is responsible for the initial discovery between Reader and Writer, and it occurs some time after the C++ Writer creation has finished successfully (we can actually see that the sleep has been invoked a number of times prior to the C++ Listener callback). However, since the Participant stays alive for the duration of the publishing application, it remembers the presence of the remote Reader and notifies the 2nd Writer about it directly at creation time, on the thread that is actually invoking the dds_create_writer call. Because the C++ API is built on top of the C API, and the listener object is passed down to C at creation time, the C Writer will try to invoke its C++ listener during the invocation of dds_create_writer, that is, before C++ has had any chance of wrapping a C++ object around this Writer.
The commit that broke your example is needed to prevent a C++ writer from dropping its last reference during a listener callback. For that reason it creates an additional reference to the Writer, which is dropped after the listener callback has ended successfully. However, in this particular case the call to create an additional reference to the Writer fails because C++ hasn't been able to create its C++ wrapper around that Writer in the first place, causing the callback to be skipped.
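To make the ordering problem described above concrete, here is a minimal toy model in plain C++. It does not use CycloneDDS at all: every name in it (ToyWriter, ToyRuntime, fire_publication_matched, and so on) is hypothetical, and the handle-to-wrapper table is only an assumption about how such a lookup might work, not the library's actual implementation.

```cpp
#include <memory>
#include <unordered_map>

// Toy model only: none of these names are CycloneDDS APIs.
struct ToyWriter {                       // stands in for the C++ wrapper object
    int matched_calls = 0;
    void on_publication_matched() { ++matched_calls; }
};

struct ToyRuntime {
    // Hypothetical lookup table from a C-level handle to its C++ wrapper.
    std::unordered_map<int, std::weak_ptr<ToyWriter>> wrappers;
    bool reader_known = false;           // has the participant already discovered the reader?
    int next_handle = 1;
    int skipped = 0;                     // events dropped by the trampoline

    // Listener trampoline: hold an extra reference while the callback runs.
    // If no wrapper is registered yet, the reference cannot be taken and the
    // callback is skipped.
    void fire_publication_matched(int handle) {
        std::shared_ptr<ToyWriter> ref;
        auto it = wrappers.find(handle);
        if (it != wrappers.end()) ref = it->second.lock();
        if (!ref) { ++skipped; return; }
        ref->on_publication_matched();
    }

    // Stand-in for the C-level create call: a match that is already known is
    // reported synchronously, on the calling thread, before returning.
    int c_create_writer() {
        int handle = next_handle++;
        if (reader_known) fire_publication_matched(handle);
        return handle;
    }

    // Stand-in for the C++ constructor: the wrapper is registered only after
    // the C-level create call has returned.
    std::shared_ptr<ToyWriter> cxx_create_writer(int& handle_out) {
        handle_out = c_create_writer();
        auto writer = std::make_shared<ToyWriter>();
        wrappers[handle_out] = writer;
        return writer;
    }
};

struct ToyResult { int first_calls; int second_calls; int skipped; };

ToyResult simulate_two_writers() {
    ToyRuntime rt;
    int h1 = 0, h2 = 0;
    auto w1 = rt.cxx_create_writer(h1);  // reader not yet discovered
    rt.reader_known = true;              // discovery finds the reader...
    rt.fire_publication_matched(h1);     // ...and notifies the first writer later
    auto w2 = rt.cxx_create_writer(h2);  // match reported during creation
    return {w1->matched_calls, w2->matched_calls, rt.skipped};
}
```

Calling simulate_two_writers() from a small main shows the asymmetry: the first writer receives its callback because the event arrives after the wrapper exists, while the second writer's event arrives inside the create call and is dropped.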
What we will do to fix this is to create the C++ writer in a two-step process:
This approach would require us to change the behavior of dds_set_listener somewhat, since it currently doesn't check for any pending events and doesn't invoke any listener calls. But our expectation is that this approach would fix your problem.
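The individual steps are not reproduced in this thread, so the following is only one possible reading, sketched as a toy model in plain C++: first create the entity without a listener so that an already-known match is merely recorded as pending, then attach the listener once the wrapper exists and deliver whatever is pending. All names (SketchWriter, SketchRuntime, create_without_listener, set_listener_and_flush) are hypothetical, and the "flush pending events when the listener is set" behaviour is an assumption derived from the remark about dds_set_listener above, not the actual fix.

```cpp
#include <memory>
#include <unordered_map>

// Toy model only: none of these names are CycloneDDS APIs.
struct SketchWriter {                 // stands in for the C++ wrapper object
    int matched_calls = 0;
    void on_publication_matched() { ++matched_calls; }
};

struct SketchEntity {                 // stands in for the C-level writer
    bool listener_set = false;
    bool match_pending = false;       // status raised but not yet delivered
    std::weak_ptr<SketchWriter> wrapper;
};

struct SketchRuntime {
    std::unordered_map<int, SketchEntity> entities;
    bool reader_known = false;        // participant already knows the remote reader
    int next_handle = 1;

    // Deliver a pending match only if a listener and a live wrapper exist;
    // otherwise leave the status pending.
    void deliver(SketchEntity& e) {
        auto ref = e.wrapper.lock();  // extra reference for the callback's duration
        if (!e.listener_set || !ref) return;
        e.match_pending = false;
        ref->on_publication_matched();
    }

    // Step 1 (assumed): create without a listener, so a known match is only
    // recorded, not delivered.
    int create_without_listener() {
        int handle = next_handle++;
        entities[handle].match_pending = reader_known;
        return handle;
    }

    // Step 2 (assumed): attach the listener after the wrapper exists and
    // check for pending events at that point.
    void set_listener_and_flush(int handle, const std::shared_ptr<SketchWriter>& w) {
        SketchEntity& e = entities.at(handle);
        e.wrapper = w;
        e.listener_set = true;
        if (e.match_pending) deliver(e);
    }
};

int matched_calls_with_two_step_creation() {
    SketchRuntime rt;
    rt.reader_known = true;                     // same situation as the 2nd writer
    int handle = rt.create_without_listener();  // step 1
    auto writer = std::make_shared<SketchWriter>();
    rt.set_listener_and_flush(handle, writer);  // step 2
    return writer->matched_calls;
}
```

In this model the callback is delivered exactly once even though the match was already known at creation time, because delivery is deferred until the wrapper can be referenced.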
I am still facing the issue described in #378 with version 0.10.3 (ping @trittsv)
I.e., whichever process starts second, it does not properly call the corresponding on_subscription_matched or on_publication_matched callbacks in the Listeners. Note that the same code works perfectly fine with RTI Connext DDS.
I used the same code as in #378 for reproduction steps: https://github.com/eclipse-cyclonedds/cyclonedds-cxx/files/10918029/listener-not-reliable.zip
I made one small modification: the publisher and writer are re-created in a loop (see code below), and the subscriber keeps running forever.
The interesting part is that the on_publication_matched callback is called only once (in the best case). For subsequent creations of the publisher and writer, on_publication_matched is not called at all.
I.e., if you start the subscriber.cpp, and then the publisher.cpp, you will see the following output:
And the writer definitely has a match, otherwise it would not write the data (see wait loop).
So, why is the on_publication_matched only called once, and not within every loop? If I change the code to also re-create the Domain Participant inside the loop, the on_publication_matched callback is always called. So it looks like there is some caching of statuses.
publisher.cpp