eclipse-cyclonedds / cyclonedds

Eclipse Cyclone DDS project
https://projects.eclipse.org/projects/iot.cyclonedds
Other
845 stars 350 forks source link

Transient_Local is not work with Shared memory #1584

Closed bopeng-saitama closed 7 months ago

bopeng-saitama commented 1 year ago

Bug report

Required Info:

Operating System: Ubuntu 22.04.1 LTS

Compiler version: GCC 11.3.0

Eclipse iceoryx version: v2.0.3

Cyclonedds version: releases/0.9.x

ROS 2 version: humble

Observed result or behaviour: When Shared memory is enabled, if a message is sent before the subscriber starts, it cannot be received. This indicates that the transient_local is not work.

Expected result or behaviour: When the message sent before the subsciber start, it can be received.

Conditions where it occurred / Performed steps: reproduce repo

Steps to reproduce issue

#Terminal 1
source /opt/ros/humble/setup.bash
iox-roudi

#Terminal 2
cd ~/transient_local_iceoryx
source /opt/ros/humble/setup.bash
source install/setup.bash
CYCLONEDDS_URI=file://$PWD/cyclonedds.xml ros2 run transient_local_iceoryx transient_local_pub

#Terminal 3
cd ~/transient_local_iceoryx
source /opt/ros/humble/setup.bash
source install/setup.bash
CYCLONEDDS_URI=file://$PWD/cyclonedds.xml ros2 run transient_local_iceoryx transient_local_sub

Screenshot from 2023-02-28 12-27-22

It can be seen from the diagram that the subscriber did not receive the message.

Screenshot from 2023-02-28 12-28-12

If the subscriber is started first, messages can be received successfully.

eboasson commented 1 year ago

@bopeng-saitama I just read your https://github.com/eclipse-iceoryx/iceoryx/issues/1923 issue and the discussion, and then I think the problem can be solved by triggering the code that consumes data from Iceoryx and pushes it into the DDS reader history cache.

That'd involve calling receive_data_wakeup_handler once from shm_monitor_attach_reader. Easy enough, except that you can then get order-reversal if fresh data is coming in from Iceoryx simultaneously because then you'd have two threads racing each other. That can be solved in Cyclone, but it is a bit tricky. It would be much nicer if Iceoryx would allow one to manually trigger this callback once without actually publishing data.

@elfenpiff would such a trigger be possible in current Iceoryx?

elfenpiff commented 1 year ago

@eboasson

That can be solved in Cyclone, but it is a bit tricky. It would be much nicer if Iceoryx would allow one to manually trigger this callback once without actually publishing data. @elfenpiff would such a trigger be possible in current Iceoryx?

Yes it would be possible with the user trigger. So every listener would have an additional user trigger attachment which is always manually triggered after a new subscriber was attached. But it would only trigger the callback of the user trigger so we have to forward the call now to the callback of the newly attached subscriber and provide a pointer to it. But this should solve the problem.

Would this idea work for you? And is it clear or should I go a little bit more into the details?

seasearch commented 1 year ago

I try this solution to solve this situation: The data handler "receive_data_wakeup_handler" is called immediately after this handler is attached to listener during the data reader is initialized in function "dds_create_reader_int". Certainly some constraints must be given to this operation, my code looks like:

`rc = shm_monitor_attach_reader(&rd->m_entity.m_domain->m_shm_monitor, rd);

if (rc != DDS_RETCODE_OK) {
  // we fail if we cannot attach to the listener (as we would get no data)
  iox_sub_deinit(rd->m_iox_sub);
  rd->m_iox_sub = NULL;
  DDS_CLOG(DDS_LC_WARNING | DDS_LC_SHM,
           &rd->m_entity.m_domain->gv.logconfig,
           "Failed to attach iox subscriber to iox listener\n");
  // FIXME: We need to clean up everything created up to now.
  //        Currently there is only partial cleanup, we need to extend it.
  goto err_bad_qos;
}

if ((subscriber_entity) && (rqos->durability.kind == DDS_DURABILITY_TRANSIENT_LOCAL))
{
  // access the previous data directly
  receive_data_wakeup_handler(rd);
}`

But I worry about some extreme situation such as frequently publishing and disorder data received in Receiver. Any suggestion or comments? @elfenpiff @eboasson @bopeng-saitama

duyu commented 2 months ago

@bopeng-saitama hi, is this fixed already?

i still see this issue somehow, https://github.com/ros2/rmw_cyclonedds/issues/401#issuecomment-2210238198

thanks,

bopeng-saitama commented 1 month ago

@duyu Unfortunately, this issue hadn't been solved at that time. I'm not sure if it has been resolved now.

duyu commented 1 month ago

@bopeng-saitama thank you for replying, it seems the issue is still there in the latest humble.

so you fixed it in your own cyclone-dds build with the change you mentioned above? any issue you saw regarding the extreme situations you menioned above?

bopeng-saitama commented 1 month ago

@duyu I remember that simple talker listener applications could indeed avoid this issue with the above modification. However, more complex ROS2 applications like Autoware could not run properly. I'm sorry, but I haven't been following this issue for a long time, so I don't remember the details very clearly.

duyu commented 1 month ago

@bopeng-saitama thank you! Appreciate your input, I'll try it out later as well.