eclipse-cyclonedds / cyclonedds-cxx


Data is likely to be lost when using iox-roudi #365

Open YeahhhhLi opened 1 year ago

YeahhhhLi commented 1 year ago

When we use iox-roudi to enable shared-memory (SHM) communication, there is a high probability that messages are lost.

dds_writer log:

E20230202 15:58:40.782938 76803 dds_writer.cc:33] ===================[DDSWriter] write message, body["ctrl"]
E20230202 15:58:40.783569 76803 dds_writer.cc:33] ===================[DDSWriter] write message, body["can"]
E20230202 15:58:40.783855 76803 dds_writer.cc:33] ===================[DDSWriter] write message, body["pad"]
E20230202 15:58:40.784041 76803 dds_writer.cc:33] ===================[DDSWriter] write message, body["pose"]
E20230202 15:58:40.784240 76803 dds_writer.cc:33] ===================[DDSWriter] write message, body["result"]
E20230202 15:58:40.784452 76803 dds_writer.cc:33] ===================[DDSWriter] write message, body["system"]

dds_reader log:

E20230202 15:58:40.784943 76811 dds_reader.cc:40] [DDSReader] data available message, body["system"]

From the logs above, at least the first four messages ("ctrl", "can", "pad", "pose") were never received; in this run only "system" shows up on the reader side.

The reason I say the data is lost "with a high probability" is that what the subscriber receives after each startup is unstable: sometimes all 6 messages arrive, and sometimes only "ctrl" and "system" arrive while the middle 4 are lost.
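For reference, the reader-side callback is along these lines (a simplified sketch; Msg is a placeholder for the real IDL-generated type, and take() drains the whole reader cache, so a single notification cannot hide later samples):

#include <dds/dds.hpp>

// Simplified sketch of the reader-side listener; "Msg" stands in for the
// real IDL-generated type.
class ReaderListener : public dds::sub::NoOpDataReaderListener<Msg> {
  void on_data_available(dds::sub::DataReader<Msg>& reader) override {
    // take() removes every sample currently in the reader cache, so one
    // notification covering several samples still yields all of them.
    dds::sub::LoanedSamples<Msg> samples = reader.take();
    for (const auto& sample : samples) {
      if (sample.info().valid()) {
        // handle sample.data(); this is where the
        // "[DDSReader] data available message" line above is logged
      }
    }
  }
};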

dds_writer qos:

dds::pub::qos::DataWriterQos dw_qos;
// RELIABLE with a 10 s max blocking time
dw_qos.policy(dds::core::policy::Reliability(dds::core::policy::ReliabilityKind::RELIABLE,
                                             dds::core::Duration::from_secs(10)));
// KEEP_LAST history with depth 16
dw_qos.policy(dds::core::policy::History(dds::core::policy::HistoryKind::KEEP_LAST, 16));
dw_qos.policy(dds::core::policy::Deadline());  // default: infinite
dw_qos.policy(dds::core::policy::Durability(dds::core::policy::DurabilityKind::VOLATILE));
// AUTOMATIC liveliness with a 3 s lease duration
dw_qos.policy(dds::core::policy::Liveliness(dds::core::policy::LivelinessKind::AUTOMATIC,
                                            dds::core::Duration::from_secs(3)));

dds_reader qos:

dds::sub::qos::DataReaderQos dr_qos;
// RELIABLE with a 10 s max blocking time
dr_qos.policy(dds::core::policy::Reliability(dds::core::policy::ReliabilityKind::RELIABLE,
                                             dds::core::Duration::from_secs(10)));
// KEEP_LAST history with depth 16
dr_qos.policy(dds::core::policy::History(dds::core::policy::HistoryKind::KEEP_LAST, 16));
dr_qos.policy(dds::core::policy::Deadline());  // default: infinite
dr_qos.policy(dds::core::policy::Durability(dds::core::policy::DurabilityKind::VOLATILE));
// AUTOMATIC liveliness with a 3 s lease duration
dr_qos.policy(dds::core::policy::Liveliness(dds::core::policy::LivelinessKind::AUTOMATIC,
                                            dds::core::Duration::from_secs(3)));
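For completeness, the QoS objects are attached at entity creation in the usual way; a minimal sketch, with a hypothetical Msg type and topic name:

#include <dds/dds.hpp>

// Minimal sketch; "Msg" and the topic name are placeholders.
dds::domain::DomainParticipant participant(0);
dds::topic::Topic<Msg> topic(participant, "example_topic");

dds::pub::Publisher publisher(participant);
dds::pub::DataWriter<Msg> writer(publisher, topic, dw_qos);

dds::sub::Subscriber subscriber(participant);
dds::sub::DataReader<Msg> reader(subscriber, topic, dr_qos);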

iox-roudi config:

[general]
version = 1

[[segment]]

[[segment.mempool]]
size = 1088
count = 512

[[segment.mempool]]
size = 16448
count = 1024

[[segment.mempool]]
size = 32832
count = 1024

[[segment.mempool]]
size = 262208
count = 256

[[segment.mempool]]
size = 1048640
count = 256

[[segment.mempool]]
size = 4194368
count = 256

[[segment.mempool]]
size = 8388672
count = 128

[[segment.mempool]]
size = 33554496
count = 128

Is there any way for us to troubleshoot this further?

YeahhhhLi commented 1 year ago

Hello, can someone help me with this problem?

eboasson commented 1 year ago

If you're always sending just 6 messages, then it sounds very much like the behaviour of KEEP_LAST 1 rather than that of KEEP_LAST 16.

Iceoryx has a configurable limit on the history depth, and with the Iceoryx guys having done the heavy lifting of figuring out how best to configure the Iceoryx endpoints for Cyclone, I always assumed it was set correctly. But looking at the code for dealing with "too large" subscriber history settings, I wonder.

Perhaps you can have a look at this: https://iceoryx.io/v2.0.1/advanced/configuration-guide/ and see how your Iceoryx installation is actually configured?
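One quick way to see the limits your iceoryx build was compiled with is to print its compile-time constants; a sketch, assuming an iceoryx v2 build (constant names as they appear in iceoryx_posh_types.hpp):

#include <iostream>
#include "iceoryx_posh/iceoryx_posh_types.hpp"

int main() {
  // Compile-time limits baked into the iceoryx build; history/queue
  // requests beyond these get capped.
  std::cout << "MAX_PUBLISHER_HISTORY:         " << iox::MAX_PUBLISHER_HISTORY << '\n';
  std::cout << "MAX_SUBSCRIBER_QUEUE_CAPACITY: " << iox::MAX_SUBSCRIBER_QUEUE_CAPACITY << '\n';
  return 0;
}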

@MatthiasKillat am I looking in a reasonable direction here?
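In the meantime, running with verbose Cyclone tracing should show what the shared-memory path actually negotiates with iox-roudi; for example, set before the participant is created (standard Tracing configuration, written to cdds.log):

#include <cstdlib>

// Enable verbose CycloneDDS tracing before creating the participant; the
// resulting cdds.log includes what the SHM integration requests and decides.
setenv("CYCLONEDDS_URI",
       "<Tracing><Verbosity>finest</Verbosity><OutputFile>cdds.log</OutputFile></Tracing>",
       1);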