eclipse-cyclonedds / cyclonedds

Eclipse Cyclone DDS project
https://projects.eclipse.org/projects/iot.cyclonedds
Other
891 stars 363 forks source link

Lifespan QoS setting ignored when using Cyclone DDS in ROS 2 #1835

Open okahilak opened 1 year ago

okahilak commented 1 year ago

I am encountering a behavior in ROS 2 where the Lifespan QoS setting seems to be ignored when using Cyclone DDS. This issue is not observed with eProsima Fast DDS.

Environment:

ROS 2 Version: Iron - Patch Release 2 OS: Ubuntu 22.04.3 LTS with PREEMPT_RT kernel patch Architecture: x86_64

I'm running the following ROS node, which implements a minimal subscriber with a lifespan set to 100 ms:

#include <functional>
#include <memory>

#include "rclcpp/rclcpp.hpp"
#include "std_msgs/msg/string.hpp"

using std::placeholders::_1;

const std::chrono::milliseconds DEADLINE = std::chrono::milliseconds(100);
const std::chrono::milliseconds LIFESPAN = std::chrono::milliseconds(100);

class MinimalSubscriber : public rclcpp::Node
{
public:
  MinimalSubscriber()
  : Node("minimal_subscriber")
  {
    auto qos = rclcpp::QoS(rclcpp::KeepLast(1))
      .reliability(RMW_QOS_POLICY_RELIABILITY_RELIABLE)
      .durability(RMW_QOS_POLICY_DURABILITY_TRANSIENT_LOCAL)
      .deadline(DEADLINE)
      .lifespan(LIFESPAN)
      .liveliness(RMW_QOS_POLICY_LIVELINESS_SYSTEM_DEFAULT);

    subscription_ = this->create_subscription<std_msgs::msg::String>(
      "/test", qos, std::bind(&MinimalSubscriber::topic_callback, this, _1));
  }

private:
  void topic_callback(const std_msgs::msg::String & msg) const
  {
    RCLCPP_INFO(this->get_logger(), "I heard: '%s'", msg.data.c_str());
  }
  rclcpp::Subscription<std_msgs::msg::String>::SharedPtr subscription_;
};

int main(int argc, char * argv[])
{
  rclcpp::init(argc, argv);
  rclcpp::spin(std::make_shared<MinimalSubscriber>());
  rclcpp::shutdown();
  return 0;
}

Expected Behavior:

When I query topic info, the lifespan should be 100 ms.

Observed Behavior:

The lifespan appears to be infinite.

Steps to reproduce:

Here's a step-by-step guide and attached is a .tar.gz file to help reproduce the issue:

tar -xvf cyclonedds_lifespan_problem.tar.gz

cd rostest/ros2_ws
colcon build
. install/local_setup.bash

RMW_IMPLEMENTATION=rmw_cyclonedds_cpp ros2 run cpp_pubsub listener

[In another terminal window:]
ros2 topic info --verbose /test

[For comparison:]
RMW_IMPLEMENTATION=rmw_fastrtps_cpp ros2 run cpp_pubsub listener
ros2 topic info --verbose /test

cyclonedds_lifespan_problem.tar.gz

eboasson commented 1 year ago

Hi @okahilak, what you're running into is that Cyclone's "lifespan" implementation follows the DDS specification. Section 2.2.3 of the spec clearly lists the "lifespan" QoS setting as applicable to DataWriter and Topic, but not to do DataReader, and defines it as:

2.2.3.16 LIFESPAN

The purpose of this QoS is to avoid delivering “stale” data to the application.

Each data sample written by the DataWriter has an associated ‘expiration time’ beyond which the data should not be delivered to any application. Once the sample expires, the data will be removed from the DataReader caches as well as from the transient and persistent information caches.

The ‘expiration time’ of each sample is computed by adding the duration specified by the LIFESPAN QoS to the source timestamp. As described in 2.2.2.4.2.11 and 2.2.2.4.2.12 the source timestamp is either automatically computed by the Service each time the DataWriter write operation is called, or else supplied by the application by means of the write_w_timestamp operation.

This QoS relies on the sender and receiving applications having their clocks sufficiently synchronized. If this is not the case and the Service can detect it, the DataReader is allowed to use the reception timestamp instead of the source timestamp in its computation of the ‘expiration time.’

I've always found it a strange choice. Surely it is the one interpreting the data that should decide what is relevant? (Imagine what "lifespan" would do to the archeologists ...) But the spec is pretty clear on what it means in DDS.

One could choose to also implement a notion of lifespan on the data reader. I have experience with OpenSplice DDS's version, I also know it has hardly been used. As reading is generally pretty cheap (because the data is already present at the reader by design) arguably discarding old data after reading it should usually be good enough, even if it is not exactly the same.

That said, it may be that ROS chose to interpret/document their lifespan QoS differently and everyone just assumed it would fit ...

Clearly I'm not firmly opposed to adding a notion of lifespan on the reader, it is more of a cost/benefit analysis. I simply don't think it is the most pressing issue with Cyclone that needs addressing. Despite that, I would certainly be happy to entertain a PR that adds it — it is fairly straightforward because the DDS-spec based implementation already requires removing expired samples from reader history cache, so it is more a matter of tweaking the timing than of adding truly new code.

okahilak commented 1 year ago

Hi! Thanks for the friendly and detailed response, I appreciate it!

Coming from the ROS world, I didn't venture into reading the DDS specification - you're right, it is pretty clear on how lifespan should work. I also agree that it would feel natural that the reader would have a say on how long the data stays relevant to them. That said, I can also see the importance of not deviating from the specification.

I think my confusion stemmed from how ROS2 documents lifespan and deadline without clarifying how they may apply differently to publisher and subscriber:

Deadline

Duration: the expected maximum amount of time between subsequent messages being published to a topic

Lifespan

Duration: the maximum amount of time between the publishing and the reception of a message without the message being considered stale or expired (expired messages are silently dropped and are effectively never received).

As per the documentation of eProsima Fast DDS, their lifespan QoS affects DataReader, DataWriter, and Topic entities, so it works differently from Cyclone DDS.

I could think of a few ways to decrease the potential for confusion, while not changing Cyclone DDS to deviate from DDS spec:

Alas, I don't have too much time right now to look into either option, but at least it is documented in this issue for possible future reference.