Replace the SRV mechanism in ROS2

liuzf1988 commented 1 year ago

This is a problem encountered in porting NDT algorithm from ROS2 ecosystem to DORA framework.

The SRV mechanism provided in ROS2 accepts the service request sent by the client and send back the response.

It is known the NDT algorithm requires GNSS signal or manual initial position assignment before it works, and the NDT algorithm is sensitive to the error of the given initial position, while the GNSS (without RTK) signal or manual initial position generally has an error of 2-10 meters, so the Particle filter + NDT algorithm is used to find an accurate position near the given initial position.

In the implementation of the autoware.universe's location module, the process of obtaining the exact initial pose is provided by a SRV mechanism, i.e. the serviceNDTAlign service in ndt_scan_matcher_core.cpp. The call to the serviceNDTAlign service is made in the pose_initializer package. Extract the relevant codes as follows:

// pose_initializer_core.cpp
void PoseInitializer::on_initialize(
  const Initialize::Service::Request::SharedPtr req,
  const Initialize::Service::Response::SharedPtr res)
{
    // ...
    try {
        change_state(State::Message::INITIALIZING);
        auto pose = req->pose.empty() ? get_gnss_pose() : req->pose.front();
        if (ndt_) {
            pose = ndt_->align_pose(pose);
        }
        pose.pose.covariance = output_pose_covariance_;
        // ...
  }
}

// ndt_module.cpp
NdtModule::NdtModule(rclcpp::Node * node) : logger_(node->get_logger())
{
    cli_align_ = node->create_client<RequestPoseAlignment>("ndt_align");
}

PoseWithCovarianceStamped NdtModule::align_pose(const PoseWithCovarianceStamped & pose)
{
    const auto req = std::make_shared<RequestPoseAlignment::Request>();
    req->pose_with_covariance = pose;

    if (!cli_align_->service_is_ready()) {
        throw component_interface_utils::ServiceUnready("NDT align server is not ready.");
    }

    RCLCPP_INFO(logger_, "Call NDT align server.");
    const auto res = cli_align_->async_send_request(req).get();
    if (!res->success) {
        RCLCPP_INFO(logger_, "NDT align server failed.");
        throw ServiceException(
            Initialize::Service::Response::ERROR_ESTIMATION, "NDT align server failed.");
    }
    RCLCPP_INFO(logger_, "NDT align server succeeded.");

    // Overwrite the covariance.
    return res->pose_with_covariance;
}

From the implementation of align_pose() function in ndt_module.cpp, it can be seen that, The SRV mechanism of ROS2 has two characteristics: first, whether the service is ready can be checked through if (! cli_align_ ->service_is_ready()) before calling the service; Second, service calls are executed asynchronously.

At present, there is no SRV mechanism in the DORA framework, so the Pub/Sub method is temporarily used to replace SRV: from dataflow.yml, it can be seen that first the pose_initializer operator sends a align_request_pose request message, and then the ndt_scan_matcher operator return the response message.

communication:
  zenoh:
    prefix: /localization-dataflow

nodes:
  - id: runtime-node
    operators:
      - id: pose_initializer
        shared-library: ../../../target/debug/pose_initializer
        inputs:
          manual_pose: manual_pose/initial_pose
          fitted_gnss_pose: runtime-node/map_height_fitter/fitted_gnss_pose
          aligned_response_pose: runtime-node/ndt_scan_matcher/aligned_response_pose
        outputs:
          - align_request_pose
          - initialpose3d

      - id: ndt_scan_matcher
        shared-library: ../../../target/debug/ndt_scan_matcher
        inputs:
          map_points: pointcloud_map_loader/map_points
          align_request_pose: runtime-node/pose_initializer/align_request_pose
          downsampled_pointcloud: runtime-node/pointcloud_downsample/downsampled_pointcloud
          regularization_pose: runtime-node/gnss_poser/gnss_pose_cov
          initial_pose: runtime-node/ekf_localizer/ekf_biased_pose_with_cov
          tf_baselink2lidar: tf_publisher/tf_baselink2lidar
        outputs:
          - ndt_pose_with_cov
          - aligned_response_pose

Such an approximate alternative cannot realize the two characteristics of SRV mechanism mentioned above: first, it is impossible to determine whether the service is ready before calling the service, because the ndt_align service provided by ndt_scan_matcher is actutally ready only when the operator has received a map_ points and downsampled_pointcloud input message. Since both ROS2 and Dora uses a Pub/Sub distributed communication mode, it is impossible to specify the node/operator startup sequence. So theoretically, it cannot be guaranteed that ndt_scan_matcher operator will receive the map_points and downsampled_pointcloud messages before receiving the align_request_pose request. Secondly, there is no asynchronous call mode, or I do not know how to use asynchronous mechanism in the implementation of DORA operator.

So there will be performance bottlenecks in the current alternatives. I wonder if there is a better alternative to SRV, Looking forward to your reply.

haixuanTao commented 1 year ago

Hi, Thanks for contributing to Dora by raising issues!

Making a service like ROS2 could be very interesting indeed!

From what I understand:

pose_initializer sends an align_request_pose to ndt_scan_matcher that sends back a aligned_response_pose to pose_initializer.
This is however dependent on if map_points and downsampled_pointcloud have first been received by ndt_scan_matcher.

you mention that:

theoretically, it cannot be guaranteed that ndt_scan_matcher operator will receive the map_points and downsampled_pointcloud messages before receiving the align_request_pose request

Would it be possible to just have an if-statement that responds False or a default zeroed aligned_response_pose if map_points and downsampled_pointcloud are not initialized?

And if we do actually implement a service, wouldn't we face the same initialization process issue?

phil-opp commented 1 year ago

Thanks a lot for the detailed explanation! Let me try to summarize the features that you would like to see in dora:

request/reply communication
- goal: receive the reply corresponding to a request without subscribing to all replies sent by server
load information
- give nodes information whether the receiver is able to process inputs, similar to service_is_ready
synchronized startup/startup sequence
- make sure that all nodes/operators are properly initialized before receiving messages/requests
asynchronous call mode
- I'm not sure if I understand what you mean. Could you give us more details on this?

Am I missing anything?

I wonder if there is a better alternative to SRV, Looking forward to your reply.

I don't think that think there is a better way right now. You could try to split the "client" to implement a more linear data flow instead of using a request/reply pattern. I imagine something like this:

nodes:
  - id: runtime-node
    operators:
      - id: pose_initializer
        ...
        inputs:
          manual_pose: manual_pose/initial_pose
        outputs:
          - align_request_pose

      - id: ndt_scan_matcher
        shared-library: ../../../target/debug/ndt_scan_matcher
        inputs:
          align_request_pose: runtime-node/pose_initializer/align_request_pose
        outputs:
          - aligned_response_pose

      - id: aligned_pose_handler
        ...
        inputs:
          aligned_response_pose: runtime-node/ndt_scan_matcher/aligned_response_pose

I'm not sure if this is possible in your case though.

dora-rs / dora

Replace the SRV mechanism in ROS2 #165