Closed TieneSabor closed 3 years ago
Could you provide code and instructions for replicating this?
Sorry, I spent some time writing a simplified program that replicates the situation. Here is the code:
#include <rcl/rcl.h>
#include <rcl/error_handling.h>
#include <rclc/rclc.h>
#include <rclc/executor.h>
#include <std_msgs/msg/string.h>
#include <stdio.h>
#include <stdlib.h>  // malloc
#include <unistd.h>
#define ARRAY_LEN 1900
#define RCCHECK(fn) { rcl_ret_t temp_rc = fn; if((temp_rc != RCL_RET_OK)){printf("Failed status on line %d: %d. Aborting.\n",__LINE__,(int)temp_rc); return 1;}}
#define RCSOFTCHECK(fn) { rcl_ret_t temp_rc = fn; if((temp_rc != RCL_RET_OK)){printf("Failed status on line %d: %d. Continuing.\n",__LINE__,(int)temp_rc);}}
rcl_publisher_t publisher;
rcl_subscription_t sub;
std_msgs__msg__String pub_msg;
std_msgs__msg__String sub_msg;
int counter = 0;
void timer_callback(rcl_timer_t * timer, int64_t last_call_time)
{
(void) last_call_time;
if (timer != NULL) {
counter++;
pub_msg.data.size = 1900;
RCSOFTCHECK(rcl_publish(&publisher, &pub_msg, NULL));
}
}
void subscription_callback(const void * sub_msgin)
{
// The payload is not inspected here; only the reception is logged.
(void) sub_msgin;
printf("received %dth data \r\n", counter);
}
int main(int argc, const char * const * argv)
{
rcl_allocator_t allocator = rcl_get_default_allocator();
rclc_support_t support;
// create init_options
RCCHECK(rclc_support_init(&support, argc, argv, &allocator));
// create node
rcl_node_t node = rcl_get_zero_initialized_node();
RCCHECK(rclc_node_init_default(&node, "string_node", "", &support));
// create publisher
RCCHECK(rclc_publisher_init_default(
&publisher,
&node,
ROSIDL_GET_MSG_TYPE_SUPPORT(std_msgs, msg, String),
"/string_publisher"));
// create subscriber
RCCHECK(rclc_subscription_init_default(&sub, &node,
ROSIDL_GET_MSG_TYPE_SUPPORT(std_msgs, msg, String), "/string_publisher"));
// create timer,
rcl_timer_t timer = rcl_get_zero_initialized_timer();
const unsigned int timer_timeout = 1;
RCCHECK(rclc_timer_init_default(
&timer,
&support,
RCL_MS_TO_NS(timer_timeout),
timer_callback));
// create executor
rclc_executor_t executor = rclc_executor_get_zero_initialized_executor();
RCCHECK(rclc_executor_init(&executor, &support.context, 2, &allocator));
RCCHECK(rclc_executor_add_timer(&executor, &timer));
RCCHECK(rclc_executor_add_subscription(&executor, &sub, &sub_msg,
&subscription_callback, ON_NEW_DATA));
// Fill the array with a known sequence
pub_msg.data.data = (char * ) malloc(ARRAY_LEN * sizeof(char));
for(int i=0;i<1899;i++){
pub_msg.data.data[i] = 'A';
}
pub_msg.data.data[1899] = '\0';
pub_msg.data.size = 1900;
pub_msg.data.capacity = ARRAY_LEN;
sub_msg.data.data = (char * ) malloc(ARRAY_LEN * sizeof(char));
sub_msg.data.size = 1900;
sub_msg.data.capacity = ARRAY_LEN;
rclc_executor_spin(&executor);
RCCHECK(rcl_subscription_fini(&sub, &node))
RCCHECK(rcl_publisher_fini(&publisher, &node))
RCCHECK(rcl_node_fini(&node))
return 0;
}
I'm going to test this. I'll be back in a while.
I have replicated it and found a bug in the XRCE-DDS Client. I'm working on a solution and will be back in a while.
I have prepared a patch here, so this segfault should no longer appear. We still need to pin down the XRCE-DDS Client fragmentation corner case that occurs when the sub-message sequence number rolls over.
Closing since solved.
Please reopen if the problem still happens when the PR is merged.
Update: we have also found a problem with the middleware history, solved here: https://github.com/micro-ROS/micro_ros_setup/pull/292
Please let us know once you have retried your code and whether everything works.
Hello, thanks for the help. I have checked out those changes and will retry the code ASAP. However, I am wondering why this situation only happens when stream->base.history is not a power of two?
I have run the test with the new patch, and the segmentation fault has disappeared. However, when I changed the XRCE stream history size to 32 without adding the lines in input_reliable_stream.c, the segmentation fault still happened.
As far as I know, the library handles the reliability of the streams based on these assumptions. Each XRCE message has a sequence number from 0 to 2^16-1, so a power-of-two history buffer size allows us to assign sequence numbers to history slots in an optimal way using a modulo operation.
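To illustrate the modulo argument above, here is a minimal sketch in C. The helper names `history_slot` and `rollover_is_contiguous` are hypothetical and are not part of the XRCE-DDS Client API; the sketch only assumes that a slot is chosen as `seq_num % history`. It checks whether the slot used right after the 16-bit wrap is the one that follows the slot used right before it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not the XRCE-DDS Client API): map a 16-bit
 * sequence number to a history slot with a plain modulo. */
static size_t history_slot(uint16_t seq_num, size_t history)
{
    return (size_t)seq_num % history;
}

/* Returns 1 if the slot of sequence number 0 (the value after the
 * 16-bit wrap) directly follows the slot of 65535 in the ring,
 * and 0 otherwise. */
static int rollover_is_contiguous(size_t history)
{
    uint16_t last = UINT16_MAX;            /* 65535 */
    uint16_t next = (uint16_t)(last + 1);  /* wraps around to 0 */
    size_t expected = (history_slot(last, history) + 1) % history;
    return history_slot(next, history) == expected;
}
```

With a history of 16, sequence number 65535 lands in slot 15 and sequence number 0 lands in slot 0, so the ring stays contiguous across the wrap. With a history of 12, 65535 lands in slot 3 while 0 lands in slot 0 instead of slot 4, so the mapping jumps at the rollover. This happens because 65536 is a multiple of every power of two up to 2^16 but not of 12.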
This segmentation fault only happens when XRCE is fragmenting and the sequence number rolls over. So the change in input_reliable_stream.c makes it possible to detect when the fragmenting buffer callback fails, which prevents the segfault. Another solution would be to increase the MTU so that XRCE does not need to fragment the packets.
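As a rough back-of-the-envelope check of the rollover explanation, the sketch below estimates how many messages fit before a 16-bit sequence number wraps. It rests on assumptions that this thread does not confirm: the 512-byte fragment payload is only an illustrative figure, each fragment is assumed to consume exactly one sequence number, and protocol headers are ignored. Both function names are hypothetical.

```c
#include <stdint.h>

/* Number of fragments needed for a payload, assuming each fragment
 * carries at most fragment_payload bytes (illustrative assumption). */
static uint32_t fragments_per_message(uint32_t payload_bytes,
                                      uint32_t fragment_payload)
{
    /* Ceiling division: round up to a whole number of fragments. */
    return (payload_bytes + fragment_payload - 1) / fragment_payload;
}

/* Messages that can be sent before a 16-bit sequence number wraps,
 * assuming every fragment consumes exactly one sequence number. */
static uint32_t messages_until_rollover(uint32_t fragments)
{
    return 65536u / fragments;
}
```

Under these assumptions, a 1900-byte message splits into 4 fragments, and the sequence number wraps after 16384 messages. That is in the same range as the failure at about the 16k-th round reported in this thread, but it is only an estimate, since the real fragment size depends on the configured MTU and on header overhead.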
Thanks for the explanation!
I was recently testing the micro-ROS client and agent with an app similar to the rtt-test demo code, but written in rclc instead (https://github.com/micro-ROS/micro-ROS-rtt). I found that after publishing and subscribing to a certain amount of data, the client would fail with a segmentation fault. In my test case, the client continuously publishes a message of roughly 2 kB and subscribes to the same topic in each round, and it failed at about the 16k-th round.
Core was generated by `install/micro_ros_demos_rclc/lib/micro_ros_demos_rclc/rtt_test_host/rtt_test_ho'.
Here's a screenshot for easier reading.
Please tell me what else I need to provide to solve this, thanks!