BrettRD / ros-gst-bridge

a bidirectional ros to gstreamer bridge and utilities for dynamic pipelines

Pipeline crashes on close #59

Open samuparta opened 7 months ago

samuparta commented 7 months ago

Hello,

I get this error every time I stop a pipeline containing the rosimagesrc element (e.g., gst-launch-1.0 rosimagesrc ros-topic=/test_camera ! videoconvert ! queue ! xvimagesink):

GStreamer-CRITICAL **: 11:33:40.165: gst_buffer_map_range: assertion 'GST_IS_BUFFER (buffer)' failed
Caught SIGSEGV
Spinning. Please run 'gdb gst-launch-1.0 464949' to continue debugging, Ctrl-C to quit, or Ctrl-\ to dump core.

Then I get a sequence of warnings from the ros2 node:

[WARN] [1702290270.622727331] [gst_image_src_node]: dropping message

Until I press CTRL-C a second time.

I wonder if there's a fix for this, since it makes it difficult to integrate your bridge into other applications that need to start and stop pipelines multiple times.

Thank you very much

BrettRD commented 7 months ago

Does it give you a chance to load gdb on the segfault? I've seen a bunch of instability on exit, and I'm not sure exactly where it's coming from.

samuparta commented 7 months ago

Hello @BrettRD,

Thank you very much for your reply.

I ran gdb on a pipeline containing the rosimagesrc element. In the gdb console I executed the run command, interrupted it with CTRL-C, and then used signal SIGINT to simulate the application's behaviour on closing. As expected, I got the usual GStreamer-CRITICAL: 08:40:43.868: gst_buffer_map_range: assertion 'GST_IS_BUFFER (buffer)' failed error.

I attach the terminal log:

gdb --args gst-launch-1.0 rosimagesrc ros-topic=/gazebo/camera/front ! queue ! videoconvert ! xvimagesink

GNU gdb (Ubuntu 9.2-0ubuntu1~20.04.1) 9.2
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "aarch64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
http://www.gnu.org/software/gdb/bugs/.
Find the GDB manual and other documentation resources online at:
http://www.gnu.org/software/gdb/documentation/.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from gst-launch-1.0...
Reading symbols from /usr/lib/debug/.build-id/51/d8cc94d6fa63f1e2d0162e390908b62e0b0c5d.debug...

(gdb) run

Starting program: /usr/bin/gst-launch-1.0 rosimagesrc ros-topic=/gazebo/camera/front ! queue ! videoconvert ! xvimagesink
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
Setting pipeline to PAUSED ...
[New Thread 0xfffff66851e0 (LWP 335537)]
[New Thread 0xfffff523c1e0 (LWP 335538)]
[New Thread 0xfffff49b21e0 (LWP 335539)]
[New Thread 0xffffeffff1e0 (LWP 335540)]
[New Thread 0xffffef7fe1e0 (LWP 335541)]
[New Thread 0xffffeeffd1e0 (LWP 335542)]
[New Thread 0xffffee7fc1e0 (LWP 335543)]
[New Thread 0xffffedffb1e0 (LWP 335544)]
[New Thread 0xffffed7fa1e0 (LWP 335545)]
[New Thread 0xffffecff91e0 (LWP 335546)]
[New Thread 0xffffd7fff1e0 (LWP 335547)]
[New Thread 0xffffd77fe1e0 (LWP 335548)]
[New Thread 0xffffd6ffd1e0 (LWP 335549)]
Pipeline is live and does not need PREROLL ...
[New Thread 0xffffd67fc1e0 (LWP 335550)]
[INFO] [1702453184.362573689] [gst_image_src_node]: getcaps with filter 'NULL'
[INFO] [1702453184.362746351] [gst_image_src_node]: waiting for first message
Setting pipeline to PLAYING ...
[INFO] [1702453184.363027477] [gst_image_src_node]: stream_start at 1702453184363023990
New clock: GstSystemClock
[INFO] [1702453186.443240158] [gst_image_src_node]: preparing video with caps 'video/x-raw, format=(string)RGB, height=(int)1080, width=(int)1920, framerate=(fraction)0/1'
[New Thread 0xffffd483e1e0 (LWP 335621)]
[Thread 0xffffd483e1e0 (LWP 335621) exited]

^C

Thread 1 "gst-launch-1.0" received signal SIGINT, Interrupt.
0x0000fffff7b9b098 in __GI___poll (fds=0xaaaaaafa4b30, nfds=2, timeout=) at ../sysdeps/unix/sysv/linux/poll.c:41
41 ../sysdeps/unix/sysv/linux/poll.c: No such file or directory.

(gdb) signal SIGINT

Continuing with signal SIGINT.
handling interrupt.
Interrupt: Stopping pipeline ...
Execution ended after 0:00:59.445939984
Setting pipeline to NULL ...

(gst-launch-1.0:335534): GStreamer-CRITICAL **: 08:40:43.868: gst_buffer_map_range: assertion 'GST_IS_BUFFER (buffer)' failed

Thread 14 "rosimagesrc0:sr" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xffffd6ffd1e0 (LWP 335549)]
__memcpy_generic () at ../sysdeps/aarch64/multiarch/../memcpy.S:195
195 ../sysdeps/aarch64/multiarch/../memcpy.S: No such file or directory.

So the crash is related to a memcpy, but I cannot locate it precisely in the code. I found this conversation, https://github.com/BrettRD/ros-gst-bridge/pull/14, where you said it could be related to the memcpy in ros_image_create. Do you have any update on that?

Thank you again!

samuparta commented 7 months ago

CONTINUATION

Running (gdb) backtrace after the commands above gives the following:

#0 __memcpy_generic () at ../sysdeps/aarch64/multiarch/../memcpy.S:195
#1 0x0000fffff736bf58 in memcpy (len=6220800, src=, __dest=) at /usr/include/aarch64-linux-gnu/bits/string_fortified.h:34
#2 rosimagesrc_create(GstBaseSrc*, guint64, guint, GstBuffer**) (base_src=, offset=18446744073709551615, size=, buf=0xffffd6ffc728) at /home/samuparta/ros2_ws/src/ros-gst-bridge/gst_bridge/src/rosimagesrc.cpp:529
#3 0x0000fffff6e927b0 in gst_base_src_get_range (src=src@entry=0xaaaaaaf0d4e0 [GstBaseSrc|rosimagesrc0], offset=offset@entry=18446744073709551615, length=, length@entry=6220800, buf=buf@entry=0xffffd6ffc830) at gstbasesrc.c:2527
#4 0x0000fffff6e958f0 in gst_base_src_loop (pad=0xaaaaaaf4c190 [GstPad|src]) at gstbasesrc.c:2851
#5 0x0000fffff7eddb04 in gst_task_func (task=0xaaaaab105170 [GstTask|rosimagesrc0:src]) at gsttask.c:328
#6 0x0000fffff7cf2e20 in () at /lib/aarch64-linux-gnu/libglib-2.0.so.0
#7 0x0000fffff7cf2484 in () at /lib/aarch64-linux-gnu/libglib-2.0.so.0
#8 0x0000fffff7c4d624 in start_thread (arg=0xfffff7d168f0) at pthread_create.c:477
#9 0x0000fffff7ba462c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78

In particular, from this frame:

#2 rosimagesrc_create(GstBaseSrc*, guint64, guint, GstBuffer**) (base_src=, offset=18446744073709551615, size=, buf=0xffffd6ffc728) at /home/samuparta/ros2_ws/src/ros-gst-bridge/gst_bridge/src/rosimagesrc.cpp:529

I can actually trace the error back to the memcpy in the rosimagesrc_create function.
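For readers following along, the pairing of a failed gst_buffer_map_range assertion with a crash inside memcpy suggests a copy that runs even though the mapping step did not succeed. GStreamer's gst_buffer_map returns a gboolean for this reason. The sketch below illustrates the general guard pattern only; it is not the code in rosimagesrc.cpp and it does not use GStreamer. FakeBuffer, MappedRegion, map_for_write, CopyResult and copy_frame are all hypothetical stand-ins invented for the illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a media buffer; NOT a GStreamer type.
struct FakeBuffer {
    std::vector<std::uint8_t> bytes;
};

// Hypothetical stand-in for the result of mapping a buffer for writing.
struct MappedRegion {
    std::uint8_t* data = nullptr;
    std::size_t size = 0;
};

// Mimics a map call that reports failure through its return value
// instead of leaving the caller with an unusable pointer.
bool map_for_write(FakeBuffer* buffer, MappedRegion* out) {
    if (buffer == nullptr || out == nullptr) {
        return false;  // nothing to map, e.g. no buffer was produced during teardown
    }
    out->data = buffer->bytes.data();
    out->size = buffer->bytes.size();
    return true;
}

enum class CopyResult { Ok, NoBuffer, TooSmall };

// Copies a frame into the destination only after the map succeeded
// and the mapped region is large enough to hold the frame.
CopyResult copy_frame(FakeBuffer* dst, const std::vector<std::uint8_t>& frame) {
    MappedRegion region;
    if (!map_for_write(dst, &region)) {
        return CopyResult::NoBuffer;  // skip the memcpy entirely
    }
    if (region.size < frame.size()) {
        return CopyResult::TooSmall;  // would overrun the destination
    }
    if (!frame.empty()) {
        std::memcpy(region.data, frame.data(), frame.size());
    }
    return CopyResult::Ok;
}
```

Calling copy_frame(nullptr, frame) returns CopyResult::NoBuffer without touching memory, which is the behaviour one would want from a source element that is asked for data while the pipeline is shutting down.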

BrettRD commented 7 months ago

That's perfect! Thank you so much!

Yes, this says the memcpy causes the actual crash, and that singles out the buffer mistake the assert warning called out.

The complete fix has a couple of parts; it's not trivial, but not difficult either.

It'll be a few weeks before I can get back into hacking on this. If you'd like to dig into the code, have a look at the develop branch; it's close to a major release.
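The thread does not spell out what the parts of the fix are. Purely as an illustration for anyone digging into the code, one pattern that often accompanies this kind of teardown crash is a blocking "wait for the next message" that can be cancelled at shutdown and that tells the caller no data was delivered. The sketch below is an assumption about the general technique, not the maintainer's plan and not ros-gst-bridge code; FrameQueue and its methods are hypothetical names.

```cpp
#include <condition_variable>
#include <cstdint>
#include <deque>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical frame queue shared between a subscriber callback (push)
// and a streaming thread (wait_pop). Not part of any real library.
class FrameQueue {
public:
    using Frame = std::vector<std::uint8_t>;

    // Called by the producer when a new frame arrives.
    void push(Frame frame) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            frames_.push_back(std::move(frame));
        }
        cv_.notify_one();
    }

    // Called once when the pipeline is stopping; wakes every waiter.
    void shutdown() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        cv_.notify_all();
    }

    // Blocks until a frame is available or shutdown() has been called.
    // Returns false on shutdown and leaves `out` untouched, so the caller
    // can bail out before mapping or copying anything.
    bool wait_pop(Frame& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return stopping_ || !frames_.empty(); });
        if (stopping_) {
            return false;
        }
        out = std::move(frames_.front());
        frames_.pop_front();
        return true;
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::deque<Frame> frames_;
    bool stopping_ = false;
};
```

The design choice here is that shutdown takes priority over queued frames: once stopping_ is set, wait_pop returns false even if frames remain, so the consumer never starts a copy during teardown.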

samuparta commented 7 months ago

That's very nice to hear, I look forward to that. In the meantime I'll try looking into the fix you proposed, even though I don't have much experience with the gst libraries.

Keep me posted!

Thank you again