IntelRealSense / librealsense

Intel® RealSense™ SDK
https://www.intelrealsense.com/
Apache License 2.0

Application using camera crashes with `usbi_mutex_lock assertion` and closes the ssh connection #11573

Closed maciejandrzejewski-digica closed 1 year ago

maciejandrzejewski-digica commented 1 year ago

Required Info
Camera Model: D455
Firmware Version: 05.13.00.50
Operating System & Version: Debian 11
Kernel Version (Linux Only): 5.10.66-27-rockchip-gea60d388902d
Platform: Rock5B (Rockchip)
SDK Version: v2.51.1
Language: C++

Background description

The C++ application is cross-compiled and runs directly on the target hardware, with the camera connected to it. Two pipelines are running: one for RGB and depth frames, and one reading out the IMU sensors at full speed. The IMU pipeline is run with a callback.

Issue description no. 1

The application crashes randomly within 1-2 minutes during a gdb debug session with the following error:

../../libusb/os/threads_posix.h:46: usbi_mutex_lock: Assertion `pthread_mutex_lock(mutex) == 0' failed.
Signal: SIGABRT (Aborted)

Not sure why this is happening. It looks like a mutex inside the libusb driver?
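
For context, the assertion in the message fires whenever `pthread_mutex_lock` returns a non-zero error code. The sketch below is not taken from libusb and does not claim to reproduce the cause of this crash. It only shows one well-defined way POSIX makes that call fail: an error-checking mutex that is locked again by the thread already owning it returns `EDEADLK`. Other causes, such as locking a mutex that has already been destroyed, are undefined behaviour and cannot be demonstrated portably. The function name `relock_errorcheck_mutex` is made up for this illustration.

```cpp
#include <cerrno>
#include <pthread.h>

// Hypothetical helper: returns the error code of a second lock attempt
// made by the thread that already owns an error-checking mutex.
int relock_errorcheck_mutex()
{
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);

    pthread_mutex_t mutex;
    pthread_mutex_init(&mutex, &attr);
    pthread_mutexattr_destroy(&attr);

    pthread_mutex_lock(&mutex);                 // first lock succeeds and returns 0
    const int rc = pthread_mutex_lock(&mutex);  // relock by the owner fails with EDEADLK

    pthread_mutex_unlock(&mutex);
    pthread_mutex_destroy(&mutex);
    return rc;
}
```

Wrapping the second call in `assert(pthread_mutex_lock(&mutex) == 0)` would abort the process with SIGABRT, which is the same mechanism as in the message above.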

Issue description no. 2

This issue might be connected to the one above. The application is running without a gdb session, started from the command line while connected through ssh. It crashes randomly, but instead of returning to the command line as it should, it closes the current ssh connection. Very strange behavior. I have observed the same situation when running realsense-viewer. I forced the error by keeping the camera open in the application and running realsense-viewer in another connection. This was the output; please notice that the connection was closed after realsense-viewer exited with an error:

mandrzejewski@rock5b:~$ realsense-viewer 
 10/03 18:09:09,337 WARNING [546199040400] (handle-libusb.h:108) failed to claim usb interface, interface 0, is busy - retrying...
 10/03 18:09:09,347 WARNING [546199040400] (handle-libusb.h:108) failed to claim usb interface, interface 0, is busy - retrying...
 10/03 18:09:09,357 ERROR [546199040400] (handle-libusb.h:123) failed to claim usb interface: 0, error: RS2_USB_STATUS_BUSY
 10/03 18:09:09,358 ERROR [546501013904] (sensor.cpp:576) acquire_power failed: failed to set power state
 10/03 18:09:09,360 WARNING [546501013904] (rs.cpp:310) null pointer passed for argument "device"
 10/03 18:09:09,360 WARNING [546501013904] (rs.cpp:2704) Couldn't refresh devices - failed to set power state
Connection to rock5b.local closed by remote host.
Connection to rock5b.local closed.

MartyG-RealSense commented 1 year ago

Hi @maciejandrzejewski-digica It is recommended that when the librealsense SDK is installed on Rockchip devices, the SDK is built from source code with CMake in RSUSB backend mode by using the build flag -DFORCE_RSUSB_BACKEND=true. An RSUSB build of the SDK bypasses the kernel and so is not dependent on Linux versions or kernel versions and does not need to be kernel patched.

maciejandrzejewski-digica commented 1 year ago

@MartyG-RealSense The SDK has been built with the following command, which includes this flag:

cmake .. -DCMAKE_CROSSCOMPILING=TRUE -DFORCE_RSUSB_BACKEND=true -DCMAKE_BUILD_TYPE=release -DBUILD_EXAMPLES=true -DBUILD_GRAPHICAL_EXAMPLES=true -DCMAKE_TOOLCHAIN_FILE=../toolchain_gcc_arm64.cmake -DBUILD_PYTHON_BINDINGS:bool=true

MartyG-RealSense commented 1 year ago

Could you please check whether there is a memory leak in your application? A leak progressively consumes the computer's memory while the application runs, until the system becomes unstable and may crash. You can check for this by launching your application and then starting a Linux system monitoring tool such as htop.

The link below describes how to install htop on Debian and use it over an ssh connection.

https://www.cyberciti.biz/faq/how-to-install-htop-on-debian-linux-using-apt-get/
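
As a complement to watching htop, the application could log its own resident memory over time. The following is a minimal sketch, assuming a Linux target where `/proc/self/status` is available; the `VmRSS` field there corresponds to the RES column in htop. The function names `parse_vm_rss_kb` and `current_rss_kb` are made up for this illustration and are not part of the RealSense SDK.

```cpp
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

// Extracts the "VmRSS:" value (in kB) from text in /proc/<pid>/status format.
// Returns -1 if the field is missing or cannot be parsed.
long parse_vm_rss_kb(std::istream& status)
{
    std::string line;
    while (std::getline(status, line))
    {
        if (line.rfind("VmRSS:", 0) == 0)  // true when the line starts with "VmRSS:"
        {
            std::istringstream fields(line.substr(6));
            long kb = -1;
            if (fields >> kb)
                return kb;
            return -1;
        }
    }
    return -1;
}

// Resident set size of the calling process in kB (Linux only).
long current_rss_kb()
{
    std::ifstream status("/proc/self/status");
    return parse_vm_rss_kb(status);
}
```

Calling `current_rss_kb()` once per second from the main loop and printing the value gives a trace that can be compared before and after a crash. A steadily growing number would point to a leak, while a flat one would not.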

maciejandrzejewski-digica commented 1 year ago

I have run the app with htop several times and the app did not consume additional memory: it stayed at the same level in the RES column. The app crashed with the same message as above. Here is a video showing this: https://youtu.be/cz-De5_t7Qg

After that I ran the app under valgrind to search for memory leaks. It reported numerous records pointing to libusb, like the one below:

==7823== 248 bytes in 1 blocks are still reachable in loss record 22,653 of 28,718
==7823==    at 0x484A484: operator new(unsigned long) (vg_replace_malloc.c:342)
==7823==    by 0x83C2A47: allocate (new_allocator.h:114)
==7823==    by 0x83C2A47: allocate (alloc_traits.h:443)
==7823==    by 0x83C2A47: __allocate_guarded<std::allocator<std::_Sp_counted_ptr_inplace<librealsense::platform::usb_device_libusb, std::allocator<librealsense::platform::usb_device_libusb>, (__gnu_cxx::_Lock_policy)2> > > (allocated_ptr.h:97)
==7823==    by 0x83C2A47: __shared_count<librealsense::platform::usb_device_libusb, std::allocator<librealsense::platform::usb_device_libusb>, libusb_device*&, libusb_device_descriptor&, const librealsense::platform::usb_device_info&, std::shared_ptr<librealsense::platform::usb_context>&> (shared_ptr_base.h:677)
==7823==    by 0x83C2A47: __shared_ptr<std::allocator<librealsense::platform::usb_device_libusb>, libusb_device*&, libusb_device_descriptor&, const librealsense::platform::usb_device_info&, std::shared_ptr<librealsense::platform::usb_context>&> (shared_ptr_base.h:1344)
==7823==    by 0x83C2A47: shared_ptr<std::allocator<librealsense::platform::usb_device_libusb>, libusb_device*&, libusb_device_descriptor&, const librealsense::platform::usb_device_info&, std::shared_ptr<librealsense::platform::usb_context>&> (shared_ptr.h:359)
==7823==    by 0x83C2A47: allocate_shared<librealsense::platform::usb_device_libusb, std::allocator<librealsense::platform::usb_device_libusb>, libusb_device*&, libusb_device_descriptor&, const librealsense::platform::usb_device_info&, std::shared_ptr<librealsense::platform::usb_context>&> (shared_ptr.h:702)
==7823==    by 0x83C2A47: make_shared<librealsense::platform::usb_device_libusb, libusb_device*&, libusb_device_descriptor&, const librealsense::platform::usb_device_info&, std::shared_ptr<librealsense::platform::usb_context>&> (shared_ptr.h:718)
==7823==    by 0x83C2A47: librealsense::platform::usb_enumerator::create_usb_device(librealsense::platform::usb_device_info const&) (enumerator-libusb.cpp:121)
==7823==    by 0x83C8ECF: librealsense::platform::create_rshid_device(librealsense::platform::hid_device_info) (hid-device.cpp:46)
==7823==    by 0x83E0027: librealsense::platform::rs_backend::create_hid_device(librealsense::platform::hid_device_info) const (rsusb-backend.cpp:67)
==7823==    by 0x812E007: librealsense::ds5_motion::create_hid_device(std::shared_ptr<librealsense::context>, std::vector<librealsense::platform::hid_device_info, std::allocator<librealsense::platform::hid_device_info> > const&, librealsense::firmware_version const&) (ds5-motion.cpp:201)
==7823==    by 0x8131587: librealsense::ds5_motion::ds5_motion(std::shared_ptr<librealsense::context>, librealsense::platform::backend_device_group const&) (ds5-motion.cpp:341)
==7823==    by 0x8184083: librealsense::rs455_device::rs455_device(std::shared_ptr<librealsense::context>, librealsense::platform::backend_device_group, bool) (ds5-factory.cpp:995)
==7823==    by 0x817338B: construct<librealsense::rs455_device, std::shared_ptr<librealsense::context>&, librealsense::platform::backend_device_group&, bool&> (new_allocator.h:146)
==7823==    by 0x817338B: construct<librealsense::rs455_device, std::shared_ptr<librealsense::context>&, librealsense::platform::backend_device_group&, bool&> (alloc_traits.h:483)
==7823==    by 0x817338B: _Sp_counted_ptr_inplace<std::shared_ptr<librealsense::context>&, librealsense::platform::backend_device_group&, bool&> (shared_ptr_base.h:548)
==7823==    by 0x817338B: __shared_count<librealsense::rs455_device, std::allocator<librealsense::rs455_device>, std::shared_ptr<librealsense::context>&, librealsense::platform::backend_device_group&, bool&> (shared_ptr_base.h:679)
==7823==    by 0x817338B: __shared_ptr<std::allocator<librealsense::rs455_device>, std::shared_ptr<librealsense::context>&, librealsense::platform::backend_device_group&, bool&> (shared_ptr_base.h:1344)
==7823==    by 0x817338B: shared_ptr<std::allocator<librealsense::rs455_device>, std::shared_ptr<librealsense::context>&, librealsense::platform::backend_device_group&, bool&> (shared_ptr.h:359)
==7823==    by 0x817338B: allocate_shared<librealsense::rs455_device, std::allocator<librealsense::rs455_device>, std::shared_ptr<librealsense::context>&, librealsense::platform::backend_device_group&, bool&> (shared_ptr.h:702)
==7823==    by 0x817338B: make_shared<librealsense::rs455_device, std::shared_ptr<librealsense::context>&, librealsense::platform::backend_device_group&, bool&> (shared_ptr.h:718)
==7823==    by 0x817338B: librealsense::ds5_info::create(std::shared_ptr<librealsense::context>, bool) const (ds5-factory.cpp:1075)
==7823==    by 0x8179277: librealsense::device_info::create_device() const (context.h:48)
==7823==    by 0x8395093: librealsense::pipeline::config::resolve(std::shared_ptr<librealsense::pipeline::pipeline>, std::chrono::duration<long, std::ratio<1l, 1000l> > const&) (config.cpp:202)
==7823==    by 0x83901E7: librealsense::pipeline::pipeline::unsafe_start(std::shared_ptr<librealsense::pipeline::config>) (pipeline.cpp:73)
==7823==    by 0x8390607: librealsense::pipeline::pipeline::start(std::shared_ptr<librealsense::pipeline::config>, std::shared_ptr<rs2_frame_callback>) (pipeline.cpp:39)

Moreover, I can run the app without the camera, using recorded data, while the whole processing chain is running. In that case the app does not crash. Not sure how to search for the error here.

MartyG-RealSense commented 1 year ago

Valgrind is not an ideal way to check for memory leaks in RealSense applications as it has been reported that it can provide false-positive results, as described at https://github.com/IntelRealSense/librealsense/issues/3433

MartyG-RealSense commented 1 year ago

Hi @maciejandrzejewski-digica Do you require further assistance with this case, please? Thanks!

maciejandrzejewski-digica commented 1 year ago

Yes, the problem with the USB error still persists (Issue 1). The app fails with the usb mutex assertion as stated in the issue description. I have no memory leaks except those which valgrind, a widely known tool, attributes to librealsense.

Issue number 2 has been resolved. It was a power supply problem, described under "My new ROCK 5B can not boot / stuck in infinite boot loop" here: https://wiki.radxa.com/Rock5/FAQs

You have stated that valgrind has a problem with librealsense. I have checked issue #3433 and it is irrelevant to my valgrind errors. The one from that issue is "Uninitialised value was created by a stack allocation", which is completely right because you were not initializing your buffer with known data. That is a librealsense problem, not a valgrind one.

MartyG-RealSense commented 1 year ago

As you are using a callback, did you place the word callback in the brackets of the pipeline start instruction to inform the program that it is a callback script? For example:

pipe.start(callback)

Or if cfg configuration instructions are used, put both cfg and callback in the brackets, separated by a comma, with cfg first:

pipe.start(cfg, callback)

It may also be worth trying to build librealsense in V4L2 backend mode instead of RSUSB backend mode to confirm whether or not libusb is causing the problem. This can be done by setting the flag -DFORCE_RSUSB_BACKEND=false instead of having it as true. https://github.com/IntelRealSense/librealsense/issues/10188#issuecomment-1025087787 is an example of a RealSense user who had greater stability with V4L2 backend than RSUSB.

maciejandrzejewski-digica commented 1 year ago

I'm using the callback and I have passed the callback function into the start function.

I can confirm that the same libusb error is present on both RPI4 and Rock5B. If I disable the IMU pipeline, the error does not appear.

Below is the code:

int RealSenseVideoSource::open(const char *path)
{
    int ret{};
    rs2::config cfg;
    cfg.enable_stream(RS2_STREAM_DEPTH);
    cfg.enable_stream(RS2_STREAM_COLOR);
    if(m_sensInertialEnabled)
    {
        rs2::config cfgInertial;
        cfgInertial.enable_stream(RS2_STREAM_ACCEL, RS2_FORMAT_MOTION_XYZ32F, cRsAcclSampleRate); // Do not modify the FPS value - it is fixed for proper calculations.
        cfgInertial.enable_stream(RS2_STREAM_GYRO, RS2_FORMAT_MOTION_XYZ32F, cRsGyroSampleRate);  // Do not modify the FPS value - it is fixed for proper calculations.

        auto callback = [&](const rs2::frame& frame)
        {
            inertialSensorPipelineCallback(frame, this);
        };

        rs2::pipeline_profile ppAccel;
        try
        {
            ppAccel = m_inertial.start(cfgInertial, callback);
        }
        catch(const rs2::error &e)
        {
            processError(e);
        }
        if(! ppAccel)
        {
            msgError() << "Cannot open inertial sensors pipeline" << std::endl;
            ret |= -2;
        }

        if(ret != 0)
        {
            m_isOpened = false;
            return ret;
        }
    }

    rs2::pipeline_profile pp;
    try
    {
        pp = m_cam.start(cfg);
    }
    catch(const rs2::error &e)
    {
        processError(e);
        ret = -1;
    }
    if(ret != 0 || !pp)
    {
        m_isOpened = false;
        msgError() << "Cannot open video capture" << std::endl;
        ret |= -1;
        return ret;
    }

    if(msgIsDebug())
        m_rprint = std::make_unique<rs2::rates_printer>();

    m_isOpened = true;

    return ret;
}

MartyG-RealSense commented 1 year ago

The RealSense SDK has an example C++ program for callback called rs-callback. How it works is that it enables depth and color with the camera's default profile, which is what your own script above also does as you do not provide custom resolution and FPS settings in your cfg instructions.

rs-callback also automatically enables the IMU streams if an IMU is present.

https://github.com/IntelRealSense/librealsense/blob/master/examples/callback/rs-callback.cpp#L42-L44

maciejandrzejewski-digica commented 1 year ago

So you suggest configuring only a single pipeline for both video and IMU?

MartyG-RealSense commented 1 year ago

When enabling depth, color and IMU simultaneously, using multiple pipelines is best suited to Python, whilst callbacks are recommended for C++ instead of multiple pipelines for depth-color-IMU streaming.

maciejandrzejewski-digica commented 1 year ago

Using a single pipeline for both video frames and IMU data, with a single callback, solved the problem.
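
A minimal sketch of what such a setup might look like is shown below. It is modelled on the SDK's rs-callback example and is not the actual code from my application. `StreamCounters` and `start_single_pipeline` are hypothetical names introduced for illustration, the streams use the camera's default profiles, and the SDK part is wrapped in a `__has_include` guard only so that the helper class also compiles on a host without librealsense installed.

```cpp
#include <cstddef>
#include <map>
#include <mutex>
#include <string>

// Thread-safe per-stream frame counter. The SDK invokes the callback from its
// own threads, so state shared with the rest of the app is guarded by a mutex.
class StreamCounters
{
public:
    void bump(const std::string& stream)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        ++m_counts[stream];
    }

    std::size_t get(const std::string& stream) const
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        auto it = m_counts.find(stream);
        return it == m_counts.end() ? 0 : it->second;
    }

private:
    mutable std::mutex m_mutex;
    std::map<std::string, std::size_t> m_counts;
};

#if defined(__has_include)
#if __has_include(<librealsense2/rs.hpp>)
#include <librealsense2/rs.hpp>

// Starts ONE pipeline carrying depth, color, accel and gyro, all delivered to
// ONE callback. The caller must keep `counters` alive until pipe.stop().
rs2::pipeline start_single_pipeline(StreamCounters& counters)
{
    rs2::config cfg;
    cfg.enable_stream(RS2_STREAM_DEPTH);
    cfg.enable_stream(RS2_STREAM_COLOR);
    cfg.enable_stream(RS2_STREAM_ACCEL, RS2_FORMAT_MOTION_XYZ32F);
    cfg.enable_stream(RS2_STREAM_GYRO, RS2_FORMAT_MOTION_XYZ32F);

    auto callback = [&counters](const rs2::frame& frame)
    {
        if (rs2::frameset fs = frame.as<rs2::frameset>())
        {
            // Synchronized video frames arrive bundled in a frameset.
            for (const rs2::frame& f : fs)
                counters.bump(f.get_profile().stream_name());
        }
        else
        {
            // IMU samples arrive as individual frames.
            counters.bump(frame.get_profile().stream_name());
        }
    };

    rs2::pipeline pipe;
    pipe.start(cfg, callback);
    return pipe;
}
#endif
#endif
```

In a real application the callback would hand the frames to processing queues instead of only counting them, but the structure stays the same: one `rs2::config`, one `rs2::pipeline`, one callback.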

I would consider this an SDK bug, as the SDK allows multiple pipelines on the same camera.

At first I had a single video pipeline using wait_for_frames(), as it was very handy in my implementation. Then I added the IMU on a second pipeline but faced some problems and created a separate thread on this forum. I then moved the IMU pipeline to a callback. Now it appears that all streams should go through a single callback...

MartyG-RealSense commented 1 year ago

You can create multiple pipelines for multiple cameras and place a camera on each, like with the RealSense SDK rs-multicam example program that automatically creates a separate pipeline for each attached camera.

https://github.com/IntelRealSense/librealsense/tree/master/examples/multicam

There are virtually no C++ script references for using depth + color + IMU on the same camera with multiple pipelines though. It is done with callback instead.

maciejandrzejewski-digica commented 1 year ago

Nowhere is it written that you cannot do it. If it is forbidden, then disallow it during the init phase. It took me so much time to debug this behavior...

MartyG-RealSense commented 1 year ago

It is not impossible as far as I know, and I have in the past seen one C++ script reference for using the IMU with multiple pipelines on the same camera. Because of the lack of scripting references, callback is the recommendation, as there are quotable script links for that method.

I recall from that one case that the RealSense user who created a C++ two-pipeline script for depth-color-IMU said that it ran poorly compared to two pipelines on Python.