IntelRealSense / librealsense

Intel® RealSense™ SDK
https://www.intelrealsense.com/
Apache License 2.0

rs2::pipeline::stop() takes several seconds longer if more than 5 frames have been polled from the camera #13006

Open mattiasbax opened 3 weeks ago

mattiasbax commented 3 weeks ago

Required Info
Camera Model D415
Firmware Version 5.16.0.1
Operating System & Version Linux Debian 12
Kernel Version (Linux Only) 6.6.12
Platform PC/Raspberry Pi/ NVIDIA Jetson / etc..
SDK Version 2.54.2
Language C++
Segment Others

Issue Description

Hello,

I have a setup with 4x D415 cameras. I've noticed significant differences in the execution time of rs2::pipeline::stop() and did some debugging to find out what causes this. I have narrowed it down to the number of times I call rs2::pipeline::poll_for_frames() after starting the pipeline via rs2::pipeline::start(). For example, polling 5 or fewer frames lets all 4 cameras stop in under 275 ms, while polling more than 5 frames makes stopping them take up to several seconds.

I created a program that reproduces this behavior consistently (see below). The image below shows the different timing results of 10 pipeline stops when polling 5 and 6 frames, respectively.

[image: timing results for 10 pipeline stops, polling 5 vs. 6 frames]

Any idea what might be causing this behavior and how I could mitigate it?

Thanks! / M

#include <chrono>
#include <future>
#include <iostream>
#include <set>
#include <string>
#include <thread>
#include <tuple>
#include <vector>

#include <librealsense2/rs.hpp>
#include <librealsense2/rs_advanced_mode.hpp>

struct camera_struct {
    std::string serial;
    rs2::config config;
    rs2::device device;
    rs2::pipeline pipeline;
    rs2::pipeline_profile profile;
};
using Cameras = std::vector<camera_struct>;

int main(int argc, char** argv)
{
    if (argc < 2) {
        std::cerr << "Usage: " << argv[0] << " <num_frames>" << std::endl;
        return 1;
    }

    rs2::context realsense_context = rs2::context();
    std::cout << "Realsense version: " << RS2_API_VERSION_STR << std::endl;

    auto realsense_devices = realsense_context.query_devices();
    auto cameras = std::vector<camera_struct>(realsense_devices.size());
    std::vector<std::thread> threads;

    auto setupCamera = [&](size_t idx) {
        auto& camera = cameras[idx];
        camera.device = realsense_devices[idx];
        camera.serial = camera.device.get_info(RS2_CAMERA_INFO_SERIAL_NUMBER);
        std::cout << "Setting up camera " << camera.serial << std::endl;

        rs400::advanced_mode advanced_mode{camera.device};
        advanced_mode.load_json(
            "{\"param-disparityshift\":\"100\",\"param-depthunits\":\"100\"}");

        camera.config.enable_device(camera.serial);
        camera.config.enable_stream(
            RS2_STREAM_DEPTH, 1280, 720, RS2_FORMAT_Z16, 15);
        camera.config.enable_stream(
            RS2_STREAM_INFRARED, 1, 1280, 720, RS2_FORMAT_Y8, 15);
        camera.config.enable_stream(
            RS2_STREAM_INFRARED, 2, 1280, 720, RS2_FORMAT_Y8, 15);

        auto depth_sensor = camera.device.first<rs2::depth_sensor>();
        depth_sensor.set_option(RS2_OPTION_ENABLE_AUTO_EXPOSURE, 1);
        depth_sensor.set_option(RS2_OPTION_LASER_POWER, 150);
        depth_sensor.set_option(RS2_OPTION_EMITTER_ENABLED, 1);
    };

    for (size_t i = 0; i < realsense_devices.size(); ++i) {
        threads.push_back(std::thread(setupCamera, i));
    }

    for (auto& thread : threads) {
        thread.join();
    }

    const size_t num_frames = std::stoi(std::string(argv[1]));
    std::cout << "Polling " << num_frames << " frames..." << std::endl;
    constexpr int num_iterations = 10;
    for (int i = 0; i < num_iterations; ++i) {
        for (auto& camera : cameras) {
            camera.profile = camera.pipeline.start(camera.config);
        }

        rs2::frameset frames;
        for (auto& camera : cameras) {
            size_t number_of_captured_frames = 0;
            while (number_of_captured_frames < num_frames) {
                if (camera.pipeline.poll_for_frames(&frames)) {
                    ++number_of_captured_frames;
                }
            }
        }

        auto start = std::chrono::high_resolution_clock::now();
        for (auto& camera : cameras) {
            camera.pipeline.stop();
        }
        auto end = std::chrono::high_resolution_clock::now();

        auto time_elapsed =
            std::chrono::duration_cast<std::chrono::milliseconds>(end - start)
                .count();

        std::cout << "Stopping cameras took " << time_elapsed << "ms."
                  << std::endl;
    }

    return 0;
}
MartyG-RealSense commented 3 weeks ago

Hi @mattiasbax May I first ask about the specification of the computer / computing device that you are using, please? As the number of simultaneously active RealSense cameras attached to a computer increases, so does the amount of the computer's resources consumed. It is therefore recommended that a computer with 4 simultaneously active cameras attached have an Intel Core i7 CPU or equivalent.

May I also ask which method you used to install the librealsense SDK? If you built it from source code with the RSUSB = true method, note that RSUSB is best suited to single-camera applications, whilst a kernel-patched build of the SDK works best for multiple-camera applications.

The recommended camera firmware to use with SDK 2.54.2 is 5.15.1.0, as the 5.16.0.1 firmware is designed for the current latest librealsense SDK version, 2.55.1.

Kernel 6.6 is not currently supported by the SDK. The most recent supported version at the time of writing is 6.5, and that support was introduced in SDK 2.55.1. If the SDK was installed with the RSUSB = true method, though, then the kernel version should not matter, as an RSUSB build of the SDK bypasses the kernel.

mattiasbax commented 2 weeks ago

Hi Marty!

Just to clarify: Everything works as intended in terms of running multiple cameras at desired framerates and polling frames for each camera respectively. The only thing behaving inconsistently is the time it takes to close the camera pipelines.

MartyG-RealSense commented 2 weeks ago

Are you automatically creating a separate pipeline for each attached camera like in the SDK's rs-multicam C++ multiple camera example program?

https://github.com/IntelRealSense/librealsense/blob/master/examples/multicam/rs-multicam.cpp#L26-L27

mattiasbax commented 2 weeks ago

Yes, that happens in the example above, where we have a vector of structs (each struct has its own rs2::pipeline object).

MartyG-RealSense commented 2 weeks ago

As poll_for_frames() is being used, there can be situations where you need to put the CPU to sleep for a specified time, otherwise CPU usage can max out at 100% of a single core.

https://github.com/IntelRealSense/librealsense/issues/2219#issuecomment-412350887 has a multicam C++ example of code for putting the CPU to sleep when using poll_for_frames().

this_thread::sleep_for(milliseconds(1)); // Otherwise we get 100% CPU

You could verify whether the CPU is maxing out at 100% using a Linux system monitoring tool such as htop.
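The throttling pattern above can be sketched as a generic loop. This is a minimal sketch, not librealsense code: `try_poll` is a hypothetical stand-in for rs2::pipeline::poll_for_frames(), returning true when a frame was available.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <thread>

// Generic throttled polling loop. Sleeping between unsuccessful polls
// keeps the busy-wait from pinning a CPU core at 100%.
std::size_t poll_frames_throttled(const std::function<bool()>& try_poll,
                                  std::size_t wanted_frames)
{
    std::size_t captured = 0;
    while (captured < wanted_frames) {
        if (try_poll()) {
            ++captured;
        } else {
            // Yield the core instead of spinning.
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
    }
    return captured;
}
```

In the reproducer, the body of the inner `while` loop would play the role of `try_poll`, with the 1 ms sleep added on the unsuccessful branch.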

mattiasbax commented 2 weeks ago

This has no effect, as the problem is not with poll_for_frames (or wait_for_frames, for that matter) but that the time it takes to stop the camera pipeline is highly inconsistent.

mattiasbax commented 2 weeks ago

What is even more interesting is that whether or not the frames are polled in threads affects whether rs2::pipeline::stop() shows the several-second inconsistency.

For example, if the above code is replaced with:

    constexpr int num_iterations = 10;

    const size_t num_frames = std::stoi(std::string(argv[1]));

    auto captureCamera = [&](size_t idx) {
        auto& camera = cameras[idx];
        camera.profile = camera.pipeline.start(camera.config);
        rs2::frameset frames;
        size_t number_of_captured_frames = 0;
        while (number_of_captured_frames < num_frames) {
            if (camera.pipeline.poll_for_frames(&frames)) {
                ++number_of_captured_frames;
            }
        }
    };

    for (int iteration = 0; iteration < num_iterations; ++iteration) {
        std::vector<std::thread> capture_threads;
        for (size_t i = 0; i < cameras.size(); ++i) {
            capture_threads.push_back(std::thread(captureCamera, i));
        }
        for (auto& capture_thread : capture_threads) {
            capture_thread.join();
        }

        auto start = std::chrono::high_resolution_clock::now();
        for (auto& camera : cameras) {
            camera.pipeline.stop();
        }
        auto end = std::chrono::high_resolution_clock::now();

        auto time_elapsed =
            std::chrono::duration_cast<std::chrono::milliseconds>(end - start)
                .count();

        std::cout << "Stopping cameras took " << time_elapsed << "ms."
                  << std::endl;
    }

I get the following (consistent) result when stopping the camera pipelines:

[image: consistent stop timings across 10 iterations]

Note that the part stopping the pipelines has not changed; only how the frames were polled.

MartyG-RealSense commented 2 weeks ago

I would usually not recommend using threads as they can make programs more complicated and increase the risk of instability compared to a non-threaded version of the code. However, it does appear to be an appropriate solution in your particular project.

mattiasbax commented 2 weeks ago

Using threads is not an option in this case, as that would mean all cameras would poll frames simultaneously, which is not what I am interested in for this particular project.

The threaded code example was to provide more debug data on the inconsistency happening within rs2::pipeline::stop(). My current guess is that there is some internal data race which causes releasing the resources during rs2::pipeline::stop() to behave inconsistently. I'd gladly be proven wrong, but from the looks of it this is an issue within the librealsense SDK.

Do you have the possibility to compile the above program and run it on your end to see if you can replicate the inconsistencies?

MartyG-RealSense commented 2 weeks ago

I am not able to compile and test code on my computer, unfortunately. I do apologize.

Does it make any difference if you place an rs2_release_frame() command to release the frames on the line immediately before the pipe stop line?

https://intelrealsense.github.io/librealsense/doxygen/rs__frame_8h.html#ab3126dc5f202aa932afba37158d73928

I believe that in your particular program the line would look like this:

rs2_release_frame(frames);


Another approach may be to perform a hardware_reset() on all attached cameras instead of just one. https://github.com/IntelRealSense/librealsense/issues/9287#issuecomment-867826974 has a multiple camera C++ reset script to iterate through all cameras.

A reset takes about 2 seconds to complete, which is at least shorter than several seconds.

mattiasbax commented 2 weeks ago

Correct me if I'm wrong, but since rs2::frameset inherits from rs2::frame, which wraps the C struct rs2_frame and calls rs2_release_frame(...) in its destructor, this is already happening implicitly in my example when the frameset goes out of scope? Grabbing the frame reference and releasing it manually had no effect on this issue.
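The release-on-destruction behaviour described above can be illustrated with a minimal mock. This sketch uses a hypothetical `frame_handle` type, not the real librealsense classes, assuming the same semantics: the wrapped resource is released exactly once, when the handle is destroyed or reassigned.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-in for rs2::frame: releases the underlying
// resource in its destructor, mirroring how rs2::frame calls
// rs2_release_frame() when it goes out of scope.
struct frame_handle {
    int* release_count = nullptr;  // counts releases, for demonstration

    frame_handle() = default;
    explicit frame_handle(int* counter) : release_count(counter) {}

    // Move-only, like a unique resource handle.
    frame_handle(frame_handle&& other) noexcept
        : release_count(std::exchange(other.release_count, nullptr)) {}
    frame_handle& operator=(frame_handle&& other) noexcept {
        release();
        release_count = std::exchange(other.release_count, nullptr);
        return *this;
    }
    frame_handle(const frame_handle&) = delete;
    frame_handle& operator=(const frame_handle&) = delete;

    ~frame_handle() { release(); }

    void release() {
        if (release_count) {
            ++*release_count;  // the "rs2_release_frame" moment
            release_count = nullptr;
        }
    }
};
```

Under these semantics, a manual release before the destructor runs is redundant, which matches the observation that releasing the frame explicitly made no difference.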

Adding a hardware reset to each camera had no effect on the longer pipeline stopping time.

MartyG-RealSense commented 2 weeks ago

I am not familiar enough with structs and destructors to comment on the mechanics of releasing frames. I do apologize.

Although using threads was a solution, you mentioned at https://github.com/IntelRealSense/librealsense/issues/13006#issuecomment-2162976158 that it would not be suitable for you because you do not wish to poll all cameras simultaneously.

Are you aiming to turn individual cameras on and off and only poll one camera at a time in the set of four D415s in order to reduce the burden of processing multiple cameras simultaneously, please? If you are, then if you manually define the serial number of each of the four cameras in the script then you could enable and disable specific cameras instead of stopping all cameras at the same time.

mattiasbax commented 1 week ago

The problem is not with stopping cameras between polling different cameras. When the polling is done and all the necessary frames have been acquired without any issues, I need to shut down the application that acquired them (let's call it application A). When closing the application, I need to stop the camera pipelines, either explicitly by stopping them as in the example above, or implicitly by letting them go out of scope so that stop() is called in the destructor. If another application is supposed to run in sequence after application A finishes, it is now delayed by several seconds due to the above-mentioned bug where stopping pipelines takes a long time in certain scenarios.
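For measuring shutdown cost like this, a scope-based timer keeps the clock code out of the shutdown path itself. This is a generic sketch, not librealsense-specific; the callback receives the elapsed milliseconds when the timer leaves scope.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Runs a callback with the elapsed milliseconds when destroyed.
// Useful for timing a shutdown path such as a loop of
// pipeline.stop() calls without sprinkling clock code around.
class scope_timer {
public:
    explicit scope_timer(std::function<void(long long)> on_done)
        : on_done_(std::move(on_done)),
          start_(std::chrono::steady_clock::now()) {}

    ~scope_timer() {
        auto elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(
            std::chrono::steady_clock::now() - start_).count();
        on_done_(elapsed);
    }

private:
    std::function<void(long long)> on_done_;
    std::chrono::steady_clock::time_point start_;
};
```

Usage would look like wrapping the stop loop in a block: `{ scope_timer t([](long long ms) { /* log ms */ }); for (auto& camera : cameras) camera.pipeline.stop(); }`. Note std::chrono::steady_clock is used rather than high_resolution_clock, since a monotonic clock is the safer choice for interval timing.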

MartyG-RealSense commented 1 week ago

The following suggestion is a little 'hacky' but you could test it to see whether it makes a difference.

Immediately after the pipeline stop line, set a new configuration that makes sure that all active streams are disabled. Then start the pipeline again to apply the new config and disable all currently active streams. Then finally close the pipeline again in the hope that shutting down all streams released any trapped resources.

for (auto& camera : cameras) {
    camera.pipeline.stop();
    camera.config.disable_all_streams();
}

for (auto& camera : cameras) {
    camera.profile = camera.pipeline.start(camera.config);
}

for (auto& camera : cameras) {
    camera.pipeline.stop();
}

https://github.com/IntelRealSense/librealsense/issues/12663 is an interesting discussion about 'hanging threads'.

mattiasbax commented 1 week ago

The first stop takes a long time and the second stop takes a short time, regardless of whether all the streams are disabled.

i.e.

        auto start = std::chrono::high_resolution_clock::now();
        for (auto& camera : cameras) {
            camera.pipeline.stop();
        }
        auto end = std::chrono::high_resolution_clock::now();

        auto time_elapsed =
            std::chrono::duration_cast<std::chrono::milliseconds>(end - start)
                .count();

        std::cout << "Stopping cameras took " << time_elapsed << "ms."
                  << std::endl;

        for (auto& camera : cameras) {
            camera.profile = camera.pipeline.start(camera.config);
        }
        start = std::chrono::high_resolution_clock::now();
        for (auto& camera : cameras) {
            camera.pipeline.stop();
        }
        end = std::chrono::high_resolution_clock::now();
        time_elapsed =
            std::chrono::duration_cast<std::chrono::milliseconds>(end - start)
                .count();

        std::cout << "Stopping cameras second time took " << time_elapsed
                  << "ms." << std::endl;

This will give the output:

Stopping cameras took 1945ms.
Stopping cameras second time took 124ms.

And that's regardless of whether disable_all_streams() is used or not. Since I need to stop the pipeline in order to disable all streams, and can't stop the pipeline before it is started, it's a bit of a catch-22.

MartyG-RealSense commented 1 week ago

https://github.com/IntelRealSense/librealsense/issues/13006#issuecomment-2160811377 earlier in this discussion established that using threads provides a solution for the issue, but you stated that "using threads is not an option in this case as that would mean that all cameras would poll frames simultaneously which is not what I am interested of in this particular project".

Could you provide further information please about why you do not want all cameras to poll frames simultaneously? Thanks!

mattiasbax commented 1 week ago

I have one camera in each corner of a square, pointing towards the center. This means that each camera has another camera on the opposite side looking directly towards it. I have noticed that the laser emitter of the opposing camera can cause the auto exposure to behave strangely, as it creates a very bright part of the image, which is why I capture frames from one camera at a time with the other IR emitters turned off. I have also noticed better depth performance for this particular setup when only one IR emitter is turned on, rather than having 4 emitters at a rather close distance, which might oversaturate some pixels.

MartyG-RealSense commented 6 days ago

Multiple RealSense 400 Series cameras do not interfere with each other, and projecting multiple IR dot patterns can actually benefit depth analysis of the scene: the denser the group of dots projected onto a surface by the overlapping patterns, the better the camera can analyze that area for depth information.

The arrangement of cameras around a squared checkerboard at close range is also used by the RealSense SDK's box_dimensioner_multicam Python box measuring example project. That project works best when the cameras are around 0.5 meters from the board, with negative results when the cameras are nearer to or further from the board.

[image: box_dimensioner_multicam camera arrangement around a checkerboard]

The IR light from the camera's emitter is most intense close to the lens, with its strength falling off as distance from the camera increases. So if cameras were positioned around a board at very close range then I can appreciate how that could create an intense pool of light on the board surface and the surrounding area.

jgrahn commented 5 days ago

Hi @MartyG-RealSense, colleague of @mattiasbax here.

I appreciate that you're looking for solutions and workarounds. For context, we've had a team of experienced developers working on this project for more than a year. The camera arrangement is subject to numerous design constraints and is not something we can change as in a lab setup. We are fully aware that on paper the cameras are not supposed to interfere with each other, but they do. That might be a different discussion thread to follow up on.

However, what @mattiasbax is reporting here is a likely bug in the SDK relating to rs2::pipeline::stop(). Are there any ways we can assist you in investigating that particular bug? He provided a condensed example program that reproduces the behaviour, which you mentioned you could not compile. Could you provide any more information on what is preventing that, in case we could modify it to help? Is there some other way we can support you in reproducing the issue?

MartyG-RealSense commented 5 days ago

Hi @jgrahn My computers are not set up with a development environment to compile and test programs as I am a Support Engineer.

If you are using the cameras in the same indoor location with consistent artificial lighting and the cameras are not being moved, then you could try disabling auto-exposure (if you have not attempted this already) so that the cameras use a fixed manual exposure value, in order to see whether this makes a difference to your multicam exposure problem.

If disabling auto-exposure and using manual exposure does not resolve your issue then I will be pleased to highlight this case to developer members of my Intel RealSense colleagues for advice.