EttusResearch / uhd

The USRP™ Hardware Driver Repository
http://uhd.ettus.com

uhd_find_devices / libusb1_base seg faults #615

Open jpalladino opened 2 years ago

jpalladino commented 2 years ago

Issue Description

Using UHD 4.1.0.5 on either Ubuntu 18.04 or Ubuntu 20.04 machines, we occasionally see seg faults when running uhd_find_devices. We traced this to the global session management in 'host/lib/transport/libusb1_base.cpp', specifically 'libusb::session::sptr libusb::session::get_global_session(void)'. That function first checks whether the weak pointer global_session has expired and, if it has not, returns global_session.lock(). Occasionally the session seems to expire just after the check, so get_global_session returns an empty shared pointer. We have seen this on many different host machines.
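The suspected interleaving can be replayed deterministically in a single thread. This is a minimal sketch, assuming a hypothetical fake_session type in place of UHD's libusb_session_impl and a hypothetical replay_check_then_lock() helper; the reset() call stands in for another owner dropping the last reference between the expired() check and the lock() call.

```cpp
#include <memory>

// Hypothetical stand-in for UHD's libusb_session_impl (not the real class).
struct fake_session {
    int context = 1;
};

// Replays the suspected check-then-lock race step by step and returns
// whatever the cache lookup would hand back to the caller.
std::shared_ptr<fake_session> replay_check_then_lock()
{
    auto owner = std::make_shared<fake_session>();       // last strong owner
    std::weak_ptr<fake_session> global_session = owner;  // the cached weak pointer

    if (!global_session.expired()) {   // check: the session is alive here...
        owner.reset();                 // ...but the last owner releases it now
        return global_session.lock();  // use: lock() now yields an empty pointer
    }
    return std::make_shared<fake_session>();  // not reached in this replay
}
```

replay_check_then_lock() returns a null shared_ptr; calling get_context() through it is a null dereference, consistent with the this=0x0 frame in the gdb backtrace further down.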

Setup Details

UHD 4.1.0.5 on Ubuntu 18.04 or 20.04. Run uhd_find_devices.

Expected Behavior

No seg faults.

Actual Behaviour

Occasional seg faults.

Steps to reproduce the problem

To reproduce the issue, I run the following loop:

while true; do date; uhd_find_devices; sleep 6; done

Left running, it triggers the problem anywhere from once to roughly 100 times over 24 hours.

Additional Information

When the seg fault occurs, this would be displayed in the terminal:

Mon Jul 25 08:16:00 EDT 2022
[INFO] [UHD] linux; GNU C++ version 7.5.0; Boost_106501; UHD_4.1.0.HEAD-0-g6bd0be9c
[DEBUG] [MPMD] Discovering MPM devices on port 49600
[DEBUG] [MPMD] Discovering MPM devices on port 49600
[DEBUG] [MPMD] Discovering MPM devices on port 49600
Segmentation fault (core dumped)

Checking "dmesg -T" would result in:

[Mon Jul 25 08:14:56 2022] uhd_find_device[30881]: segfault at 0 ip 00007f208cc7efd5 sp 00007f2082ffc500 error 4 in libuhd.so.4.1.0[7f208c2f6000+cb8000]
[Mon Jul 25 08:14:56 2022] Code: 48 c7 47 18 00 00 00 00 48 89 07 48 8d 47 08 48 8d 7c 24 30 48 89 44 24 18 e8 67 be ff ff 48 8b 7c 24 30 48 8d 1d db ca ff ff <48> 8b 07 48 8b 40 10 48 39 d8 0f 85 cb 01 00 00 48 39 d8 0f 85 d9

We captured some core dumps. A backtrace in gdb showed:

#0  libusb_session_impl::get_context (this=0x0)
    at /opt/gnuradio/v3.8/src/uhd/host/lib/transport/libusb1_base.cpp:53
#1  uhd::transport::libusb::device_list::make ()
    at /opt/gnuradio/v3.8/src/uhd/host/lib/transport/libusb1_base.cpp:180
#2  0x00007f7ecf2b295a in uhd::transport::usb_device_handle::get_device_list (
    vid_pid_pair_list=std::vector of length 1, capacity 1 = {...})
    at /opt/gnuradio/v3.8/src/uhd/host/lib/transport/libusb1_base.cpp:475
#3  0x00007f7ecf2b30e9 in uhd::transport::usb_device_handle::get_device_list (
    vid=<optimized out>, pid=<optimized out>)
    at /opt/gnuradio/v3.8/src/uhd/host/lib/transport/libusb1_base.cpp:468
#4  0x00007f7ecf10bf59 in b100_find (hint=...)
    at /opt/gnuradio/v3.8/src/uhd/host/lib/usrp/b100/b100_impl.cpp:69
#5  0x00007f7ecf0213c0 in std::_Function_handler<std::vector<uhd::device_addr_t, std::allocator<uhd::device_addr_t> > (uhd::device_addr_t const&), std::vector<uhd::device_addr_t, std::allocator<uhd::device_addr_t> > (*)(uhd::device_addr_t const&)>::_M_invoke(std::_Any_data const&, uhd::device_

Adding a debug message like the following prints the session pointer. When a seg fault occurs, the pointer prints as "0x0"; normally, when there is no seg fault, it shows a valid-looking non-null address.

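class libusb_device_list_impl : public libusb::device_list
{
public:
    libusb_device_list_impl(void)
    {
        libusb::session::sptr sess = libusb::session::get_global_session();
        // Debug: log the session pointer (prints 0x0 when the crash occurs)
        UHD_LOGGER_DEBUG("JIMDEBUG") << "Global Session Pointer: " << sess;
        sess->get_context(); // null dereference if sess is empty
        // ... rest of the constructor ...
    }
    // ...
};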
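Frame #0 of the backtrace (get_context with this=0x0) is what calling a member function through an empty shared pointer looks like. The sketch below, using a hypothetical fake_session type and a hypothetical checked_get_context() helper (neither is part of UHD), shows a defensive check that turns the crash into a catchable error. It only masks the symptom; the empty pointer still originates in get_global_session.

```cpp
#include <memory>
#include <stdexcept>

// Hypothetical stand-in for the session type; not UHD's real class.
struct fake_session {
    int context = 1;
    int get_context() const { return context; }
};

// Hypothetical guard: throw a descriptive error instead of calling a
// member function through an empty shared_ptr (which crashes with this=0x0).
int checked_get_context(const std::shared_ptr<fake_session>& sess)
{
    if (!sess) {
        throw std::runtime_error("global libusb session is null");
    }
    return sess->get_context();
}
```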

To make the problem occur much more frequently, add anything that takes time after line 102 in libusb1_base.cpp, that is, between the expired() check and the lock() call. If I log a message there as follows, the seg fault occurs almost every time uhd_find_devices is run:

// not expired -> get existing session
if (not global_session.expired()){
    // Debug: logging here widens the window between the check and lock()
    UHD_LOGGER_DEBUG("JIMDEBUG")
            << "Using old GS pointer.";
    return global_session.lock();
}

Potential Fix

I changed lines 102 and 103 from:

if (not global_session.expired())
   return global_session.lock();

to

if (auto g_session_ptr = global_session.lock())
    return g_session_ptr;

After rebuilding with this change, we no longer see any seg faults, with multiple hosts running the uhd_find_devices loop for several days. As I understand it, this fix obtains a shared pointer in the same step that checks for expiration, so get_global_session holds ownership and the session cannot expire before the function returns (assuming it had not already expired before global_session.lock() was called). I don't know whether this is the most appropriate fix, as I'm far from an expert in this area.
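The difference can be sketched with a hypothetical stand-in (fake_session and replay_lock_then_check() are illustrative names, not UHD code). Because lock() both tests for expiration and creates an owning pointer, releasing the other owner right afterwards no longer destroys the session before it is returned.

```cpp
#include <memory>

// Hypothetical stand-in for UHD's libusb_session_impl (not the real class).
struct fake_session {
    int context = 1;
};

// Replays the interleaving where the other owner releases the session right
// after the check, but with the proposed pattern: lock() checks for
// expiration and takes ownership in a single step.
std::shared_ptr<fake_session> replay_lock_then_check()
{
    auto owner = std::make_shared<fake_session>();       // last strong owner
    std::weak_ptr<fake_session> global_session = owner;  // the cached weak pointer

    if (auto sess = global_session.lock()) {  // check and acquire together
        owner.reset();                        // the other owner goes away...
        return sess;                          // ...but sess keeps the session alive
    }
    return std::make_shared<fake_session>();  // not reached in this replay
}
```

The returned pointer is non-null and is the session's sole owner. This sketch covers only the check-and-acquire step of get_global_session.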

Thanks, Jim

jpalladino commented 1 year ago

In the original post I reported this issue on UHD 4.1.0.5. I can confirm that it is still present in UHD 4.3.0.0. However, the potential fix I posted above doesn't seem to help there; I still get occasional seg faults. With UHD 4.3.0.0 on Ubuntu 20.04, the output of uhd_find_devices when it seg faults looks like:

[INFO] [UHD] linux; GNU C++ version 9.4.0; Boost_107100; UHD_ libusb: debug [libusb_get_device_descriptor] Segmentation fault (core dumped)

Thanks, Jim