EttusResearch / uhd

The USRP™ Hardware Driver Repository
http://uhd.ettus.com

Segfault when creating streamers with N310 #520

Open andrepuschmann opened 2 years ago

andrepuschmann commented 2 years ago

Issue Description

The issue only appears when using the N310 in a two-channel configuration. It happens only occasionally, but it is annoying nonetheless because it causes many tests to fail: the eNB/gNB doesn't start up in the first place.

We are using the N310 to test an NSA configuration that uses two channels at 15.35 Msps. I compiled UHD 4.1 in debug mode and got the following backtrace. Unfortunately, not all symbols are present and line numbers aren't shown.

(launched: 2021-11-03_15:12:21.510889)
t
---  Software Radio Systems LTE eNodeB  ---

Reading configuration file /osmo-gsm-tester-srsenb/srsenb_rfci-slave4-n310_10.12.1.214/srsenb.conf...

Built in Debug mode using commit bcb4b594c on branch disable_backward.

Opening 2 channels in RF device=uhd with args=type=n3xx,tx_subdev_spec=A:0 B:0,rx_subdev_spec=A:0 B:0,None
Available RF device list: UHD  zmq 
[INFO] [UHD] linux; GNU C++ version 9.3.0; Boost_107100; UHD_4.1.0.HEAD-0-g25d617ca
[INFO] [LOGGING] Fastpath logging disabled at runtime.
Opening USRP channels=2, args: type=n3xx,tx_subdev_spec=A:0 B:0,rx_subdev_spec=A:0 B:0,None=,master_clock_rate=122.88e6
[INFO] [UHD RF] RF UHD Generic instance constructed
[INFO] [MPMD] Initializing 1 device(s) in parallel with args: mgmt_addr=192.168.20.2,type=n3xx,product=n310,serial=317F537,fpga=HG,claimed=False,addr=192.168.20.2,None=,master_clock_rate=122.88e6
[WARNING] [MPM.RPCServer] A timeout event occured!
[INFO] [MPM.PeriphManager] init() called with device args `None=,fpga=HG,master_clock_rate=122.88e6,mgmt_addr=192.168.20.2,product=n310,clock_source=internal,time_source=internal'.
[WARNING] [RFNOC::GRAPH] One or more blocks timed out during flush!
[INFO] [UHD RF] Setting tx_subdev_spec to 'A:0 B:0'
[INFO] [UHD RF] Setting rx_subdev_spec to 'A:0 B:0'
[INFO] [MULTI_USRP]     1) catch time transition at pps edge
[INFO] [MULTI_USRP]     2) set times next pps (synchronously)
--- command='/osmo-gsm-tester-srsenb/srslte/bin/srsenb /osmo-gsm-tester-srsenb/srsenb_rfci-slave4-n310_10.12.1.214/srsenb.conf' version=21.10.0 signal=11 date='03/11/2021 14:12:31' ---
    /osmo-gsm-tester-srsenb/srslte/bin/srsenb(+0xd88926) [0x55e128d46926]
    /lib/x86_64-linux-gnu/libc.so.6(+0x46210) [0x7f1bcc148210]
    /opt/uhd-4.1/lib/libuhd.so.4.1.0(_ZN3uhd5rfnoc4chdr12mgmt_payload11deserializeEPKmmRKSt8functionIFmmEE+0x31f) [0x7f1bcb501497]
    /opt/uhd-4.1/lib/libuhd.so.4.1.0(+0x2d07aa) [0x7f1bcb56e7aa]
    /opt/uhd-4.1/lib/libuhd.so.4.1.0(+0x2d1497) [0x7f1bcb56f497]
    /opt/uhd-4.1/lib/libuhd.so.4.1.0(+0x2d8e6e) [0x7f1bcb576e6e]
    /opt/uhd-4.1/lib/libuhd.so.4.1.0(+0x26d090) [0x7f1bcb50b090]
    /opt/uhd-4.1/lib/libuhd.so.4.1.0(+0x7c9d4b) [0x7f1bcba67d4b]
    /opt/uhd-4.1/lib/libuhd.so.4.1.0(+0x28682d) [0x7f1bcb52482d]
    /opt/uhd-4.1/lib/libuhd.so.4.1.0(+0x28a12f) [0x7f1bcb52812f]
    /opt/uhd-4.1/lib/libuhd.so.4.1.0(+0x2c17a8) [0x7f1bcb55f7a8]
    /opt/uhd-4.1/lib/libuhd.so.4.1.0(+0x3ccaae) [0x7f1bcb66aaae]
    /osmo-gsm-tester-srsenb/srslte/lib/libsrsran_rf.so.0(_ZN14rf_uhd_generic13get_rx_streamERm+0x1a6) [0x7f1bcc8e11be]
    /osmo-gsm-tester-srsenb/srslte/lib/libsrsran_rf.so.0(+0x77073) [0x7f1bcc8c9073]
    /osmo-gsm-tester-srsenb/srslte/lib/libsrsran_rf.so.0(rf_uhd_open_multi+0x14c) [0x7f1bcc8c9d65]
    /osmo-gsm-tester-srsenb/srslte/lib/libsrsran_rf.so.0(srsran_rf_open_devname+0x141) [0x7f1bcc8c4f42]
    /osmo-gsm-tester-srsenb/srslte/bin/srsenb(_ZN6srsran5radio8open_devERKjRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESA_+0x109) [0x55e128f1c61f]
    /osmo-gsm-tester-srsenb/srslte/bin/srsenb(_ZN6srsran5radio4initERKNS_9rf_args_tEPNS_19phy_interface_radioE+0x4c8) [0x55e128f1a8aa]
    /osmo-gsm-tester-srsenb/srslte/bin/srsenb(_ZN6srsenb3enb4initERKNS_10all_args_tE+0x55f) [0x55e1289daf39]
    /osmo-gsm-tester-srsenb/srslte/bin/srsenb(main+0xb09) [0x55e1289b48e6]
    /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x7f1bcc1290b3]
    /osmo-gsm-tester-srsenb/srslte/bin/srsenb(_start+0x2e) [0x55e1289ac95e]
srsRAN crashed. Please send this backtrace to the developers ...
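
Several frames in the backtrace above carry mangled C++ names. As a minimal sketch (assuming a GCC or Clang toolchain that provides `<cxxabi.h>`; the helper name `demangle` is made up for illustration), such a symbol can be turned back into a readable signature like this:

```cpp
#include <cxxabi.h> // abi::__cxa_demangle (GCC/Clang, Itanium C++ ABI)
#include <cstdlib>  // std::free
#include <memory>
#include <string>

// Hypothetical helper: returns the demangled form of `mangled`,
// or the input unchanged if it is not a valid mangled name.
std::string demangle(const std::string& mangled)
{
    int status = 0;
    // __cxa_demangle returns a malloc'ed buffer; release it with std::free.
    std::unique_ptr<char, void (*)(void*)> out(
        abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status),
        std::free);
    return (status == 0 && out) ? std::string(out.get()) : mangled;
}
```

Passing the symbol from the topmost libuhd frame, `_ZN3uhd5rfnoc4chdr12mgmt_payload11deserializeEPKmmRKSt8functionIFmmEE`, should give a signature in `uhd::rfnoc::chdr::mgmt_payload::deserialize`. The frames that show only an offset (for example `+0x2d07aa`) have no symbol name to demangle and would need debug info to resolve.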

Setup Details

Expected Behavior

No UHD crash when starting eNB.

Actual Behaviour

UHD segfaults occasionally.

Steps to reproduce the problem

I've not been able to reproduce the issue with the UHD examples, but the srsRAN appnote for running COTS UEs here contains all config steps. The UHD device args for the N310 are shown at the end of the document.

Note that you don't need a COTS UE or even a core network. Just starting the eNB with this config crashes UHD every so often.

Additional Information

Let me know if you need further details or want me to compile with different flags to maybe get more debug info.

andrepuschmann commented 2 years ago

Here is another segfault with a stack trace, which I believe is the same issue:

---  Software Radio Systems LTE eNodeB  ---

Reading configuration file enb.conf...

Built in Release mode using commit 0967cda04 on branch dev.

Opening 2 channels in RF device=uhd with args=type=n3xx,tx_subdev_spec=A:0 B:0,rx_subdev_spec=A:0 B:0
Available RF device list: UHD  soapy  zmq
[INFO] [UHD] linux; GNU C++ version 9.3.0; Boost_107100; UHD_4.1.0.2-1-gceac1bdd
[INFO] [LOGGING] Fastpath logging disabled at runtime.
Opening USRP channels=2, args: type=n3xx,tx_subdev_spec=A:0 B:0,rx_subdev_spec=A:0 B:0,master_clock_rate=122.88e6
[INFO] [UHD RF] RF UHD Generic instance constructed
[INFO] [MPMD] Initializing 1 device(s) in parallel with args: mgmt_addr=192.168.20.2,type=n3xx,product=n310,serial=317F537,fpga=HG,claimed=False,addr=192.168.20.2,master_clock_rate=122.88e6
[WARNING] [MPM.RPCServer] A timeout event occured!
[INFO] [MPM.PeriphManager] init() called with device args `fpga=HG,master_clock_rate=122.88e6,mgmt_addr=192.168.20.2,product=n310,clock_source=internal,time_source=internal'.
[WARNING] [RFNOC::GRAPH] One or more blocks timed out during flush!
[INFO] [UHD RF] Setting tx_subdev_spec to 'A:0 B:0'
[INFO] [UHD RF] Setting rx_subdev_spec to 'A:0 B:0'
[INFO] [MULTI_USRP]     1) catch time transition at pps edge
[INFO] [MULTI_USRP]     2) set times next pps (synchronously)
Stack trace (most recent call last):
#19   Object "", at 0xffffffffffffffff, in
#18   Object "/home/anpu/src/srsLTE/build_release/srsenb/src/srsenb", at 0x56079edc00bd, in _start
#17   Source "../csu/libc-start.c", line 308, in __libc_start_main [0x7f9c743730b2]
#16   Object "/home/anpu/src/srsLTE/build_release/srsenb/src/srsenb", at 0x56079edbd888, in main
#15   Object "/home/anpu/src/srsLTE/build_release/srsenb/src/srsenb", at 0x56079eddc917, in srsenb::enb::init(srsenb::all_args_t const&)
#14   Object "/home/anpu/src/srsLTE/build_release/srsenb/src/srsenb", at 0x56079f208fd7, in srsran::radio::init(srsran::rf_args_t const&, srsran::phy_interface_radio*)
#13   Object "/home/anpu/src/srsLTE/build_release/srsenb/src/srsenb", at 0x56079f200391, in srsran::radio::open_dev(unsigned int const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
#12   Object "/home/anpu/src/srsLTE/build_release/lib/src/phy/rf/libsrsran_rf.so.21.10.0", at 0x7f9c74b1b1fb, in rf_uhd_open_multi
#11   Object "/home/anpu/src/srsLTE/build_release/lib/src/phy/rf/libsrsran_rf.so.21.10.0", at 0x7f9c74b19cc9, in uhd_init(rf_uhd_handler_t*, char*, unsigned int)
#10   Object "/home/anpu/src/srsLTE/build_release/lib/src/phy/rf/libsrsran_rf.so.21.10.0", at 0x7f9c74b27952, in rf_uhd_generic::get_rx_stream(unsigned long&)
#9    Object "/opt/uhd-4.1-release/lib/libuhd.so.4.1.0", at 0x7f9c739fdc57, in multi_usrp_rfnoc::get_rx_stream(uhd::stream_args_t const&)
#8    Object "/opt/uhd-4.1-release/lib/libuhd.so.4.1.0", at 0x7f9c73907475, in rfnoc_graph_impl::connect(uhd::rfnoc::block_id_t const&, unsigned long, std::shared_ptr<uhd::rx_streamer>, unsigned long, unsigned long)
#7    Object "/opt/uhd-4.1-release/lib/libuhd.so.4.1.0", at 0x7f9c738d0234, in graph_stream_manager_impl::create_device_to_host_data_stream(std::pair<unsigned short, unsigned short>, uhd::rfnoc::sw_buff_t, uhd::rfnoc::sw_buff_t, unsigned long, uhd::device_addr_t const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
#6    Object "/opt/uhd-4.1-release/lib/libuhd.so.4.1.0", at 0x7f9c738ce2f5, in link_stream_manager_impl::create_device_to_host_data_stream(std::pair<unsigned short, unsigned short>, uhd::rfnoc::sw_buff_t, uhd::rfnoc::sw_buff_t, uhd::device_addr_t const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
#5    Object "/opt/uhd-4.1-release/lib/libuhd.so.4.1.0", at 0x7f9c73d33b3e, in uhd::mpmd::mpmd_mboard_impl::mpmd_mb_iface::make_rx_data_transport(uhd::rfnoc::mgmt::mgmt_portal&, std::pair<std::pair<unsigned short, unsigned short>, std::pair<unsigned short, unsigned short> > const&, std::pair<unsigned short, unsigned short> const&, uhd::rfnoc::sw_buff_t, uhd::rfnoc::sw_buff_t, uhd::device_addr_t const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
#4    Object "/opt/uhd-4.1-release/lib/libuhd.so.4.1.0", at 0x7f9c738b8019, in uhd::rfnoc::chdr_rx_data_xport::configure_sep(std::shared_ptr<uhd::transport::io_service>, std::shared_ptr<uhd::transport::recv_link_if>, std::shared_ptr<uhd::transport::send_link_if>, uhd::rfnoc::chdr::chdr_packet_factory const&, uhd::rfnoc::mgmt::mgmt_portal&, std::pair<unsigned short, unsigned short> const&, uhd::rfnoc::sw_buff_t, uhd::rfnoc::sw_buff_t, uhd::rfnoc::stream_buff_params_t const&, uhd::rfnoc::stream_buff_params_t const&, uhd::rfnoc::stream_buff_params_t const&, bool, std::function<void ()>)
#3    Object "/opt/uhd-4.1-release/lib/libuhd.so.4.1.0", at 0x7f9c7391c0a4, in uhd::rfnoc::mgmt::mgmt_portal_impl::config_local_rx_stream_commit(uhd::rfnoc::chdr_ctrl_xport&, unsigned short const&, double, bool)
#2    Object "/opt/uhd-4.1-release/lib/libuhd.so.4.1.0", at 0x7f9c739164c6, in uhd::rfnoc::mgmt::mgmt_portal_impl::_get_ostrm_status(uhd::rfnoc::chdr_ctrl_xport&, std::vector<std::pair<uhd::rfnoc::mgmt::node_id_t, int>, std::allocator<std::pair<uhd::rfnoc::mgmt::node_id_t, int> > > const&)
#1    Object "/opt/uhd-4.1-release/lib/libuhd.so.4.1.0", at 0x7f9c73910f54, in uhd::rfnoc::mgmt::mgmt_portal_impl::_send_recv_mgmt_transaction(uhd::rfnoc::chdr_ctrl_xport&, uhd::rfnoc::chdr::mgmt_payload const&, double) [clone .constprop.0]
#0    Object "/opt/uhd-4.1-release/lib/libuhd.so.4.1.0", at 0x7f9c738ae695, in uhd::rfnoc::chdr::mgmt_payload::deserialize(unsigned long const*, unsigned long, std::function<unsigned long (unsigned long)> const&)
Segmentation fault (Address not mapped to object [0x5607ba1e9000])
Segmentation fault

wkunice commented 1 year ago

I am seeing a similar crash while testing 4.2. It is in deserialize, and it looks like the message length is very large.
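
If an oversized length field is indeed what triggers the fault, the general defence is to check the claimed length against the number of words actually received before reading past the header. The following is only an illustrative sketch, not UHD's code: the function `parse_payload` and the packet layout (low 16 bits of the first word hold the payload word count) are assumptions made up for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical layout: the low 16 bits of word 0 give the number of
// 64-bit payload words that follow the header word.
// Returns false (leaving `out` untouched) if the claimed length does not
// fit in the buffer that was actually received.
bool parse_payload(const uint64_t* buf, size_t buf_words, std::vector<uint64_t>& out)
{
    if (buf == nullptr || buf_words == 0) {
        return false; // nothing to parse, not even a header word
    }
    const size_t claimed = static_cast<size_t>(buf[0] & 0xFFFF);
    if (claimed > buf_words - 1) {
        return false; // length field exceeds the received data: reject
    }
    out.assign(buf + 1, buf + 1 + claimed);
    return true;
}
```

With such a check, a corrupted or stale packet would be rejected with an error rather than causing a read beyond the mapped buffer.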