Haivision / srt

Secure, Reliable, Transport
https://www.srtalliance.org
Mozilla Public License 2.0
3.08k stars 840 forks source link

[BUG] Lock order violation: m_ConnectionLock and m_LSLock. #2970

Closed maxsharabayko closed 1 month ago

maxsharabayko commented 3 months ago

This issue was initially listed in this comment. Creating a separate ticket on it to track progress and investigation results.

Lock order "m_ConnectionLock before m_LSLock" is violated.

Case 1. Retrieve Socket Options from a Listen Callback

CRcvQueue::worker_ProcessConnectionRequest(..) locks m_LSLock to protect CRcvQueue::m_pListener from modifications while processing a connection request.

Within this function a listen callback may be called where an application can retrieve socket options srt_getsockopt for the newly created socket (being considered for accepting). The srt_getsockopt locks m_ConnectionLock while a lock on m_LSLock is still being held.

Thread checker report (click to expand) ```shell 0x4FCEB70 - m_ConnectionLock 0x4FD0A48 - m_LSLock Thread #9: lock order "0x4FCEB70 before 0x4FD0A48" violated Observed (incorrect) order is: acquisition of lock at 0x4FD0A48 at 0x48510B8: mutex_lock_WRK (hg_intercepts.c:944) by 0x4E7487: srt::sync::Mutex::lock() (sync_posix.cpp:220) by 0x4E74FB: srt::sync::ScopedLock::ScopedLock(srt::sync::Mutex&) (sync_posix.cpp:236) by 0x56C6CF: srt::CRcvQueue::worker_ProcessConnectionRequest(srt::CUnit*, srt::sockaddr_any const&) (queue.cpp:1407) by 0x56BFA7: srt::CRcvQueue::worker(void*) (queue.cpp:1247) by 0x4D11FD7: ??? (in /lib/libpthread-2.23.so) followed by a later acquisition of lock at 0x4FCEB70 at 0x48510B8: mutex_lock_WRK (hg_intercepts.c:944) by 0x4E7487: srt::sync::Mutex::lock() (sync_posix.cpp:220) by 0x4E74FB: srt::sync::ScopedLock::ScopedLock(srt::sync::Mutex&) (sync_posix.cpp:236) by 0x51A513: srt::CUDT::getOpt(SRT_SOCKOPT, void*, int&) (core.cpp:459) ---> m_ConnectionLock by 0x4F59EF: srt::CUDT::getsockopt(int, int, SRT_SOCKOPT, void*, int*) (api.cpp:3783) by 0x4E5BEF: srt_getsockopt (srt_c_api.cpp:163) by 0x4BBE87: SrtConn_ListenCB (srtc_mod.c:9753) by 0x53C993: srt::CUDT::runAcceptHook(srt::CUDT*, sockaddr const*, srt::CHandShake const&, srt::CPacket const&) (core.cpp:11814) by 0x4E9E0B: srt::CUDTUnited::newConnection(int, srt::sockaddr_any const&, srt::CPacket const&, srt::CHandShake&, int&, srt::CUDT*&) (api.cpp:611) by 0x53AA93: srt::CUDT::processConnectRequest(srt::sockaddr_any const&, srt::CPacket&) (core.cpp:11101) by 0x56C7F3: srt::CRcvQueue::worker_ProcessConnectionRequest(srt::CUnit*, srt::sockaddr_any const&) (queue.cpp:1411) by 0x56BFA7: srt::CRcvQueue::worker(void*) (queue.cpp:1247) Required order was established by acquisition of lock at 0x4FCEB70 at 0x48510B8: mutex_lock_WRK (hg_intercepts.c:944) by 0x4E7487: srt::sync::Mutex::lock() (sync_posix.cpp:220) by 0x4E74FB: srt::sync::ScopedLock::ScopedLock(srt::sync::Mutex&) (sync_posix.cpp:236) by 0x51BDBF: srt::CUDT::setListenState() (core.cpp:1005) by 0x4EB46B: srt::CUDTUnited::listen(int, int) (api.cpp:1012) by 0x4F433F: srt::CUDT::listen(int, int) (api.cpp:3571) by 0x4E593F: srt_listen (srt_c_api.cpp:114) by 0x47B2A7: srtconn_SetupListener (srtc_mod.c:4291) by 0x48155F: srtgroup_Listen (srtc_mod.c:4803) by 0x48324B: srtgroup_Open (srtc_mod.c:4910) by 0x4848A3: srtconn_Open (srtc_mod.c:5044) by 0x489EA3: SrtConn_Open (srtc_mod.c:5308) followed by a later acquisition of lock at 0x4FD0A48 at 0x48510B8: mutex_lock_WRK (hg_intercepts.c:944) by 0x4E7487: srt::sync::Mutex::lock() (sync_posix.cpp:220) by 0x4E74FB: srt::sync::ScopedLock::ScopedLock(srt::sync::Mutex&) (sync_posix.cpp:236) by 0x56D31F: srt::CRcvQueue::setListener(srt::CUDT*) (queue.cpp:1693) by 0x51BECB: srt::CUDT::setListenState() (core.cpp:1018) by 0x4EB46B: srt::CUDTUnited::listen(int, int) (api.cpp:1012) by 0x4F433F: srt::CUDT::listen(int, int) (api.cpp:3571) by 0x4E593F: srt_listen (srt_c_api.cpp:114) by 0x47B2A7: srtconn_SetupListener (srtc_mod.c:4291) by 0x48155F: srtgroup_Listen (srtc_mod.c:4803) by 0x48324B: srtgroup_Open (srtc_mod.c:4910) by 0x4848A3: srtconn_Open (srtc_mod.c:5044) ```

Case 2. Data Race if App Sets POST Socket Options Right After a Call to srt_listen()

Simultaneous access to listener's config (CUDT::m_config) from SRT internal newConnection(..) and the app thread called for srt_listen(). Although it does not seem the right thing to do, still the data race has to be avoided. Described in this comment. This way srt-xtransmit sets POST socket options.

Thread checker report (click to expand) ```shell WARNING: ThreadSanitizer: data race (pid=2590) Read of size 8 at 0x7ba8000000a8 by thread T3 (mutexes: write M227): #0 memcpy ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:827 (libtsan.so.0+0x6243e) #1 memcpy ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:819 (libtsan.so.0+0x6243e) #2 srt::CSrtConfig::operator=(srt::CSrtConfig const&) /mnt/d/Projects/srt/srt-xtransmit-multi/submodule/srt/srtcore/socketconfig.h:167 (srt-xtransmit+0x2b6d98) #3 srt::CUDT::CUDT(srt::CUDTSocket*, srt::CUDT const&) /mnt/d/Projects/srt/srt-xtransmit-multi/submodule/srt/srtcore/core.cpp:335 (srt-xtransmit+0x2796e6) #4 srt::CUDTSocket::CUDTSocket(srt::CUDTSocket const&) /mnt/d/Projects/srt/srt-xtransmit-multi/submodule/srt/srtcore/api.h:113 (srt-xtransmit+0x24de9e) #5 srt::CUDTUnited::newConnection(int, srt::sockaddr_any const&, srt::CPacket const&, srt::CHandShake&, int&, srt::CUDT*&) /mnt/d/Projects/srt/srt-xtransmit-multi/submodule/srt/srtcore/api.cpp:544 (srt-xtransmit+0x233312) #6 srt::CUDT::processConnectRequest(srt::sockaddr_any const&, srt::CPacket&) /mnt/d/Projects/srt/srt-xtransmit-multi/submodule/srt/srtcore/core.cpp:11024 (srt-xtransmit+0x2b2175) #7 srt::CRcvQueue::worker_ProcessConnectionRequest(srt::CUnit*, srt::sockaddr_any const&) /mnt/d/Projects/srt/srt-xtransmit-multi/submodule/srt/srtcore/queue.cpp:1406 (srt-xtransmit+0x309743) #8 srt::CRcvQueue::worker(void*) /mnt/d/Projects/srt/srt-xtransmit-multi/submodule/srt/srtcore/queue.cpp:1238 (srt-xtransmit+0x30828c) Previous write of size 1 at 0x7ba8000000a9 by main thread (mutexes: write M188, write M193, write M192): #0 (anonymous namespace)::CSrtConfigSetter<(SRT_SOCKOPT)2>::set(srt::CSrtConfig&, void const*, int) /mnt/d/Projects/srt/srt-xtransmit-multi/submodule/srt/srtcore/socketconfig.cpp:213 (srt-xtransmit+0x31c8c3) #1 (anonymous namespace)::dispatchSet(SRT_SOCKOPT, srt::CSrtConfig&, void const*, int) /mnt/d/Projects/srt/srt-xtransmit-multi/submodule/srt/srtcore/socketconfig.cpp:903 (srt-xtransmit+0x31acf0) #2 srt::CSrtConfig::set(SRT_SOCKOPT, void const*, int) /mnt/d/Projects/srt/srt-xtransmit-multi/submodule/srt/srtcore/socketconfig.cpp:954 (srt-xtransmit+0x31a7fd) #3 srt::CUDT::setOpt(SRT_SOCKOPT, void const*, int) /mnt/d/Projects/srt/srt-xtransmit-multi/submodule/srt/srtcore/core.cpp:404 (srt-xtransmit+0x27bbe4) #4 srt::CUDT::setsockopt(int, int, SRT_SOCKOPT, void const*, int) /mnt/d/Projects/srt/srt-xtransmit-multi/submodule/srt/srtcore/api.cpp:3641 (srt-xtransmit+0x243f71) #5 srt_setsockopt /mnt/d/Projects/srt/srt-xtransmit-multi/submodule/srt/srtcore/srt_c_api.cpp:165 (srt-xtransmit+0x3227c9) #6 xtransmit::socket::srt::configure_post(int) /mnt/d/Projects/srt/srt-xtransmit-multi/xtransmit/srt_socket.cpp:324 (srt-xtransmit+0x10b67a) #7 xtransmit::socket::srt::listen() /mnt/d/Projects/srt/srt-xtransmit-multi/xtransmit/srt_socket.cpp:104 (srt-xtransmit+0x10968a) #8 xtransmit::create_connection(std::vector > const&, std::shared_ptr&) /mnt/d/Projects/srt/srt-xtransmit-multi/xtransmit/misc.cpp:70 (srt-xtransmit+0xd330f) #9 xtransmit::common_run(std::vector, std::allocator >, std::allocator, std::allocator > > > const&, xtransmit::stats_config const&, xtransmit::conn_config const&, std::atomic const&, std::function, std::atomic const&)>&) /mnt/d/Projects/srt/srt-xtransmit-multi/xtransmit/misc.cpp:145 (srt-xtransmit+0xd37f6) #10 xtransmit::receive::run(std::vector, std::allocator >, std::allocator, std::allocator > > > const&, xtransmit::receive::config const&, std::atomic const&) /mnt/d/Projects/srt/srt-xtransmit-multi/xtransmit/receive.cpp:141 (srt-xtransmit+0xe190f) #11 main /mnt/d/Projects/srt/srt-xtransmit-multi/xtransmit/xtransmit-app.cpp:231 (srt-xtransmit+0x127092) ```

Status

Some ongoing work is in PR #2961.

ethouris commented 3 months ago

This is a known problem, but not much of a solution has been proposed so far. Fortunately the situation executed with different lock ordering can't happen simultaneously, except for a situation when trying to close the socket in the listener callback handler.