Open jrudolph opened 8 years ago
I suspect that the problem is related to the one described here: https://sourceforge.net/p/udt/discussion/393036/thread/d95e119f/?limit=25#1c43
By performing the close() before deleting the queues, doesn't this allow for the possibility that between the close() and the queue deletion, a new socket using the old file descriptor could be created in another thread and one or both of the queues could improperly use that new file descriptor? I did not see any synchronization which would prevent this problem. Would moving the channel close() to happen after the queues have been deleted introduce other problems?
In my case, however, the file descriptor is not reused by UDT but by another part of the application, which opens a completely unrelated TCP socket that is assigned the same file descriptor number. This new socket is perfectly fine and will happily block in the `recvmsg` call, bringing UDT to a complete halt.
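To illustrate the descriptor reuse itself, here is a minimal sketch, independent of UDT. It assumes a POSIX system and a single-threaded process, where a newly opened socket gets the lowest free descriptor number. The function name `descriptorIsReused` is made up for this example.

```cpp
#include <cassert>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// Returns true if a TCP socket opened right after closing a UDP socket
// is assigned the same descriptor number (hypothetical helper).
bool descriptorIsReused() {
    int udpFd = socket(AF_INET, SOCK_DGRAM, 0);
    assert(udpFd >= 0);

    // This is the number a worker thread would still be holding.
    int staleCopy = udpFd;
    close(udpFd);

    // An unrelated part of the application opens a TCP socket.
    int tcpFd = socket(AF_INET, SOCK_STREAM, 0);
    assert(tcpFd >= 0);

    // Compare the new descriptor with the stale copy.
    bool reused = (tcpFd == staleCopy);
    close(tcpFd);
    return reused;
}
```

A thread that still calls `recvmsg` on `staleCopy` after this point would be reading from the unconnected TCP socket rather than failing with `EBADF`.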
We observe a situation where UDT completely hangs with many threads stuck waiting for the `m_ControlLock`.
At this point the lock is held by the garbage collection thread (in `checkBrokenSockets`), which is waiting for a rcv queue worker thread termination:
The worker thread seems to be stuck in `recvmsg`:
This doesn't seem to be a classical deadlock; maybe it's more a problem with the blocking `recvmsg` call.
Does anyone have an idea how this could happen?
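For discussion, here is a rough sketch of a teardown order that would not leave a window in which a worker holds a stale descriptor: signal the worker, wait for it to exit, and only then close the descriptor. This is not UDT's code. `Receiver`, `shutdownCompletes`, and the 100 ms poll interval are assumptions made for the example, and it relies on the worker using a bounded wait instead of a blocking `recvmsg` so that it can notice the stop flag.

```cpp
#include <atomic>
#include <netinet/in.h>
#include <poll.h>
#include <sys/socket.h>
#include <thread>
#include <unistd.h>

// Hypothetical stand-in for a receive queue; not UDT's actual class.
class Receiver {
public:
    explicit Receiver(int fd)
        : fd_(fd), closing_(false), worker_([this] { run(); }) {}

    ~Receiver() {
        closing_.store(true);  // 1. ask the worker to stop
        worker_.join();        // 2. wait until it no longer touches fd_
        close(fd_);            // 3. only now release the descriptor number
    }

private:
    void run() {
        char buf[1500];
        while (!closing_.load()) {
            pollfd p{fd_, POLLIN, 0};
            // Bounded wait so the closing flag is rechecked regularly.
            if (poll(&p, 1, 100) > 0) {
                // Non-blocking read; the data is discarded in this sketch.
                recv(fd_, buf, sizeof buf, MSG_DONTWAIT);
            }
        }
    }

    // Declaration order matters: worker_ must be initialized last.
    int fd_;
    std::atomic<bool> closing_;
    std::thread worker_;
};

// Returns true once a receiver has been created and torn down without hanging.
bool shutdownCompletes() {
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) return false;
    { Receiver r(fd); }  // the destructor runs the three steps above
    return true;
}
```

With a purely blocking receive call and no timeout, step 2 could wait forever, which looks similar to the hang described above.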