Closed. toots closed this issue 4 years ago.
SIGINT is only handled by sample applications like srt-live-transmit. The library itself does not handle those signals.
@ethouris, your thoughts?
No, SRT doesn't operate with signals at all. There's no call to signal or sigaction anywhere.
There is an include of <csignal> in channel.cpp, but it's probably a mistake - I removed it as a test and the Linux build passes.
If anyone has caught SIGINT, it definitely wasn't the SRT library. The only places where we deal with signals at all are the applications.
Note also that when a signal is delivered to an application, it's undefined which thread will be interrupted to handle it. You can control this a bit by using pthread_sigmask (on Linux, at least; not sure how it works on Windows). Note as well that the default action of SIGINT is to terminate the application.
Hi again, and thanks for looking at it. I should indeed have looked at the code before filing this ticket. It looks like the library is not using signals.
However, I have double-checked the application, and it follows the usual POSIX pattern: block SIGINT/SIGTERM, then sigwait on them. But even with that, I still see logs such as:
^C2019/08/06 00:07:33 [input.srt_5171:3] Feeding failed: Error(Econnlost,Connection was broken.)
This happens systematically first, before any other thread in the application gets to do anything. I have also confirmed it by calling kill -KILL instead of issuing ctrl+c.
This still looks to me like the application receiving a CONNLOST error right after receiving SIGINT, which prompted my initial reaction.
Somewhere deep in my memory there is some reported problem with signals interfering with the internals of SRT, but I believe this was on Mac; we never saw it on Linux. There was an issue reported for that, and it was fixed.
What I know, and which may have something to do with this, is that a signal also interrupts a system call in progress, making the system function return -1 with errno == EINTR. SRT uses system calls to send and receive data over UDP sockets, but I believe these cases are handled.
If you can reproduce the problem, would you be able to set up logging and grab SRT logs so that we know what is happening inside? Unfortunately the only level that would tell us anything is DEBUG, so you would have to compile SRT with --enable-heavy-logging and set the debug log level with the srt_setloglevel function.
Your memory serves you right! I am indeed on OSX. I did as suggested and here's the log:
2019/08/06 18:50:02 [srt:5] 18:50:02.758311 D:SRT.c: %625214727:RECEIVED: seq=1310140120 offset=3 ACCEPTED (EXPECTED/ORIGINAL) FLAGS: PB_SOLO ORD_RELAXED EK_NOENC SN_ORIGINAL
2019/08/06 18:50:02 [srt:5] 18:50:02.758416 D:SRT.d: crypter: data not encrypted, returning as plain
2019/08/06 18:50:02 [srt:5] 18:50:02.758449 D:SRT.c: updateCC: EVENT:checktimer
2019/08/06 18:50:02 [srt:5] 18:50:02.758458 D:SRT.c: getRcvReadyMsg: found packet, but not ready to play (only in 34.157ms)
2019/08/06 18:50:02 [srt:5] 18:50:02.758503 D:SRT.c: udpateCC: emitting signal for EVENT:checktimer
2019/08/06 18:50:02 [srt:5] 18:50:02.758566 D:SRT.t: %625214727:tsbpd: FUTURE PACKET seq=1310140105 T=18:40:21.706670 - waiting 34.049ms
2019/08/06 18:50:02 [srt:5] 18:50:02.758650 D:SRT.c: updateCC: updated values from congctl: interval=10879tk (10.88us) cgwindow=8.19e+03
2019/08/06 18:50:02 [srt:5] 18:50:02.759177 D:SRT.c: udpateCC: finished handling for EVENT:checktimer
2019/08/06 18:50:02 [srt:5] 18:50:02.759213 D:SRT.c: worker: result for the unit: INDUCED/CONCLUDING
2019/08/06 18:50:02 [srt:5] 18:50:02.759267 D:SRT.c: worker: RECEIVED PACKET --> updateConnStatus. cst=INDUCED/CONCLUDING id=625214727 pkt-payload-size=1316
2019/08/06 18:50:02 [srt:5] 18:50:02.779222 D:SRT.c: INCOMING PACKET: BOUND=0.0.0.0:8000 TARGET=625214727 CONTROL: shutdown size=4
2019/08/06 18:50:02 [srt:5] 18:50:02.779336 D:SRT.c: %625214727:incoming UMSG:5 (shutdown) socket=%625214727
2019/08/06 18:50:02 [srt:5] 18:50:02.779460 D:SRT.t: recvmsg: DATA COND: KICKED.
2019/08/06 18:50:02 [srt:5] 18:50:02.779547 D:SRT.c: getRcvReadyMsg: found packet, but not ready to play (only in 13.069ms)
2019/08/06 18:50:02 [srt:5] 18:50:02.779690 D:SRT.t: %625214727:tsbpd: EXITING
2019/08/06 18:50:02 [input.srt_5171:3] Feeding failed: Error(Econnlost,Connection was broken.)
It looks like the polling worker receives a shutdown packet just after SIGINT is sent.
@ethouris, @toots. Is there any workaround for this?
From my point of view, this is a minor bug that I reported for completeness. I believe that @ethouris is right in that it only occurs on OSX, although I've never properly tried to reproduce it on Linux. For our target use, we run Linux background processes and do not expect to run into this scenario.
I still don't get what kind of bug is here. In the above log I can only see a reaction to a SHUTDOWN control packet sent from the peer. How exactly this SHUTDOWN from the peer relates to receiving the SIGINT signal on the agent is completely unclear. So either this is telepathy between the two parties, or the agent actually sends a SHUTDOWN to the peer first, in which case the SHUTDOWN received from the peer is only a "death cry". To confirm this I'd need a longer log (nothing of the sort is visible in this one), and possibly also a log from the peer, in order to establish what the peer's reason for sending the SHUTDOWN really was.
I agree that the cause of the issue is confusing. However, I wouldn't dismiss it too quickly. We just had the following report, which seems related and only started happening with the 1.4 release: https://github.com/savonet/liquidsoap/issues/965
Hi again! I can definitely confirm that https://github.com/savonet/liquidsoap/issues/965 relates to libsrt acting upon SIGINT concurrently with the application. The epoll loop returns an error after SIGINT is sent and before the application has been able to do anything about it. It would be nice to get to the bottom of this since, this time, it happened on Linux, and there isn't much the application can do except unconditionally ignore this error.
Here's the log, from the application. Happy to work on a minimal reproduction case if needed:
2019/09/22 20:53:47 [clock.wallclock_input.srt_6444:3] Streaming loop starts, synchronized with wallclock.
^C2019/09/22 20:53:48 [threads:2] Queue generic queue #1 crashed with exception Error(Einvparam,Operation not supported: Bad parameters)
Raised at file "src/srt.ml", line 73, characters 11-37
Called from file "src/srt.ml", line 283, characters 10-156
Hi @toots,
Yes, I think having a minimal reproduction case would be great. I just looked through SRT epoll API implementation and I don't see anything that would make it exit on SIGINT on any of the supported platforms. Either my eyes are deceiving me or the problem is elsewhere.
Well, I took a look at the implementation of srt_epoll_wait and the part where system functions are used. But again, no: the system functions used there report errors by returning -1 as the number of ready sockets, and all values of 0 or less are completely ignored. This function exits exclusively when some sockets are ready or the timeout passes.
Anyway, I'd need a log from both sides when this happens, covering at least the last portion of a successful transmission - that's the only way to know why the SIGINT signal caused the transmission to be interrupted.
Maybe catch (...) in a couple of places is to blame? For example in CUDT::close():
try
{
    for (set<int>::iterator i = m_sPollID.begin(); i != m_sPollID.end(); ++i)
        s_UDTUnited.m_EPoll.remove_usock(*i, m_SocketID);
}
catch (...)
{
    // any exception thrown while detaching from epoll is silently discarded
}
@toots An exception can be caught by a debugger before it is handled; this way you could see the place where it is thrown.
Hi guys,
Thanks for looking into this. I've been busy with other tasks lately, but I'm going to try to get back to this issue asap.
Ok, we still don't know what to do with it. @toots, if you are still concerned about this, please reopen the bug and provide the extra information as needed. For now it will be closed due to inactivity.
Thank you and thanks for the time spent looking at this issue. I'll let you know if I have more info/can reproduce. So far, our setup is stable and we're happy with it.
Hi,
While integrating the library with savonet/liquidsoap, I ran into issues where the library catches SIGINT and closes the opened sockets. This causes concurrency issues in the application. In short, the socket is closed by the library, the streaming thread enters a new listening state, gets a new client, and starts a blocking decoding loop before the main thread gets a chance to act upon the same signal and tell the streaming source to stop. I believe that a library should not catch and act upon such signals, but should instead delegate to the main program operating it the order of operations when such a signal is received.