Closed Kami closed 6 years ago
From my experience, rtl_airband aborts are most often caused by USB connectivity problems (eg. due to RPi being underpowered). Once the dongle gets disconnected, there is no point in restarting rtl_airband before the dongle is reconnected manually. That's why I decided to disable automatic restarting. The supplied service file is just an example, there is no "one size fits all" configuration. You are free to modify it as you wish.
I have two dongles one for fr24feed and one for rtl_airband. Feeding pulse output to two servers with one off line I find that the rtl_airband service closes at random after maybe an hour or so. It's a Pi2 and does not seem heavily loaded. Load average in top is around 0.5. I have added the following in the systemd unit file [Service] section and it restarts the service, so far without a problem: Restart=always RestartSec=3
With the above the sequence of events when the service dies is:
rtl_airband.service: Main process exited, code=killed, status=6/ABRT
rtl_airband.service: Unit entered failed state.
rtl_airband.service: Failed with result 'signal'.
rtl_airband.service: Service hold-off time over, scheduling restart.
Stopped SDR AM/NFM demodulator.
Starting SDR AM/NFM demodulator...
RTLSDR-Airband version 2.4.0 starting
Started SDR AM/NFM demodulator.
pulse: 192.168.1.65: connecting...
pulse: 192.168.1.126: connecting...
Device #1: gain set to 25.40 dB
Device 1 started.
pulse: 192.168.1.65: connected
192.168.1.65: stream "Airband" connected
pulse: 192.168.1.126: connection failed: Connection refused
Googling the error I'm wondering if the service stopping is systemd related? Something along these lines: https://www.google.co.uk/url?sa=t&rct=j&q=&esrc=s&source=web&cd=7&cad=rja&uact=8&ved=0ahUKEwjJtKqAqP7YAhUHrRQKHQZZCE4QFghSMAY&url=https%3A%2F%2Fgithub.com%2Fsystemd%2Fsystemd%2Fissues%2F2942&usg=AOvVaw17sFYN2yTjZJeiAh0UVvRA
It might be a bug in rtl_airband or in Pulseaudio. Too little information here.
Are there any log entries before "Main process exited" line? Messages like "Assertion (...) failed" could shed some light on this.
Out_of_mem.txt I have attached a journal output and system state from top. After 8 hours continuous it was fine - an hour later the above happened which repeats for ever. Looks like the only way out is a re-boot, but memory state looks fine to me. How can we get more verbosity from rtl_airband? Would gdb or strace help or just be too much for the Pi? Certainly no Asserts anywhere.
If there was a SIGABRT (which is visible in the log), then it was almost certainly triggered by a failed assertion.
Could you run the program by hand, in the foreground (ie. with the -f option) and let it run until it breaks? You should see a failed assertion printed to the screen then.
It might also be valuable to enable coredumps beforehand (ie. run "ulimit -c unlimited" in the terminal just before starting rtl_airband from there).
Here you go:
15/ 10 121.850 13/ 10 121.850 13/ 10 118.575rtl_airband: asyncns.c:1500: asyncns_isdone: Assertion `q' failed. Aborted0 118.575 pi@raspberrypi:~ $
There is a core, which I'm guessing may be useful to get a backtrace if you can advise. I am accessing RPi over ssh only.
I am not familiar with debian package management or the debuginfo package naming they use, so without any extra debuginfo this is what I got: bt.txt Is it any use?
Thanks for this, assertion message is enough, I've located the place of the crash. It's Pulseaudio calling asyncns_isdone function (from libasyncdns) with a NULL pointer as an argument. Obviously this is incorrect, but before blaming Pulseaudio I'll double-check if it's not due to rtl_airband interacting with Pulseaudio in some wrong way.
Which Raspbian version are you running? (lsb_release -a
will tell you if you're not sure).
Raspbian GNU/Linux 9.1 (stretch)
I managed to get a pulse-related crash on my Pi, but with a completely different assertion failure.
Could you please run the program in foreground once or twice again and observe if it always fails with the same error message or with different ones? I suspect a race condition here, which may cause the program to crash randomly at various places in the code.
If you add some more pulse outputs to bogus servers in your config, you will hopefully make the crash more probable. It might save you some time.
Thanks in advance.
Only one so far same as before - that took 4 hours with 4 streams! 11/ 9 133.125 10/ 9 133.125rtl_airband: asyncns.c:1500: asyncns_isdone: Assertion `q' failed. Aborted9 133.125
It's quiet here at night so may get more tomorrow. I'm assuming it won't happen with nothing to stream.
Ah another different! 10/ 10 121.850 13/ 10 121.850 13/ 10 118.575Assertion '!m->rebuild_pollfds' failed at pulse/mainloop.c:841, function pa_mainloop_poll(). Aborting. Aborted0 121.850
And another, Assertion '!m->rebuild_pollfds' failed at pulse/mainloop.c:841, function pa_mainloop_poll(). Aborting.
The root cause of all these is probably the same. I can reproduce it now, so I can troubleshoot by myself. I'll get back to you once I have a fix.
In the meantime, just disable pulse outputs to non-working servers. It shall work reliably then.
Great news! Thanks. I notice running manually with rtl_airband -f that there is always a 'PLL not locked!' warning, yet all works fine. Maybe this is being checked too soon before the PLL has chance to lock?
'PLL not locked' is harmless.
I think I fixed the problem - the fix is currently in unstable branch. You can get it from the web (https://github.com/szpajder/RTLSDR-Airband/tree/unstable) via the green button at the top right or just install Git client, if you don't have it (apt-get install git
) and then run git clone https://github.com/szpajder/RTLSDR-Airband.git
. It will create a directory RTLSDR-Airband
in your current directory, so first move your old source directory out of the way. Then compile, install and run as usual.
I've also made a couple of other changes:
syslog
config option has been removed. Logging to syslog is enabled by default. To direct the logs to standard error, use -e
command line option-F
option which launches the program in foreground, but does not display waterfalls.The goal of these changes is to have better cooperation with systemd. When running under systemd with -F -e
command line options, rtl_airband will not print its log messages directly to syslog, but to standard error. It's systemd's job to capture and save them to the log file. The benefit is that any assertion failures (if any) will also get written to the logs, so it will not be necessary to run rtl_airband in foreground to see them.
It's not mandatory to use the new options, but if you wish, then edit your unit file /etc/systemd/system/rtl_airband.service
and do the following:
Type=forking
to Type=simple
-F -e
options to ExecStart
Then save and run sudo systemctl daemon-reload
to update systemd's running configuration.
The example unit file init.d/rtl_airband.service
has been updated to reflect these changes (see https://github.com/szpajder/RTLSDR-Airband/blob/unstable/init.d/rtl_airband.service).
OK it's up and running. I put the Restart=no back in the unit file. Thanks for the nice instructions - you only forgot one step 'git checkout unstable' so it choked on the -F and I realized that was not the right source. We will await the morning to see if it's still up. Finger's crossed :)
Excellent - well done! It has now been rock solid since last night after suspending either or both machines running PA servers multiple times this morning. One thing I have seen (in either version) is that when a machine is suspended and resumed it gains an extra volume slider in the Plasma5 volume control dialogue (under applications tab -> playback streams) in the systray. I had seven on the laptop before re-booting it. Any ideas on whether this may be a PA or plasma5 issue?
Great, thanks for reporting, testing and feedback.
You may use pacmd list-sink-inputs
command on the PA server machine to get the current list of streams from PulseAudio. If these zombie streams are there, it means PulseAudio haven't noticed that they are gone. If they are not there, it's probably a fault on the volume control app side.
Seems then that PA is not seeing that they are gone. So would you suggest a bug report to PA upstream, or is there something about the way the streams are opened, that keeps them for ever?
[baz@leno ~]$ pacmd list-sink-inputs
5 sink input(s) available.
index: 0
driver:
media.name = "AirbandLeno"
application.name = "rtl_airband"
native-protocol.peer = "TCP/IP client from 192.168.1.96:37514"
native-protocol.version = "32"
application.process.id = "22992"
application.process.user = "root"
application.process.host = "raspberrypi"
application.process.binary = "rtl_airband"
application.language = "C"
application.process.machine_id = "41b2082c495a49eb98535e0841ee519c"
module-stream-restore.id = "sink-input-by-application-name:rtl_airband"
index: 7
driver:
Streams are carried over plain TCP connections. When you hibernate the server, the client (rtl_airband) detects stream latency increasing and eventually exceeding the threshold, which causes it to close the connection to PA server and try again. However the FIN packets closing the connection will not make it to the server, because it's hibernated, so after you wake up the server, it will still think the connection is open and PA server will attempt to continue reading from it. But it's not possible to detect TCP peer death only by reading from the socket (it looks like the client is not sending anything). TCP keepalives would have to be enabled on the connection to make this possible, however PA does not seem to utilize keepalives on TCP connections with clients. So the socket on the server will probably hang indefinitely.
I don't know if there is a good solution to this case, nor I'm even sure if the above is correct int the PA context. You may want to ask a question on PA mailing list or file them an issue - at least you will know for sure.
Fix for the PA assertion issue released in version 3.0.0.
As far as the original issue is concerned - the example service file does not force a restart of rtl_airband on crash, however it's just a config file, so anybody is free to customize it to suit his/her needs.
Closing.
For your reference I have reported the suspend/resume issue with pulseaudio upstream here: https://bugs.freedesktop.org/show_bug.cgi?id=104994
First, thanks for working on this project and great work.
Current systemd service file doesn't restart service on crash, it contains
Restart=no
- https://github.com/szpajder/RTLSDR-Airband/blob/master/init.d/rtl_airband.service.I assume this way done to avoid going in a restart loop in case service quite due to an invalid configuration file (this could be prevented by setting
StartLimitBurst
andStartLimitIntervalSec
option.)I think it would still make sense to restart service on crash, because there scenarios where service crashes and restart brings it back up without any other interaction needed.
For example, from my log file:
This happens quite often and simply restarting the service brings it back up and solves the problem (
Restart=on-failure
in systemd service file).