Closed eroom1966 closed 5 years ago
OK a follow up, I ran gdb on bluealsa, and it is stuck in a poll() (gdb) where Cannot access memory at address 0x1
Cannot access memory at address 0x1 Backtrace stopped: previous frame identical to this frame (corrupt stack?) (gdb) info thr Id Target Id Frame 7 Thread 0x768323f0 (LWP 678) "bactl" 0x76ac4b80 in poll () at ../sysdeps/unix/syscall-template.S:81 6 Thread 0x75eff3f0 (LWP 679) "gdbus" 0x76ac4b80 in poll () at ../sysdeps/unix/syscall-template.S:81 5 Thread 0x74cff3f0 (LWP 2754) "baio" 0x76ac4b80 in poll () at ../sysdeps/unix/syscall-template.S:81 4 Thread 0x754ff3f0 (LWP 7528) "baio" 0x76ac4b80 in poll () at ../sysdeps/unix/syscall-template.S:81 3 Thread 0x73aff3f0 (LWP 7529) "baio" 0x76ac4b80 in poll () at ../sysdeps/unix/syscall-template.S:81 2 Thread 0x742ff3f0 (LWP 30925) "baio" 0x76ac4b80 in poll () at ../sysdeps/unix/syscall-template.S:81
I am currently testing with the THREAD envvar workaround in README, not sure if this is related to what I am seeing
setting export LIBASOUND_THREAD_SAFE=0 made no difference I am afraid.
I have attached gdb to aplay, and this is where I see it locked up. (gdb) info thr Id Target Id Frame 2 Thread 0x76a27460 (LWP 27412) "pcm-io" 0x76db0c04 in write () at ../sysdeps/unix/syscall-template.S:81
// thread 1 (gdb) where
// thread 2 (gdb) where
any help ?
the last messages from the debug I see are as follows ../../../src/asound/../shared/ctl-client.c:105: Connecting to socket: /var/run/bluealsa/hci0 ../../../src/asound/../shared/ctl-client.c:219: Getting transport for 94:23:6E:5D:F1:78 type 1 ../../../src/asound/bluealsa-pcm.c:534: Setting constraints ../../../src/asound/bluealsa-pcm.c:282: Initializing HW ../../../src/asound/../shared/ctl-client.c:375: Requesting PCM open for 94:23:6E:5D:F1:78 ../../../src/asound/bluealsa-pcm.c:305: FIFO buffer size: 4096 ../../../src/asound/bluealsa-pcm.c:311: Selected HW buffer: 6 periods x 16380 bytes <= 98284 bytes ../../../src/asound/bluealsa-pcm.c:326: Initializing SW ../../../src/asound/bluealsa-pcm.c:326: Initializing SW ../../../src/asound/bluealsa-pcm.c:326: Initializing SW ../../../src/asound/bluealsa-pcm.c:326: Initializing SW ../../../src/asound/bluealsa-pcm.c:342: Prepared ../../../src/asound/bluealsa-pcm.c:326: Initializing SW ../../../src/asound/bluealsa-pcm.c:214: Starting ../../../src/asound/../shared/ctl-client.c:443: Requesting PCM resume for 94:23:6E:5D:F1:78 ../../../src/asound/bluealsa-pcm.c:110: Starting IO loop
No more messages after this point
Hmm... strange thing... Can you describe how you are able to reproduce this issue? I'll try this on my box and hopefully it will deadlock too. If not, I will think how to guide you through debugging process.
@Arkq sure let me explain.
My application synthesizes short messages and then plays them via aplay. So in effect I have a library of words in .wav files, these are concatenated to make a spoken sentence using 'sox' The resultant .wav file is then played via aplay, using the bluealsa referenced device. these play at a non deterministic rate of maybe one message every 20 seconds. The messages are queued, so there is never a (intentional) situation where 2 aplay commands are running concurrently, a new aplay command only runs, after the previous has exited
In order to reproduce, I leave the application running for a couple of hours, at some point I see that messages are no longer being played, and the aplay process has not returned - this is non-deterministic, and cannot be reproduced with the same inputs
There are so many variables involved here 'bluetoothd', 'blualsa' and 'aplay' - I guess any of these could be at fault ?
Additional information, I noticed the following in the logging ../../../src/asound/../shared/ctl-client.c:105: Connecting to socket: /var/run/bluealsa/hci0 ../../../src/asound/../shared/ctl-client.c:219: Getting transport for 94:23:6E:5D:F1:78 type 1 ../../../src/asound/bluealsa-pcm.c:534: Setting constraints ../../../src/asound/bluealsa-pcm.c:282: Initializing HW ../../../src/asound/../shared/ctl-client.c:375: Requesting PCM open for 94:23:6E:5D:F1:78 ../../../src/asound/bluealsa-pcm.c:287: Couldn't open PCM FIFO: Device or resource busy
the previous message output appeared to end successfully
../../../src/asound/../shared/ctl-client.c:468: Requesting PCM drain for 94:23:6E:5D:F1:78 ../../../src/asound/bluealsa-pcm.c:121: IO thread paused: 4 ../../../src/asound/bluealsa-pcm.c:254: Stopping ../../../src/asound/bluealsa-pcm.c:254: Stopping ../../../src/asound/bluealsa-pcm.c:318: Freeing HW ../../../src/asound/../shared/ctl-client.c:417: Closing PCM for 94:23:6E:5D:F1:78 ../../../src/asound/bluealsa-pcm.c:272: Closing plugin
I am also seeing this which may be significant aplay: set_params:1297: Unable to install hw params: ACCESS: RW_INTERLEAVED FORMAT: S16_LE SUBFORMAT: STD SAMPLE_BITS: 16 FRAME_BITS: 16 CHANNELS: 1 RATE: 44100 PERIOD_TIME: (85333 85334) PERIOD_SIZE: (3763 3764) PERIOD_BYTES: (7526 7528) PERIODS: (5 6) BUFFER_TIME: (511882 511883) BUFFER_SIZE: 22574 BUFFER_BYTES: 45148 TICK_TIME: 0
@Arkq A quick update, I have scattered debug messages in src/asound/bluealsa-pcm.c to see where the deadlock occurs, and it appears to block here
if ((ret = write(pcm->pcm_fd, head, len)) == -1)
so it looks like the write() syscall to the file descriptor is blocked
It is strange that when I connect a gdb to the bluealsa process, all threads are in poll() not sure how to progress this - any ideas ?
The src/asound/bluealsa-pcm.c
file is a part of PCM ALSA plug-in, so you should attach gdb to the aplay process, then you should see that it indeed has stuck in there.
OK, so since bluealsa waits in poll() and PCM tries to write data however it fails (blocks), that's something. It might be, that bluealsa polls on wrong file descriptor. Please check exactly at which poll()s bulealsa waits in "baio" threads. Then when you locate this poll, check the content of pfds array. It is also possible that bluealsa is here, then as well check content of pfd struct.
@Arkq here is the stacktrace from baio, in fact there are 4 threads in baio - is that correct ? BTW, I think I am missing some debug symbols here, how should I configure/build with debug symbols ?
(gdb) info thr Id Target Id Frame 7 Thread 0x7688d3f0 (LWP 654) "bactl" 0x76b1fb80 in poll () at ../sysdeps/unix/syscall-template.S:81 6 Thread 0x75eff3f0 (LWP 655) "gdbus" 0x76b1fb80 in poll () at ../sysdeps/unix/syscall-template.S:81 5 Thread 0x74cff3f0 (LWP 2158) "baio" 0x76b1fb80 in poll () at ../sysdeps/unix/syscall-template.S:81 4 Thread 0x754ff3f0 (LWP 3039) "baio" 0x76b1fb80 in poll () at ../sysdeps/unix/syscall-template.S:81 3 Thread 0x742ff3f0 (LWP 3040) "baio" 0x76b1fb80 in poll () at ../sysdeps/unix/syscall-template.S:81 2 Thread 0x73aff3f0 (LWP 2214) "baio" 0x76b1fb80 in poll () at ../sysdeps/unix/syscall-template.S:81
Stack trace for each thread
Thread 2 (gdb) where
Thread 3 (gdb) where
Thread 4 (gdb) where
Thread 5 (gdb) where
in fact there are 4 threads in baio - is that correct?
Yes, it is correct, there are 4 baio threads. Nothing to worry about.
Thread 2 is in io_thread_write_bt (len=629, buffer=0x73102b00, fd=23) at ../../src/io.c:141, that's the problem.... This poll() waits forever, because there is no space left in the BT socket (obtained from bluez). Also I see, that you are not using current master build (lines do not align). Try using current master (however, I doubt, that it will work).
In short, the problem is with writing data to bluez transport socket. You can reproduce this, by starting playback to BT headset (or whatever), and put this headset into a microwave oven (do not turn it on :D) or any Faraday cage for microwaves. Bluealsa should stuck in this io.c:141 poll() as well.
I do not know what might cause this problem in your case... The problem might be that the BT link between host and BT device has been dropped, bluetoothd has misbehaved, or bluealsa is using bluez is a wrong way.... Try powering off BT receiver (speaker) and see whether bluealsa has moved forward or is stuck in this poll() forever.
This poll() is here to ensure that every packet is delivered to the BT device. However, it might be a good idea to terminate this poll after a while (few seconds) and indicate that the device is gone. Then, someone (bluealsa client, e.g. PCM plug-in) might try to open PCM device once more.
@Arkq I am running code off the Master, but I took this a few days ago, and of course I have added in lots of debug messages which would cause the line numbers to change. I will pull the latest master and try again and post some results here Thx
It seems like the bluetoothd (version 5.23 RPi 3B / Jessie) gets into some bad state if I go into bluetoothctl, I cannot power on [bluetooth]# power on Failed to set power on: org.bluez.Error.Failed
if I list via rfkill $ rfkill list 0: phy0: Wireless LAN Soft blocked: no Hard blocked: no 1: hci0: Bluetooth Soft blocked: no Hard blocked: no
This is well beyond my capabilities. I tried restarting the bluetooth and bluealsa daemons - no success once the bluetooth cannot power on, the only solution is a system reboot.
any ideas anybody ?
On 12.11.2018 10:31, eroom wrote:
It seems like the bluetooth gets into some bad state if I go into bluetoothctl, I cannot power on [bluetooth]# power on Failed to set power on: org.bluez.Error.Failed
if I list via rfkill $ rfkill list 0: phy0: Wireless LAN Soft blocked: no Hard blocked: no 1: hci0: Bluetooth Soft blocked: no Hard blocked: no
This is well beyond my capabilities. I tried restarting the bluetooth and bluealsa daemons - no success once the bluetooth cannot power on, the only solution is a system reboot.
any ideas anybody ?
Does dmesg reveal anything? Tried unloading/reloading the bluetooth hardware driver? (btusb or whatever applies to the bt hardware you use). Does your bt hardware require firmware, and is that uptodate? Custom or distro kernel?
Sorry if this is information already provided earlier in the thread. I have not paid close attention. There may very well be bugs in the bluetooth kernel subsystem or in the userspace side of bluez. My impression of bluez is not entirely positive. But better make sure everything is as good as we can make it before we start pointing fingers.
Dag B
@dagbdagb At the point where it seems to go wrong I notice the following in the logfile I have added extra instrumentation into the bluz-alsa code, firstly the poll() does not wait forever it times out after 1 second and retries a write 5 times. The interesting thing, the resource becomes unavailable, then bluetoothd reports a 'Suspend' followed by 'Abort'
Nov 12 09:55:23 raspberrypi user.debug /usr/bin/bluealsa: ../../src/io.c:152: write() pfd.fd=20 Nov 12 09:55:23 raspberrypi user.debug /usr/bin/bluealsa: ../../src/io.c:155: write()=-1 errno=11 Resource temporarily unavailable retries=5 Nov 12 09:55:23 raspberrypi user.debug /usr/bin/bluealsa: ../../src/io.c:161: write()=-1 errno=EAGAIN Nov 12 09:55:24 raspberrypi user.debug /usr/bin/bluealsa: ../../src/io.c:152: write() pfd.fd=20 Nov 12 09:55:24 raspberrypi user.debug /usr/bin/bluealsa: ../../src/transport.c:833: PCM drained Nov 12 09:55:24 raspberrypi user.debug /usr/bin/bluealsa: ../../src/ctl.c:675: +-+- Nov 12 09:55:24 raspberrypi user.debug /usr/bin/bluealsa: ../../src/ctl.c:455: PCM close for 94:23:6E:5D:F1:78 type 1 stream 0 Nov 12 09:55:24 raspberrypi user.debug /usr/bin/bluealsa: ../../src/transport.c:905: Releasing transport: A2DP Source (SBC) Nov 12 09:55:26 raspberrypi authpriv.notice sudo: pi : TTY=pts/0 ; PWD=/tmp ; USER=root ; COMMAND=/usr/bin/l2ping -c 1 -i hci0 94:23:6E:5D:F1:78 Nov 12 09:55:26 raspberrypi authpriv.info sudo: pam_unix(sudo:session): session opened for user root by pi(uid=0) Nov 12 09:55:30 raspberrypi daemon.err bluetoothd[643]: Suspend: Connection timed out (110) Nov 12 09:55:30 raspberrypi user.debug /usr/bin/bluealsa: ../../src/transport.c:933: Closing BT: 20 Nov 12 09:55:30 raspberrypi user.debug /usr/bin/bluealsa: ../../src/transport.c:1073: Exiting IO thread Nov 12 09:55:30 raspberrypi user.debug /usr/bin/bluealsa: ../../src/transport.c:1031: Closing PCM: 15 Nov 12 09:55:30 raspberrypi user.debug /usr/bin/bluealsa: ../../src/ctl.c:675: +-+- Nov 12 09:55:30 raspberrypi user.debug /usr/bin/bluealsa: ../../src/bluez.c:1058: Signal: PropertiesChanged: org.bluez.MediaTransport1: State Nov 12 09:55:30 raspberrypi user.debug /usr/bin/bluealsa: ../../src/ctl.c:605: Client closed connection: 14 Nov 12 09:55:30 raspberrypi user.debug /usr/bin/bluealsa: ../../src/transport.c:735: State transition: 2 -> 0 Nov 12 09:55:30 raspberrypi user.debug /usr/bin/bluealsa: ../../src/ctl.c:675: +-+- Nov 12 09:55:32 raspberrypi daemon.err bluetoothd[643]: Abort: Connection timed out (110)
at this point now hci0 is reported down, and I cannot bring it up
root@raspberrypi:~/servers# hciconfig hci0: Type: BR/EDR Bus: UART BD Address: B8:27:EB:1D:FA:46 ACL MTU: 1021:8 SCO MTU: 64:1 DOWN RX bytes:20044 acl:219 sco:0 events:1959 errors:0 TX bytes:1799270 acl:3641 sco:0 commands:102 errors:0
root@raspberrypi:~/servers# hciconfig hci0 up Can't init device hci0: Connection timed out (110)
Tried unloading/reloading the bluetooth hardware driver?
Hmm, not sure how to do that, can you advise ?
Final update I installed Debian 9/Stretch and this seems to have solved the problem, so I presume the issues were related to /usr/sbin/bluetoothd - having upgraded to 5.43, this now appears solid closing Many thanks
I saw another posting regarding deadlock conditions and the advice was to rebuild with debug here are my results, after a while having issued a number of aplay commands, one command does not return, here is the tail of the logging, I could attach a debugger if that is any help ? not quite sure what I need to attach to bluealsa or aplay ?
This is running on an RPi 3B running Jessie
Output of stdout ../../../src/asound/bluealsa-pcm.c:342: Prepared ../../../src/asound/bluealsa-pcm.c:326: Initializing SW ../../../src/asound/bluealsa-pcm.c:214: Starting ../../../src/asound/../shared/ctl-client.c:443: Requesting PCM resume for 94:23:6E:5D:F1:78 ../../../src/asound/bluealsa-pcm.c:110: Starting IO loop ../../../src/asound/bluealsa-pcm.c:326: Initializing SW ../../../src/asound/bluealsa-pcm.c:326: Initializing SW ../../../src/asound/../shared/ctl-client.c:468: Requesting PCM drain for 94:23:6E:5D:F1:78 ../../../src/asound/bluealsa-pcm.c:121: IO thread paused: 4 ../../../src/asound/bluealsa-pcm.c:254: Stopping ../../../src/asound/bluealsa-pcm.c:254: Stopping ../../../src/asound/bluealsa-pcm.c:318: Freeing HW ../../../src/asound/../shared/ctl-client.c:417: Closing PCM for 94:23:6E:5D:F1:78 ../../../src/asound/bluealsa-pcm.c:272: Closing plugin ../../../src/asound/../shared/ctl-client.c:105: Connecting to socket: /var/run/bluealsa/hci0 ../../../src/asound/../shared/ctl-client.c:219: Getting transport for 94:23:6E:5D:F1:78 type 1 ../../../src/asound/bluealsa-pcm.c:534: Setting constraints ../../../src/asound/bluealsa-pcm.c:282: Initializing HW ../../../src/asound/../shared/ctl-client.c:375: Requesting PCM open for 94:23:6E:5D:F1:78 ../../../src/asound/bluealsa-pcm.c:305: FIFO buffer size: 4096 ../../../src/asound/bluealsa-pcm.c:311: Selected HW buffer: 6 periods x 16380 bytes <= 98284 bytes ../../../src/asound/bluealsa-pcm.c:326: Initializing SW ../../../src/asound/bluealsa-pcm.c:326: Initializing SW ../../../src/asound/bluealsa-pcm.c:326: Initializing SW ../../../src/asound/bluealsa-pcm.c:326: Initializing SW ../../../src/asound/bluealsa-pcm.c:342: Prepared ../../../src/asound/bluealsa-pcm.c:326: Initializing SW ../../../src/asound/bluealsa-pcm.c:214: Starting ../../../src/asound/../shared/ctl-client.c:443: Requesting PCM resume for 94:23:6E:5D:F1:78 ../../../src/asound/bluealsa-pcm.c:110: Starting IO loop
Output of syslog Nov 8 14:03:44 raspberrypi user.debug /usr/bin/bluealsa: ../../src/transport.c:734: State transition: 2 -> 0 Nov 8 14:03:44 raspberrypi user.debug /usr/bin/bluealsa: ../../src/transport.c:1027: Closing PCM: 12 Nov 8 14:03:44 raspberrypi user.debug /usr/bin/bluealsa: ../../src/ctl.c:675: +-+- Nov 8 14:03:44 raspberrypi user.debug /usr/bin/bluealsa: ../../src/ctl.c:605: Client closed connection: 11 Nov 8 14:03:44 raspberrypi user.debug /usr/bin/bluealsa: ../../src/ctl.c:675: +-+- Nov 8 14:03:50 raspberrypi user.debug /usr/bin/bluealsa: ../../src/ctl.c:638: Received new connection: 11 Nov 8 14:03:50 raspberrypi user.debug /usr/bin/bluealsa: ../../src/ctl.c:651: New client accepted: 11 Nov 8 14:03:50 raspberrypi user.debug /usr/bin/bluealsa: ../../src/ctl.c:675: +-+- Nov 8 14:03:50 raspberrypi user.debug /usr/bin/bluealsa: ../../src/ctl.c:675: +-+- Nov 8 14:03:50 raspberrypi user.debug /usr/bin/bluealsa: ../../src/ctl.c:353: PCM requested for 94:23:6E:5D:F1:78 type 1 stream 0 Nov 8 14:03:50 raspberrypi user.debug /usr/bin/bluealsa: ../../src/bluez.c:1058: Signal: PropertiesChanged: org.bluez.MediaTransport1: State Nov 8 14:03:50 raspberrypi user.debug /usr/bin/bluealsa: ../../src/transport.c:874: New transport: 15 (MTU: R:672 W:672) Nov 8 14:03:50 raspberrypi user.debug /usr/bin/bluealsa: ../../src/transport.c:734: State transition: 0 -> 2 Nov 8 14:03:50 raspberrypi user.debug /usr/bin/bluealsa: ../../src/ctl.c:675: +-+- Nov 8 14:03:50 raspberrypi user.debug /usr/bin/bluealsa: ../../src/transport.c:122: Created new IO thread: A2DP: A2DP Source Nov 8 14:03:50 raspberrypi user.debug /usr/bin/bluealsa: ../../src/io.c:386: Starting IO loop: A2DP Source (SBC) Nov 8 14:03:50 raspberrypi user.debug /usr/bin/bluealsa: ../../src/transport.c:734: State transition: 2 -> 2 Nov 8 14:03:50 raspberrypi user.debug /usr/bin/bluealsa: ../../src/ctl.c:675: +-+-