arkq / bluez-alsa

Bluetooth Audio ALSA Backend
MIT License
862 stars 189 forks source link

lockup condition #169

Closed eroom1966 closed 5 years ago

eroom1966 commented 5 years ago

I saw another posting regarding deadlock conditions and the advice was to rebuild with debug here are my results, after a while having issued a number of aplay commands, one command does not return, here is the tail of the logging, I could attach a debugger if that is any help ? not quite sure what I need to attach to bluealsa or aplay ?

This is running on an RPi 3B running Jessie

Output of stdout ../../../src/asound/bluealsa-pcm.c:342: Prepared ../../../src/asound/bluealsa-pcm.c:326: Initializing SW ../../../src/asound/bluealsa-pcm.c:214: Starting ../../../src/asound/../shared/ctl-client.c:443: Requesting PCM resume for 94:23:6E:5D:F1:78 ../../../src/asound/bluealsa-pcm.c:110: Starting IO loop ../../../src/asound/bluealsa-pcm.c:326: Initializing SW ../../../src/asound/bluealsa-pcm.c:326: Initializing SW ../../../src/asound/../shared/ctl-client.c:468: Requesting PCM drain for 94:23:6E:5D:F1:78 ../../../src/asound/bluealsa-pcm.c:121: IO thread paused: 4 ../../../src/asound/bluealsa-pcm.c:254: Stopping ../../../src/asound/bluealsa-pcm.c:254: Stopping ../../../src/asound/bluealsa-pcm.c:318: Freeing HW ../../../src/asound/../shared/ctl-client.c:417: Closing PCM for 94:23:6E:5D:F1:78 ../../../src/asound/bluealsa-pcm.c:272: Closing plugin ../../../src/asound/../shared/ctl-client.c:105: Connecting to socket: /var/run/bluealsa/hci0 ../../../src/asound/../shared/ctl-client.c:219: Getting transport for 94:23:6E:5D:F1:78 type 1 ../../../src/asound/bluealsa-pcm.c:534: Setting constraints ../../../src/asound/bluealsa-pcm.c:282: Initializing HW ../../../src/asound/../shared/ctl-client.c:375: Requesting PCM open for 94:23:6E:5D:F1:78 ../../../src/asound/bluealsa-pcm.c:305: FIFO buffer size: 4096 ../../../src/asound/bluealsa-pcm.c:311: Selected HW buffer: 6 periods x 16380 bytes <= 98284 bytes ../../../src/asound/bluealsa-pcm.c:326: Initializing SW ../../../src/asound/bluealsa-pcm.c:326: Initializing SW ../../../src/asound/bluealsa-pcm.c:326: Initializing SW ../../../src/asound/bluealsa-pcm.c:326: Initializing SW ../../../src/asound/bluealsa-pcm.c:342: Prepared ../../../src/asound/bluealsa-pcm.c:326: Initializing SW ../../../src/asound/bluealsa-pcm.c:214: Starting ../../../src/asound/../shared/ctl-client.c:443: Requesting PCM resume for 94:23:6E:5D:F1:78 ../../../src/asound/bluealsa-pcm.c:110: Starting IO loop

Output of syslog Nov 8 14:03:44 raspberrypi user.debug /usr/bin/bluealsa: ../../src/transport.c:734: State transition: 2 -> 0 Nov 8 14:03:44 raspberrypi user.debug /usr/bin/bluealsa: ../../src/transport.c:1027: Closing PCM: 12 Nov 8 14:03:44 raspberrypi user.debug /usr/bin/bluealsa: ../../src/ctl.c:675: +-+- Nov 8 14:03:44 raspberrypi user.debug /usr/bin/bluealsa: ../../src/ctl.c:605: Client closed connection: 11 Nov 8 14:03:44 raspberrypi user.debug /usr/bin/bluealsa: ../../src/ctl.c:675: +-+- Nov 8 14:03:50 raspberrypi user.debug /usr/bin/bluealsa: ../../src/ctl.c:638: Received new connection: 11 Nov 8 14:03:50 raspberrypi user.debug /usr/bin/bluealsa: ../../src/ctl.c:651: New client accepted: 11 Nov 8 14:03:50 raspberrypi user.debug /usr/bin/bluealsa: ../../src/ctl.c:675: +-+- Nov 8 14:03:50 raspberrypi user.debug /usr/bin/bluealsa: ../../src/ctl.c:675: +-+- Nov 8 14:03:50 raspberrypi user.debug /usr/bin/bluealsa: ../../src/ctl.c:353: PCM requested for 94:23:6E:5D:F1:78 type 1 stream 0 Nov 8 14:03:50 raspberrypi user.debug /usr/bin/bluealsa: ../../src/bluez.c:1058: Signal: PropertiesChanged: org.bluez.MediaTransport1: State Nov 8 14:03:50 raspberrypi user.debug /usr/bin/bluealsa: ../../src/transport.c:874: New transport: 15 (MTU: R:672 W:672) Nov 8 14:03:50 raspberrypi user.debug /usr/bin/bluealsa: ../../src/transport.c:734: State transition: 0 -> 2 Nov 8 14:03:50 raspberrypi user.debug /usr/bin/bluealsa: ../../src/ctl.c:675: +-+- Nov 8 14:03:50 raspberrypi user.debug /usr/bin/bluealsa: ../../src/transport.c:122: Created new IO thread: A2DP: A2DP Source Nov 8 14:03:50 raspberrypi user.debug /usr/bin/bluealsa: ../../src/io.c:386: Starting IO loop: A2DP Source (SBC) Nov 8 14:03:50 raspberrypi user.debug /usr/bin/bluealsa: ../../src/transport.c:734: State transition: 2 -> 2 Nov 8 14:03:50 raspberrypi user.debug /usr/bin/bluealsa: ../../src/ctl.c:675: +-+-

eroom1966 commented 5 years ago

OK a follow up, I ran gdb on bluealsa, and it is stuck in a poll() (gdb) where Cannot access memory at address 0x1

0 0x76ac4b80 in poll () at ../sysdeps/unix/syscall-template.S:81

1 0x76c42528 in ?? () from /lib/arm-linux-gnueabihf/libglib-2.0.so.0

Cannot access memory at address 0x1 Backtrace stopped: previous frame identical to this frame (corrupt stack?) (gdb) info thr Id Target Id Frame 7 Thread 0x768323f0 (LWP 678) "bactl" 0x76ac4b80 in poll () at ../sysdeps/unix/syscall-template.S:81 6 Thread 0x75eff3f0 (LWP 679) "gdbus" 0x76ac4b80 in poll () at ../sysdeps/unix/syscall-template.S:81 5 Thread 0x74cff3f0 (LWP 2754) "baio" 0x76ac4b80 in poll () at ../sysdeps/unix/syscall-template.S:81 4 Thread 0x754ff3f0 (LWP 7528) "baio" 0x76ac4b80 in poll () at ../sysdeps/unix/syscall-template.S:81 3 Thread 0x73aff3f0 (LWP 7529) "baio" 0x76ac4b80 in poll () at ../sysdeps/unix/syscall-template.S:81 2 Thread 0x742ff3f0 (LWP 30925) "baio" 0x76ac4b80 in poll () at ../sysdeps/unix/syscall-template.S:81

eroom1966 commented 5 years ago

I am currently testing with the THREAD envvar workaround in README, not sure if this is related to what I am seeing

eroom1966 commented 5 years ago

setting export LIBASOUND_THREAD_SAFE=0 made no difference I am afraid.

I have attached gdb to aplay, and this is where I see it locked up. (gdb) info thr Id Target Id Frame 2 Thread 0x76a27460 (LWP 27412) "pcm-io" 0x76db0c04 in write () at ../sysdeps/unix/syscall-template.S:81

// thread 1 (gdb) where

0 0x0000db68 in ?? ()

1 0x76e9f5e4 in snd_pcm_wait () from /usr/lib/arm-linux-gnueabihf/libasound.so.2

2 0x00000002 in ?? ()

// thread 2 (gdb) where

0 0x76db0c04 in write () at ../sysdeps/unix/syscall-template.S:81

1 0x76a546c4 in io_thread (arg=0x1129368) at ../../../src/asound/bluealsa-pcm.c:185

2 0x76da8e90 in start_thread (arg=0x76a27460) at pthread_create.c:311

3 0x76d32598 in ?? () at ../ports/sysdeps/unix/sysv/linux/arm/nptl/../clone.S:92 from /lib/arm-linux-gnueabihf/libc.so.6

any help ?

the last messages from the debug I see are as follows ../../../src/asound/../shared/ctl-client.c:105: Connecting to socket: /var/run/bluealsa/hci0 ../../../src/asound/../shared/ctl-client.c:219: Getting transport for 94:23:6E:5D:F1:78 type 1 ../../../src/asound/bluealsa-pcm.c:534: Setting constraints ../../../src/asound/bluealsa-pcm.c:282: Initializing HW ../../../src/asound/../shared/ctl-client.c:375: Requesting PCM open for 94:23:6E:5D:F1:78 ../../../src/asound/bluealsa-pcm.c:305: FIFO buffer size: 4096 ../../../src/asound/bluealsa-pcm.c:311: Selected HW buffer: 6 periods x 16380 bytes <= 98284 bytes ../../../src/asound/bluealsa-pcm.c:326: Initializing SW ../../../src/asound/bluealsa-pcm.c:326: Initializing SW ../../../src/asound/bluealsa-pcm.c:326: Initializing SW ../../../src/asound/bluealsa-pcm.c:326: Initializing SW ../../../src/asound/bluealsa-pcm.c:342: Prepared ../../../src/asound/bluealsa-pcm.c:326: Initializing SW ../../../src/asound/bluealsa-pcm.c:214: Starting ../../../src/asound/../shared/ctl-client.c:443: Requesting PCM resume for 94:23:6E:5D:F1:78 ../../../src/asound/bluealsa-pcm.c:110: Starting IO loop

No more messages after this point

arkq commented 5 years ago

Hmm... strange thing... Can you describe how you are able to reproduce this issue? I'll try this on my box and hopefully it will deadlock too. If not, I will think how to guide you through debugging process.

eroom1966 commented 5 years ago

@Arkq sure let me explain.

My application synthesizes short messages and then plays them via aplay. So in effect I have a library of words in .wav files, these are concatenated to make a spoken sentence using 'sox' The resultant .wav file is then played via aplay, using the bluealsa referenced device. these play at a non deterministic rate of maybe one message every 20 seconds. The messages are queued, so there is never a (intentional) situation where 2 aplay commands are running concurrently, a new aplay command only runs, after the previous has exited

In order to reproduce, I leave the application running for a couple of hours, at some point I see that messages are no longer being played, and the aplay process has not returned - this is non-deterministic, and cannot be reproduced with the same inputs

There are so many variables involved here 'bluetoothd', 'blualsa' and 'aplay' - I guess any of these could be at fault ?

eroom1966 commented 5 years ago

Additional information, I noticed the following in the logging ../../../src/asound/../shared/ctl-client.c:105: Connecting to socket: /var/run/bluealsa/hci0 ../../../src/asound/../shared/ctl-client.c:219: Getting transport for 94:23:6E:5D:F1:78 type 1 ../../../src/asound/bluealsa-pcm.c:534: Setting constraints ../../../src/asound/bluealsa-pcm.c:282: Initializing HW ../../../src/asound/../shared/ctl-client.c:375: Requesting PCM open for 94:23:6E:5D:F1:78 ../../../src/asound/bluealsa-pcm.c:287: Couldn't open PCM FIFO: Device or resource busy

the previous message output appeared to end successfully

../../../src/asound/../shared/ctl-client.c:468: Requesting PCM drain for 94:23:6E:5D:F1:78 ../../../src/asound/bluealsa-pcm.c:121: IO thread paused: 4 ../../../src/asound/bluealsa-pcm.c:254: Stopping ../../../src/asound/bluealsa-pcm.c:254: Stopping ../../../src/asound/bluealsa-pcm.c:318: Freeing HW ../../../src/asound/../shared/ctl-client.c:417: Closing PCM for 94:23:6E:5D:F1:78 ../../../src/asound/bluealsa-pcm.c:272: Closing plugin

I am also seeing this which may be significant aplay: set_params:1297: Unable to install hw params: ACCESS: RW_INTERLEAVED FORMAT: S16_LE SUBFORMAT: STD SAMPLE_BITS: 16 FRAME_BITS: 16 CHANNELS: 1 RATE: 44100 PERIOD_TIME: (85333 85334) PERIOD_SIZE: (3763 3764) PERIOD_BYTES: (7526 7528) PERIODS: (5 6) BUFFER_TIME: (511882 511883) BUFFER_SIZE: 22574 BUFFER_BYTES: 45148 TICK_TIME: 0

eroom1966 commented 5 years ago

@Arkq A quick update, I have scattered debug messages in src/asound/bluealsa-pcm.c to see where the deadlock occurs, and it appears to block here

if ((ret = write(pcm->pcm_fd, head, len)) == -1)

so it looks like the write() syscall to the file descriptor is blocked

It is strange that when I connect a gdb to the bluealsa process, all threads are in poll() not sure how to progress this - any ideas ?

arkq commented 5 years ago

The src/asound/bluealsa-pcm.c file is a part of PCM ALSA plug-in, so you should attach gdb to the aplay process, then you should see that it indeed has stuck in there.

OK, so since bluealsa waits in poll() and PCM tries to write data however it fails (blocks), that's something. It might be, that bluealsa polls on wrong file descriptor. Please check exactly at which poll()s bulealsa waits in "baio" threads. Then when you locate this poll, check the content of pfds array. It is also possible that bluealsa is here, then as well check content of pfd struct.

eroom1966 commented 5 years ago

@Arkq here is the stacktrace from baio, in fact there are 4 threads in baio - is that correct ? BTW, I think I am missing some debug symbols here, how should I configure/build with debug symbols ?

(gdb) info thr Id Target Id Frame 7 Thread 0x7688d3f0 (LWP 654) "bactl" 0x76b1fb80 in poll () at ../sysdeps/unix/syscall-template.S:81 6 Thread 0x75eff3f0 (LWP 655) "gdbus" 0x76b1fb80 in poll () at ../sysdeps/unix/syscall-template.S:81 5 Thread 0x74cff3f0 (LWP 2158) "baio" 0x76b1fb80 in poll () at ../sysdeps/unix/syscall-template.S:81 4 Thread 0x754ff3f0 (LWP 3039) "baio" 0x76b1fb80 in poll () at ../sysdeps/unix/syscall-template.S:81 3 Thread 0x742ff3f0 (LWP 3040) "baio" 0x76b1fb80 in poll () at ../sysdeps/unix/syscall-template.S:81 2 Thread 0x73aff3f0 (LWP 2214) "baio" 0x76b1fb80 in poll () at ../sysdeps/unix/syscall-template.S:81

Stack trace for each thread

Thread 2 (gdb) where

0 0x76b1fb80 in poll () at ../sysdeps/unix/syscall-template.S:81

1 0x00017c98 in io_thread_write_bt (len=629, buffer=0x73102b00 "\200`\223\211[\n\274\063", fd=23) at ../../src/io.c:141

2 io_thread_a2dp_source_sbc (arg=0x0) at ../../src/io.c:488

3 0x76b9fe90 in start_thread (arg=0x73aff3f0) at pthread_create.c:311

4 0x76b29598 in ?? () at ../ports/sysdeps/unix/sysv/linux/arm/nptl/../clone.S:92 from /lib/arm-linux-gnueabihf/libc.so.6

Thread 3 (gdb) where

0 0x76b1fb80 in poll () at ../sysdeps/unix/syscall-template.S:81

1 0x000180e8 in io_thread_sco (arg=0x1d168) at ../../src/io.c:1414

2 0x76b9fe90 in start_thread (arg=0x742ff3f0) at pthread_create.c:311

3 0x76b29598 in ?? () at ../ports/sysdeps/unix/sysv/linux/arm/nptl/../clone.S:92 from /lib/arm-linux-gnueabihf/libc.so.6

Thread 4 (gdb) where

0 0x76b1fb80 in poll () at ../sysdeps/unix/syscall-template.S:81

1 0x000197d8 in rfcomm_thread (arg=0x0) at ../../src/rfcomm.c:706

2 0x76b9fe90 in start_thread (arg=0x754ff3f0) at pthread_create.c:311

3 0x76b29598 in ?? () at ../ports/sysdeps/unix/sysv/linux/arm/nptl/../clone.S:92 from /lib/arm-linux-gnueabihf/libc.so.6

Thread 5 (gdb) where

0 0x76b1fb80 in poll () at ../sysdeps/unix/syscall-template.S:81

1 0x000197d8 in rfcomm_thread (arg=0x0) at ../../src/rfcomm.c:706

2 0x76b9fe90 in start_thread (arg=0x74cff3f0) at pthread_create.c:311

3 0x76b29598 in ?? () at ../ports/sysdeps/unix/sysv/linux/arm/nptl/../clone.S:92 from /lib/arm-linux-gnueabihf/libc.so.6

arkq commented 5 years ago

in fact there are 4 threads in baio - is that correct?

Yes, it is correct, there are 4 baio threads. Nothing to worry about.

Thread 2 is in io_thread_write_bt (len=629, buffer=0x73102b00, fd=23) at ../../src/io.c:141, that's the problem.... This poll() waits forever, because there is no space left in the BT socket (obtained from bluez). Also I see, that you are not using current master build (lines do not align). Try using current master (however, I doubt, that it will work).

In short, the problem is with writing data to bluez transport socket. You can reproduce this, by starting playback to BT headset (or whatever), and put this headset into a microwave oven (do not turn it on :D) or any Faraday cage for microwaves. Bluealsa should stuck in this io.c:141 poll() as well.

I do not know what might cause this problem in your case... The problem might be that the BT link between host and BT device has been dropped, bluetoothd has misbehaved, or bluealsa is using bluez is a wrong way.... Try powering off BT receiver (speaker) and see whether bluealsa has moved forward or is stuck in this poll() forever.

This poll() is here to ensure that every packet is delivered to the BT device. However, it might be a good idea to terminate this poll after a while (few seconds) and indicate that the device is gone. Then, someone (bluealsa client, e.g. PCM plug-in) might try to open PCM device once more.

eroom1966 commented 5 years ago

@Arkq I am running code off the Master, but I took this a few days ago, and of course I have added in lots of debug messages which would cause the line numbers to change. I will pull the latest master and try again and post some results here Thx

eroom1966 commented 5 years ago

It seems like the bluetoothd (version 5.23 RPi 3B / Jessie) gets into some bad state if I go into bluetoothctl, I cannot power on [bluetooth]# power on Failed to set power on: org.bluez.Error.Failed

if I list via rfkill $ rfkill list 0: phy0: Wireless LAN Soft blocked: no Hard blocked: no 1: hci0: Bluetooth Soft blocked: no Hard blocked: no

This is well beyond my capabilities. I tried restarting the bluetooth and bluealsa daemons - no success once the bluetooth cannot power on, the only solution is a system reboot.

any ideas anybody ?

dagbdagb commented 5 years ago

On 12.11.2018 10:31, eroom wrote:

It seems like the bluetooth gets into some bad state if I go into bluetoothctl, I cannot power on [bluetooth]# power on Failed to set power on: org.bluez.Error.Failed

if I list via rfkill $ rfkill list 0: phy0: Wireless LAN Soft blocked: no Hard blocked: no 1: hci0: Bluetooth Soft blocked: no Hard blocked: no

This is well beyond my capabilities. I tried restarting the bluetooth and bluealsa daemons - no success once the bluetooth cannot power on, the only solution is a system reboot.

any ideas anybody ?

Does dmesg reveal anything? Tried unloading/reloading the bluetooth hardware driver? (btusb or whatever applies to the bt hardware you use). Does your bt hardware require firmware, and is that uptodate? Custom or distro kernel?

Sorry if this is information already provided earlier  in the thread. I have not paid close attention. There may very well be bugs in the bluetooth kernel subsystem or in the userspace side of bluez. My impression of bluez is not entirely positive. But better make sure everything is as good as we can make it before we start pointing fingers.

Dag B

eroom1966 commented 5 years ago

@dagbdagb At the point where it seems to go wrong I notice the following in the logfile I have added extra instrumentation into the bluz-alsa code, firstly the poll() does not wait forever it times out after 1 second and retries a write 5 times. The interesting thing, the resource becomes unavailable, then bluetoothd reports a 'Suspend' followed by 'Abort'

Nov 12 09:55:23 raspberrypi user.debug /usr/bin/bluealsa: ../../src/io.c:152: write() pfd.fd=20 Nov 12 09:55:23 raspberrypi user.debug /usr/bin/bluealsa: ../../src/io.c:155: write()=-1 errno=11 Resource temporarily unavailable retries=5 Nov 12 09:55:23 raspberrypi user.debug /usr/bin/bluealsa: ../../src/io.c:161: write()=-1 errno=EAGAIN Nov 12 09:55:24 raspberrypi user.debug /usr/bin/bluealsa: ../../src/io.c:152: write() pfd.fd=20 Nov 12 09:55:24 raspberrypi user.debug /usr/bin/bluealsa: ../../src/transport.c:833: PCM drained Nov 12 09:55:24 raspberrypi user.debug /usr/bin/bluealsa: ../../src/ctl.c:675: +-+- Nov 12 09:55:24 raspberrypi user.debug /usr/bin/bluealsa: ../../src/ctl.c:455: PCM close for 94:23:6E:5D:F1:78 type 1 stream 0 Nov 12 09:55:24 raspberrypi user.debug /usr/bin/bluealsa: ../../src/transport.c:905: Releasing transport: A2DP Source (SBC) Nov 12 09:55:26 raspberrypi authpriv.notice sudo: pi : TTY=pts/0 ; PWD=/tmp ; USER=root ; COMMAND=/usr/bin/l2ping -c 1 -i hci0 94:23:6E:5D:F1:78 Nov 12 09:55:26 raspberrypi authpriv.info sudo: pam_unix(sudo:session): session opened for user root by pi(uid=0) Nov 12 09:55:30 raspberrypi daemon.err bluetoothd[643]: Suspend: Connection timed out (110) Nov 12 09:55:30 raspberrypi user.debug /usr/bin/bluealsa: ../../src/transport.c:933: Closing BT: 20 Nov 12 09:55:30 raspberrypi user.debug /usr/bin/bluealsa: ../../src/transport.c:1073: Exiting IO thread Nov 12 09:55:30 raspberrypi user.debug /usr/bin/bluealsa: ../../src/transport.c:1031: Closing PCM: 15 Nov 12 09:55:30 raspberrypi user.debug /usr/bin/bluealsa: ../../src/ctl.c:675: +-+- Nov 12 09:55:30 raspberrypi user.debug /usr/bin/bluealsa: ../../src/bluez.c:1058: Signal: PropertiesChanged: org.bluez.MediaTransport1: State Nov 12 09:55:30 raspberrypi user.debug /usr/bin/bluealsa: ../../src/ctl.c:605: Client closed connection: 14 Nov 12 09:55:30 raspberrypi user.debug /usr/bin/bluealsa: ../../src/transport.c:735: State transition: 2 -> 0 Nov 12 09:55:30 raspberrypi user.debug /usr/bin/bluealsa: ../../src/ctl.c:675: +-+- Nov 12 09:55:32 raspberrypi daemon.err bluetoothd[643]: Abort: Connection timed out (110)

at this point now hci0 is reported down, and I cannot bring it up

root@raspberrypi:~/servers# hciconfig hci0: Type: BR/EDR Bus: UART BD Address: B8:27:EB:1D:FA:46 ACL MTU: 1021:8 SCO MTU: 64:1 DOWN RX bytes:20044 acl:219 sco:0 events:1959 errors:0 TX bytes:1799270 acl:3641 sco:0 commands:102 errors:0

root@raspberrypi:~/servers# hciconfig hci0 up Can't init device hci0: Connection timed out (110)

Tried unloading/reloading the bluetooth hardware driver?

Hmm, not sure how to do that, can you advise ?

eroom1966 commented 5 years ago

Final update I installed Debian 9/Stretch and this seems to have solved the problem, so I presume the issues were related to /usr/sbin/bluetoothd - having upgraded to 5.43, this now appears solid closing Many thanks