Open trinitronx opened 5 years ago
I think next steps are to try a newer version of bluez
in case there are bugfixes. It seems like the bluepy-helper
just hangs and starts using large amounts of CPU doing the poll()
calls. A restart of the SystemD service including the hciconfig down / up
and power off / on
Post & Pre-Exec steps helps to recover to a working state.
Again, this doesn't happen initially for a while but only starts to happen after enough successive sensor reads. It seems there is a bit of chance or some condition that must be hit for the bug to appear.
For comparison, a bluepy-helper
strace
for a working run through the loop is here:
pi@raspberrypi:~/src/pub/waveplus-reader $ while true; do bluepy_pid=$(ps auxww | grep -i bluepy-helper | grep -v grep | awk '{ print $2 }'); [ -n "$bluepy_pid" ] && sudo strace -p $bluepy_pid || sleep 0.5; unset bluepy_pid; done
strace: Process 4715 attached
poll([{fd=0, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLOUT}], 4, -1) = 1 ([{fd=5, revents=POLLOUT}])
poll([{fd=5, events=POLLNVAL}], 1, 0) = 0 (Timeout)
getsockopt(5, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
getsockopt(5, SOL_SOCKET, SO_DOMAIN, [31], [4]) = 0
getsockopt(5, SOL_SOCKET, SO_PROTOCOL, [0], [4]) = 0
getsockname(5, {sa_family=AF_BLUETOOTH, sa_data="\0\0pxf\353'\270\4\0\1\0"}, [14]) = 0
getsockopt(5, SOL_BLUETOOTH, 13, "\240\2", [2]) = 0
fstat64(5, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
fcntl64(5, F_GETFL) = 0x802 (flags O_RDWR|O_NONBLOCK)
socket(AF_ALG, SOCK_SEQPACKET|SOCK_CLOEXEC, 0) = 6
bind(6, {sa_family=AF_ALG, sa_data="skcipher\0\0\0\0\0\0\0\0\0\0\0\0\0\0ecb(aes)\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 88) = 0
open("/dev/urandom", O_RDONLY) = 7
socket(AF_ALG, SOCK_SEQPACKET|SOCK_CLOEXEC, 0) = 8
bind(8, {sa_family=AF_ALG, sa_data="hash\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0cmac(aes)\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 88) = -1 ENOENT (No such file or directory)
close(8) = 0
close(7) = 0
close(6) = 0
getsockopt(5, SOL_SOCKET, SO_DOMAIN, [31], [4]) = 0
getsockopt(5, SOL_SOCKET, SO_PROTOCOL, [0], [4]) = 0
getsockname(5, {sa_family=AF_BLUETOOTH, sa_data="\0\0pxf\353'\270\4\0\1\0"}, [14]) = 0
write(1, "rsp=$stat\36state=$conn\36dst='00:81"..., 61) = 61
poll([{fd=0, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN}], 4, -1) = 1 ([{fd=0, revents=POLLIN}])
read(0, "char 1 FFFF b42e2a68-ade7-11e4-8"..., 1024) = 49
poll([{fd=0, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN|POLLOUT}], 4, -1) = 1 ([{fd=5, revents=POLLOUT}])
writev(5, [{iov_base="\10\1\0\377\377\3(", iov_len=7}], 1) = 7
poll([{fd=0, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN|POLLOUT}], 4, 30000) = 1 ([{fd=5, revents=POLLOUT}])
poll([{fd=0, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN}], 4, 30000) = 1 ([{fd=5, revents=POLLIN}])
read(5, "\t\7\2\0\2\3\0\0*\4\0\2\5\0\1*\6\0\2\7\0\4*", 23) = 23
poll([{fd=0, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN|POLLOUT}], 4, -1) = 1 ([{fd=5, revents=POLLOUT}])
writev(5, [{iov_base="\10\7\0\377\377\3(", iov_len=7}], 1) = 7
poll([{fd=0, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN|POLLOUT}], 4, 30000) = 1 ([{fd=5, revents=POLLOUT}])
poll([{fd=0, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN}], 4, 30000) = 1 ([{fd=5, revents=POLLIN}])
read(5, "\t\7\10\0\2\t\0\311*", 23) = 9
poll([{fd=0, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN|POLLOUT}], 4, -1) = 1 ([{fd=5, revents=POLLOUT}])
writev(5, [{iov_base="\10\t\0\377\377\3(", iov_len=7}], 1) = 7
poll([{fd=0, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN|POLLOUT}], 4, 30000) = 1 ([{fd=5, revents=POLLOUT}])
poll([{fd=0, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN}], 4, 30000) = 1 ([{fd=5, revents=POLLIN}])
read(5, "\t\25\f\0\2\r\0\272\\\367\223;\22\323\211\344\21\347\255h*.\264", 23) = 23
poll([{fd=0, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN|POLLOUT}], 4, -1) = 1 ([{fd=5, revents=POLLOUT}])
writev(5, [{iov_base="\10\r\0\377\377\3(", iov_len=7}], 1) = 7
poll([{fd=0, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN|POLLOUT}], 4, 30000) = 1 ([{fd=5, revents=POLLOUT}])
poll([{fd=0, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN}], 4, 30000) = 1 ([{fd=5, revents=POLLIN}])
read(5, "\t\25\17\0,\20\0\272\\\367\223;\22\323\211\344\21\347\255\6-.\264", 23) = 23
poll([{fd=0, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN|POLLOUT}], 4, -1) = 1 ([{fd=5, revents=POLLOUT}])
writev(5, [{iov_base="\10\20\0\377\377\3(", iov_len=7}], 1) = 7
poll([{fd=0, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN|POLLOUT}], 4, 30000) = 1 ([{fd=5, revents=POLLOUT}])
poll([{fd=0, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN}], 4, 30000) = 1 ([{fd=5, revents=POLLIN}])
read(5, "\t\25\23\0\20\24\0\272\\\367\223;\22\323\211\344\21\347\255\302/.\264", 23) = 23
poll([{fd=0, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN|POLLOUT}], 4, -1) = 1 ([{fd=5, revents=POLLOUT}])
writev(5, [{iov_base="\10\24\0\377\377\3(", iov_len=7}], 1) = 7
poll([{fd=0, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN|POLLOUT}], 4, 30000) = 1 ([{fd=5, revents=POLLOUT}])
poll([{fd=0, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN}], 4, 30000) = 1 ([{fd=5, revents=POLLIN}])
read(5, "\t\25\30\0\34\31\0\0\0\0\0\0\0\0\260\0@Q\4\301\377\0\360", 23) = 23
poll([{fd=0, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN|POLLOUT}], 4, -1) = 1 ([{fd=5, revents=POLLOUT}])
writev(5, [{iov_base="\10\31\0\377\377\3(", iov_len=7}], 1) = 7
poll([{fd=0, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN|POLLOUT}], 4, 30000) = 1 ([{fd=5, revents=POLLOUT}])
poll([{fd=0, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN}], 4, 30000) = 1 ([{fd=5, revents=POLLIN}])
read(5, "\t\25\34\0\34\35\0\0\0\0\0\0\0\0\260\0@Q\4\302\377\0\360", 23) = 23
poll([{fd=0, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN|POLLOUT}], 4, -1) = 1 ([{fd=5, revents=POLLOUT}])
writev(5, [{iov_base="\10\35\0\377\377\3(", iov_len=7}], 1) = 7
poll([{fd=0, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN|POLLOUT}], 4, 30000) = 1 ([{fd=5, revents=POLLOUT}])
poll([{fd=0, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN}], 4, 30000) = 1 ([{fd=5, revents=POLLIN}])
read(5, "\t\25 \0\24!\0\0\0\0\0\0\0\0\260\0@Q\4\305\377\0\360", 23) = 23
poll([{fd=0, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN|POLLOUT}], 4, -1) = 1 ([{fd=5, revents=POLLOUT}])
writev(5, [{iov_base="\10!\0\377\377\3(", iov_len=7}], 1) = 7
poll([{fd=0, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN|POLLOUT}], 4, 30000) = 1 ([{fd=5, revents=POLLOUT}])
poll([{fd=0, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN}], 4, 30000) = 1 ([{fd=5, revents=POLLIN}])
read(5, "\t\7%\0\2&\0#*'\0\2(\0$*)\0\2*\0%*", 23) = 23
poll([{fd=0, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN|POLLOUT}], 4, -1) = 1 ([{fd=5, revents=POLLOUT}])
writev(5, [{iov_base="\10*\0\377\377\3(", iov_len=7}], 1) = 7
poll([{fd=0, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN|POLLOUT}], 4, 30000) = 1 ([{fd=5, revents=POLLOUT}])
poll([{fd=0, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN}], 4, 30000) = 1 ([{fd=5, revents=POLLIN}])
read(5, "\t\7+\0\2,\0&*-\0\2.\0'*/\0\0020\0)*", 23) = 23
poll([{fd=0, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN|POLLOUT}], 4, -1) = 1 ([{fd=5, revents=POLLOUT}])
writev(5, [{iov_base="\0100\0\377\377\3(", iov_len=7}], 1) = 7
poll([{fd=0, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN|POLLOUT}], 4, 30000) = 1 ([{fd=5, revents=POLLOUT}])
poll([{fd=0, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN}], 4, 30000) = 1 ([{fd=5, revents=POLLIN}])
read(5, "\1\0100\0\n", 23) = 5
getsockopt(5, SOL_BLUETOOTH, 4, "\1\0", [2]) = 0
write(1, "rsp=$find\36hnd=hC\36props=h2\36vhnd=h"..., 77) = 77
poll([{fd=0, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN}], 4, -1) = 1 ([{fd=0, revents=POLLIN}])
read(0, "rd D\n", 1024) = 5
poll([{fd=0, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN|POLLOUT}], 4, -1) = 1 ([{fd=5, revents=POLLOUT}])
writev(5, [{iov_base="\n\r\0", iov_len=3}], 1) = 3
poll([{fd=0, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN|POLLOUT}], 4, 30000) = 1 ([{fd=5, revents=POLLOUT}])
poll([{fd=0, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN}], 4, 30000) = 1 ([{fd=5, revents=POLLIN}])
read(5, "\v\1a\"\0\f\0\0\0d\7*\243\10\2\f\0\0\0\337\7", 23) = 21
write(1, "rsp=$rd\36d=b016122000C00000064072"..., 52) = 52
poll([{fd=0, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN}], 4, -1) = 1 ([{fd=0, revents=POLLIN}])
read(0, "disc\n", 1024) = 5
close(5) = 0
close(5) = -1 EBADF (Bad file descriptor)
getpeername(2, 0x7ead38c4, [128]) = -1 ENOTSOCK (Socket operation on non-socket)
futex(0x76f1eccc, FUTEX_WAKE_PRIVATE, 2147483647) = 0
ioctl(2, TCGETS, 0x7ead38bc) = -1 ENOTTY (Inappropriate ioctl for device)
getpid() = 4715
open("/usr/lib/arm-linux-gnueabihf/charset.alias", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/arm-linux-gnueabihf/gconv/gconv-modules.cache", O_RDONLY) = 5
fstat64(5, {st_mode=S_IFREG|0644, st_size=26262, ...}) = 0
mmap2(NULL, 26262, PROT_READ, MAP_SHARED, 5, 0) = 0x76f72000
close(5) = 0
futex(0x76e13ed8, FUTEX_WAKE_PRIVATE, 2147483647) = 0
write(2, "\n(process:4715): GLib-WARNING **"..., 60) = 60
write(1, "rsp=$stat\36state=$disc\36mtu=h0\36sec"..., 38) = 38
poll([{fd=0, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=0}], 4, -1) = 1 ([{fd=5, revents=POLLNVAL}])
poll([{fd=0, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=0}], 4, -1) = 1 ([{fd=5, revents=POLLNVAL}])
poll([{fd=0, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=0}], 4, -1) = 1 ([{fd=5, revents=POLLNVAL}])
poll([{fd=0, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=0}], 4, -1) = 1 ([{fd=5, revents=POLLNVAL}])
poll([{fd=0, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=0}], 4, -1) = 1 ([{fd=5, revents=POLLNVAL}])
poll([{fd=0, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=0}], 4, -1) = 2 ([{fd=0, revents=POLLIN}, {fd=5, revents=POLLNVAL}])
read(0, "quit\n", 1024) = 5
write(4, "\1\0\0\0\0\0\0\0", 8) = 8
futex(0x1533a54, FUTEX_WAKE_PRIVATE, 2147483647) = 0
close(3) = 0
exit_group(0) = ?
+++ exited with 0 +++
Any updates on this issue @trinitronx ? I have been running into the same issue with multiple devices (AirThings stops connecting to the RPi). I am able to collect data for ~7 days at a 5-minute ping before losing connection with what looks like no hope to re-establish communication (has been two weeks since I have gotten any data).
Just thought I would check-in and see if you had re-visited this issue. I have some things I can check in the meantime:
I believe the issue described in #4 is still happening as of latest
Raspbian GNU/Linux 9 (stretch)
+ bluetooth stack components:Based on my experiences and experimentation, it appears that the issue is in the underlying code powering this script either
bluepy
,bluez
, broadcom drivers, or some combination? Some searching online shows many others are having similar Bluetooth stack instability on Raspbian.I've been trying to get this code running as a long-term solution for reading and sending metrics to Graphite. I should note that so far I've been prevented from running the script as-is on Python 2.7 because I was unable to install
bluepy
viapip
package. However, I was able to get the Python3 version from piwheels so I could move forward just converting all theprint
statements in this script to use parenthesesprint()
.After testing out the script on the host, I found that I had to add kernel capabilities to the
bluepy_helper
binary. This binary is installed as part ofbluepy
and must have been why the Python 2.7 version frompip
could not install a wheel or compile on the Raspberry Pi.After running the script for a while, I've seen various ways in which it fails or eventually hangs. This always seems to happen after some varying period of time. I've tried also adding a SystemD unit which is supposed to restart the process and reset the
hci0
device, yet eventually each time the code hangs. I'm collecting the various exceptions in logs to see how to handle them.SystemD Unit:
Error examples &
strace
debugging:I tried getting an
strace
after finding thatbluepy-helper
had hung: