50ButtonsEach / fliclib-linux-hci

Flic SDK for Linux
306 stars 54 forks source link

Stop / Restart daemon? #43

Closed boonhapus closed 7 years ago

boonhapus commented 7 years ago

I run into a problem where my buttons will not respond after some time of inactivity. The only way to get them to begin responding again is to restart my unit. Bluetooth service is continue to run, but I assume the daemon hangs or something. With the option to restart or stop daemon without killing process, I can safely start/stop.

fabianbergmark commented 7 years ago

Please run btmon alongside the daemon and wait until it hangs, then send include the log here.

boonhapus commented 7 years ago

Silly question -- I've never used btmon before. How do I run it alongside?

Emill commented 7 years ago

Just open a second terminal and run "sudo btmon > logfile.txt". That starts logging. Press CTRL+C when you want to stop logging.

boonhapus commented 7 years ago

Roger that! Will do. I have only had this happen 2 times in the entirety of owning these buttons, this happens very infrequently. I will post my logfile once it happens again. To be continued ...

boonhapus commented 7 years ago

After coming home from work today, I re-encountered this, I ran btmon and redirected output to a logfile. I went around and pressed all 5 of my flic buttons, this was the only output.

Bluetooth monitor ver 5.23
= New Index: 5C:F3:70:77:FE:6F (BR/EDR,USB,hci0)                [hci0] 0.553371
* Unknown packet (code 8 len 0)                                 [hci0] 0.553380
* Unknown packet (code 10 len 8)                                [hci0] 0.553383
        6f fe 77 70 f3 5c 0f 00                          o.wp.\..        

Interestingly, in running ps aux | grep flicd, I find nothing running as any user. It looks as if the service has terminated itself? After restarting the service, the buttons still give the same feedback: when pressed the red LED will come on for a second or two. Rerunning btmon yields the same result as above. The only fix to this problem is to reboot the Pi - which is not the preferred solution.

http://pastebin.com/xDuM79L6 Here is a snippet of normal log output, after a fresh reboot. There is 1 button press towards the end of the logs, and I ran btmon for ~10 seconds (in this case) for demonstration.

Let me know if you have any questions!

Emill commented 7 years ago

What is the command line you use to start flicd and how do you do that?

boonhapus commented 7 years ago

/etc/init.d/flic_startup

#!/bin/sh

set -e

NAME=flic
DAEMON_OPTS=" -d -f flic.db"

start() {
    echo -n "Starting flic service: "
    /home/homeassistant/.homeassistant/fliclib-linux-hci/bin/armv6l/flicd -- $DAEMON_OPTS
    echo "."
}

stop() {
    echo -n "[TEMP] Stopping flic service: "
    echo "."
}

restart() {
    echo -n "[TEMP] Restarting flic service: "
    echo "."
}

case "$1" in
    start)
        start
        ;;
    stop)
        stop
        ;;
    restart)
        restart
        ;;
    *)
        echo "Usage: ... start | stop | restart"
        exit 1
        ;;
esac
exit 0

I run it on startup. Granted, I don't really know what I'm doing in the script above, but I have an understanding of what's happening and what the expected outcome is. I'm very comfortable in python, so reading code isn't impossible.

Thanks for the help.

boonhapus commented 7 years ago

Hi, this happens now at least once a day, and I'm not sure why the flic service just stops/ends the process.

fabianbergmark commented 7 years ago

Hi. Can you run the daemon with the --log-file option and then attach the log here after the process stops. This is probably due to some assertion failure in the flicd executable due to bugs in the bluetooth chip.

boonhapus commented 7 years ago

Hi, I got your message 3 days ago, restarted the service with logging enabled and as luck would have it, have yet to run into the occurrence. I will post my logfile here once it occurs again!

boonhapus commented 7 years ago

Hello all, I have found the error!

Available HCI devices found:
hci0

Trying hci0
hci0 is busy, shutting down and retrying...
Successfully bound HCI socket
Flic server is now up and running!
Entering daemon state...
Entered daemon state.
Initialization of Bluetooth controller done!
Accepted new client
HCI socket was closed (EPIPE): Broken pipe
Waiting for HCI device to become available again...
*** Error in `/home/homeassistant/.homeassistant/fliclib-linux-hci/bin/armv6l/flicd': double free or corruption (fasttop): 0x001a3118 ***

Would love your thoughts on how this error occurs.

boonhapus commented 7 years ago

It seems I have encountered a different error as well.. I have changed nothing in my setup.

...
Accepted new client
Client EOF
A client was disconnected
Received signal 15
Exited
boonhapus commented 7 years ago

@fabianbergmark @Emill Any thoughts?

fabianbergmark commented 7 years ago

@SupahNoob Does it happen often? I know @Emill have looked through the code and found no obvious errors.

boonhapus commented 7 years ago

@fabianbergmark This happens at least once every 30 hours. Sometimes it will be a few hours, sometimes it will be a day or so. But it happens at least once every day, day and a half.

Emill commented 7 years ago

Could you run flicd through gdb in order to catch the error? Run it by calling sudo gdb -ex run --args ./flicd -f flic.db. When it crashes, type backtrace and give me the output.

boonhapus commented 7 years ago

@Emill Thank you. I will post again once I've encountered a crash.

sudo gdb -ex run --args /home/homeassistant/.homeassistant/fliclib-linux-hci/bin/armv6l/flicd -f /home/homeassistant/.homeassistant/flic.db

Currently running with the above command -- not daemonized. If it crashes soon, I will try to also run with the daemon flag and see if that produces a different result.

boonhapus commented 7 years ago

Hi @Emill, here is my log+error ...

Available HCI devices found:
hci0

Trying hci0
hci0 is busy, shutting down and retrying...
Successfully bound HCI socket
Flic server is now up and running!
Initialization of Bluetooth controller done!
Accepted new client

Warning: Non-standard-compliant or buggy bluetooth controller, sent 2 complete packets event, but there were only 1 outstanding for conhdl 66
Warning: Non-standard-compliant or buggy bluetooth controller, sent 2 complete packets event, but there were only 1 outstanding for conhdl 66
Warning: Non-standard-compliant or buggy bluetooth controller, sent 2 complete packets event, but there were only 1 outstanding for conhdl 66
Warning: Non-standard-compliant or buggy bluetooth controller, sent 2 complete packets event, but there were only 1 outstanding for conhdl 66
HCI socket was closed (EPIPE): Broken pipe
Waiting for HCI device to become available again...
*** Error in `/home/homeassistant/.homeassistant/fliclib-linux-hci/bin/armv6l/flicd': double free or corruption (!prev): 0x000d00b8 ***

Program received signal SIGABRT, Aborted.
0x76cfcf70 in __GI_raise (sig=sig@entry=6)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56      ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.

and my backtrace.

#0  0x76cfcf70 in __GI_raise (sig=sig@entry=6)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x76cfe324 in __GI_abort () at abort.c:89
#2  0x76d38954 in __libc_message (do_abort=<optimized out>,
    fmt=0x76dee6e8 "*** Error in `%s': %s: 0x%s ***\n")
    at ../sysdeps/posix/libc_fatal.c:175
#3  0x76d3eb80 in malloc_printerr (action=1,
    str=0x76dee81c "double free or corruption (!prev)", ptr=<optimized out>)
    at malloc.c:4996
#4  0x76d3fb24 in _int_free (av=<optimized out>, p=<optimized out>,
    have_lock=0) at malloc.c:3840
#5  0x00012a2c in ?? ()
#6  0x00012a98 in ?? ()
Cannot access memory at address 0x0
#7  0x0000e764 in ?? ()
Cannot access memory at address 0x0
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Emill commented 7 years ago

Ok so I found the problem! If the Bluetooth controller is unplugged when there is a scan wizard active, that crash happens.

I'm a bit surprised that home-assistant's default option is to continuously scan for buttons and add every one that is discovered, since that means anyone could stand outside the window and connect to the home automation system...

What bluetooth dongle do you have? It does not seem that stable if the connection to it gets lost all the time (as seen by "HCI socket was closed").

boonhapus commented 7 years ago

@Emill

Ok so I found the problem! If the Bluetooth controller is unplugged when there is a scan wizard active, that crash happens.

That's great! There is an option for HA to continuously scan that can be turned on/off, ie "Discovery Mode". I could raise an issue in the repo and suggest there be a service call associated with turning on/off the scan wizard. From reading through your python code some month back, that should not be a too difficult to implement. For what it's worth, HA doesn't actually do anything with the button clicks. It simply registers the event on the event bus and broadcasts that knowledge to the platform. The user decides what the click "means".

What bluetooth dongle do you have? It does not seem that stable if the connection to it gets lost all the time (as seen by "HCI socket was closed").

Since I am on a RPi2, I'm actually using the dongle that seemed most attractive in the documentation for this repo.

We have tested compatibility with some common Bluetooth controllers. The following devices have been tested and confirmed: Plugable USB Bluetooth 4.0 Low Energy Micro Adapter / Asus USB-BT400 (Broadcom BCM20702 Bluetooth 4.0)

https://smile.amazon.com/gp/product/B009ZIILLI/ref=oh_aui_detailpage_o00_s00?ie=UTF8&psc=1

boonhapus commented 7 years ago

@Emill, @fabianbergmark

For what it's worth, after moving the server to a RPi3, my uptime now 52hrs and counting - no blips. I've also re-done my startup file in favor for a systemd service.

flic@homeassistant.service running on my homeassistant linx user account.

The flicd file is in /usr/local/bin/

[Unit]
Description=Flic Button Service
After=bluetooth.service
Before=home-assistant@homeassistant.service

[Service]
Type=simple
User=%i
ExecStart=/usr/local/bin/flicd -w -s 0.0.0.0 -p 5551 -l /home/homeassistant/.homeassistant/flic_log.txt -f /home/homeassistant/.homeassistant/flic.db
Restart=always
RestartSec=3

[Install]
WantedBy=multi-user.target

I'm happy with my solution. I would suggest either vetting the Plugable adapter again, or removing it from the list of tested & approved adapters.