50ButtonsEach / fliclib-linux-hci

Flic SDK for Linux
307 stars 53 forks source link

flicd hanging after pairing attempt. #117

Open yetzt opened 1 month ago

yetzt commented 1 month ago

I have original Flic buttons (from 2016, not version two), they work fine with the Flic app on MacOS.

I am running the x86_64 version of flicd using an rtl8761b based USB bluetooth adapter. I'm starting flicd flicd -f flicd.sqlite -s 0.0.0.0. I connect with the node client. The button is found, but after attempted pairing, i receive a timeout.

> node newscanwizard.js
Welcome to the add new button wizard. Press and hold down your Flic button to add it.
Your button is private. Hold down for 7 seconds to make it public.
Found public button 80:e4:da:71:33:63 (F018cTNj). Now connecting...
Completed with result: WizardFailedTimeout

On subsequent tries no button is found until i restart flicd.

According to btmon, there apears to be some sort of endless loop until flicd is stopped.

[...]
> HCI Event: LE Meta Event (0x3e) plen 43                 #152 [hci0] 13.885227
      LE Advertising Report (0x02)
        Num reports: 1
        Event type: Connectable undirected - ADV_IND (0x00)
        Address type: Public (0x00)
        Address: 80:E4:DA:71:33:63 (OUI 80-E4-DA)
        Data length: 31
        Flags: 0x06
          LE General Discoverable Mode
          BR/EDR Not Supported
        128-bit Service UUIDs (complete): 1 entry
          Vendor specific (f02adfc0-26e7-11e4-9edc-0002a5d5c51b)
        Name (complete): f018cTNj
        RSSI: -79 dBm (0xb1)
> HCI Event: LE Meta Event (0x3e) plen 43                 #153 [hci0] 13.932553
      LE Advertising Report (0x02)
        Num reports: 1
        Event type: Connectable undirected - ADV_IND (0x00)
        Address type: Public (0x00)
        Address: 80:E4:DA:71:33:63 (OUI 80-E4-DA)
        Data length: 31
        Flags: 0x06
          LE General Discoverable Mode
          BR/EDR Not Supported
        128-bit Service UUIDs (complete): 1 entry
          Vendor specific (f02adfc0-26e7-11e4-9edc-0002a5d5c51b)
        Name (complete): f018cTNj
        RSSI: -83 dBm (0xad)
> HCI Event: LE Meta Event (0x3e) plen 40                 #154 [hci0] 13.991403
      LE Advertising Report (0x02)
        Num reports: 1
        Event type: Connectable undirected - ADV_IND (0x00)
        Address type: Public (0x00)
        Address: D4:AD:FC:BB:38:F9 (OUI D4-AD-FC)
        Data length: 28
        Name (complete): GBK_H6125_38F9
        Company: not assigned (34818)
          Data: ec000a0101
        Flags: 0x05
          LE Limited Discoverable Mode
          BR/EDR Not Supported
        RSSI: -69 dBm (0xbb)
> HCI Event: LE Meta Event (0x3e) plen 43                 #155 [hci0] 14.122604
      LE Advertising Report (0x02)
        Num reports: 1
        Event type: Connectable undirected - ADV_IND (0x00)
        Address type: Public (0x00)
        Address: 80:E4:DA:71:33:63 (OUI 80-E4-DA)
        Data length: 31
        Flags: 0x06
          LE General Discoverable Mode
          BR/EDR Not Supported
        128-bit Service UUIDs (complete): 1 entry
          Vendor specific (f02adfc0-26e7-11e4-9edc-0002a5d5c51b)
        Name (complete): F018cTNj
        RSSI: -77 dBm (0xb3)
< HCI Command: LE Set Scan Enable (0x08|0x000c) plen 2    #156 [hci0] 14.128023
        Scanning: Disabled (0x00)
        Filter duplicates: Disabled (0x00)
> HCI Event: Command Complete (0x0e) plen 4               #157 [hci0] 14.131068
      LE Set Scan Enable (0x08|0x000c) ncmd 2
        Status: Success (0x00)
< HCI Command: LE Add Device To W.. (0x08|0x0011) plen 7  #158 [hci0] 14.131229
        Address type: Public (0x00)
        Address: 80:E4:DA:71:33:63 (OUI 80-E4-DA)
> HCI Event: Command Complete (0x0e) plen 4               #159 [hci0] 14.134188
      LE Add Device To White List (0x08|0x0011) ncmd 2
        Status: Success (0x00)
< HCI Command: LE Create Connect.. (0x08|0x000d) plen 25  #160 [hci0] 14.134331
        Scan interval: 60.000 msec (0x0060)
        Scan window: 30.000 msec (0x0030)
        Filter policy: White list is used (0x01)
        Peer address type: Public (0x00)
        Peer address: 00:00:00:00:00:00 (OUI 00-00-00)
        Own address type: Public (0x00)
        Min connection interval: 20.00 msec (0x0010)
        Max connection interval: 25.00 msec (0x0014)
        Connection latency: 0 (0x0000)
        Supervision timeout: 200 msec (0x0014)
        Min connection length: 0.000 msec (0x0000)
        Max connection length: 0.000 msec (0x0000)
> HCI Event: Command Status (0x0f) plen 4                 #161 [hci0] 14.138329
      LE Create Connection (0x08|0x000d) ncmd 2
        Status: Success (0x00)
> HCI Event: LE Meta Event (0x3e) plen 31                 #162 [hci0] 14.289313
      LE Enhanced Connection Complete (0x0a)
        Status: Success (0x00)
        Handle: 16
        Role: Master (0x00)
        Peer address type: Public (0x00)
        Peer address: 80:E4:DA:71:33:63 (OUI 80-E4-DA)
        Local resolvable private address: 00:00:00:00:00:00 (Non-Resolvable)
        Peer resolvable private address: 00:00:00:00:00:00 (Non-Resolvable)
        Connection interval: 22.50 msec (0x0012)
        Connection latency: 0 (0x0000)
        Supervision timeout: 200 msec (0x0014)
        Master clock accuracy: 0x00
< HCI Command: LE Create Connecti.. (0x08|0x000e) plen 0  #163 [hci0] 44.138827
> HCI Event: Disconnect Complete (0x05) plen 4            #164 [hci0] 44.240127
        Status: Success (0x00)
        Handle: 16
        Reason: Connection Timeout (0x08)
> HCI Event: Command Complete (0x0e) plen 4               #165 [hci0] 44.245202
      LE Create Connection Cancel (0x08|0x000e) ncmd 2
        Status: Command Disallowed (0x0c)
< HCI Command: LE Create Connecti.. (0x08|0x000e) plen 0  #166 [hci0] 44.245418
> HCI Event: Command Complete (0x0e) plen 4               #167 [hci0] 44.248357
      LE Create Connection Cancel (0x08|0x000e) ncmd 2
        Status: Command Disallowed (0x0c)
< HCI Command: LE Create Connecti.. (0x08|0x000e) plen 0  #168 [hci0] 44.248488
> HCI Event: Command Complete (0x0e) plen 4               #169 [hci0] 44.251496
      LE Create Connection Cancel (0x08|0x000e) ncmd 2
        Status: Command Disallowed (0x0c)
< HCI Command: LE Create Connecti.. (0x08|0x000e) plen 0  #170 [hci0] 44.251610
> HCI Event: Command Complete (0x0e) plen 4               #171 [hci0] 44.254721
      LE Create Connection Cancel (0x08|0x000e) ncmd 2
        Status: Command Disallowed (0x0c)
< HCI Command: LE Create Connecti.. (0x08|0x000e) plen 0  #172 [hci0] 44.254869
> HCI Event: Command Complete (0x0e) plen 4               #173 [hci0] 44.257992
      LE Create Connection Cancel (0x08|0x000e) ncmd 2
        Status: Command Disallowed (0x0c)
< HCI Command: LE Create Connecti.. (0x08|0x000e) plen 0  #174 [hci0] 44.258140
> HCI Event: Command Complete (0x0e) plen 4               #175 [hci0] 44.261061
      LE Create Connection Cancel (0x08|0x000e) ncmd 2
        Status: Command Disallowed (0x0c)
< HCI Command: LE Create Connecti.. (0x08|0x000e) plen 0  #176 [hci0] 44.261167
> HCI Event: Command Complete (0x0e) plen 4               #177 [hci0] 44.264238
      LE Create Connection Cancel (0x08|0x000e) ncmd 2
        Status: Command Disallowed (0x0c)
< HCI Command: LE Create Connecti.. (0x08|0x000e) plen 0  #178 [hci0] 44.264378
> HCI Event: Command Complete (0x0e) plen 4               #179 [hci0] 44.267434
      LE Create Connection Cancel (0x08|0x000e) ncmd 2
        Status: Command Disallowed (0x0c)
< HCI Command: LE Create Connecti.. (0x08|0x000e) plen 0  #180 [hci0] 44.267551
> HCI Event: Command Complete (0x0e) plen 4               #181 [hci0] 44.270606
      LE Create Connection Cancel (0x08|0x000e) ncmd 2
        Status: Command Disallowed (0x0c)
< HCI Command: LE Create Connecti.. (0x08|0x000e) plen 0  #182 [hci0] 44.270721
> HCI Event: Command Complete (0x0e) plen 4               #183 [hci0] 44.273785
      LE Create Connection Cancel (0x08|0x000e) ncmd 2
        Status: Command Disallowed (0x0c)
< HCI Command: LE Create Connecti.. (0x08|0x000e) plen 0  #184 [hci0] 44.273900
[...]
< HCI Command: LE Create Connecti.. (0x08|0x000e) plen 0  #548 [hci0] 44.841832
> HCI Event: Command Complete (0x0e) plen 4               #549 [hci0] 44.844832
      LE Create Connection Cancel (0x08|0x000e) ncmd 2
        Status: Command Disallowed (0x0c)
< HCI Command: Reset (0x03|0x0003) plen 0                 #550 [hci0] 44.844891
> HCI Event: Command Complete (0x0e) plen 4               #551 [hci0] 44.856154
      Reset (0x03|0x0003) ncmd 3
        Status: Success (0x00)
@ USER Close: flicd                                   {0x0002} [hci0] 44.856316
= Close Index: 8A:88:4B:C0:DA:D5                               [hci0] 44.856327

I've attached the btmon log in text and btsnoop format:

btmonlog.txt btmonlog.btsnoop

I hope this issue can be resolved.

I have to express my frustration that the flicd server component is proprietary closed source and offers no way to debug anything. This is stark contrast of the "hackable" advertisement that lead me to buy this product back in 2015. I have waited almost a decade now to use these buttons for anything but making fart noises.

Emill commented 1 month ago

Seems like it sends the wrong event LE Enhanced Connection Complete while it is configured to send LE Connection Complete

Can you please post the output of btmon so that includes when you start flicd until the error occurs?

yetzt commented 1 month ago

i did, in the file i attached: btmonlog.txt.

Emill commented 1 month ago

@yetzt nah the log starts with a start scan + an advertisement event, which means that flicd had already been started earlier at the time btmon logging was started.

Emill commented 1 month ago

Can you also see if there is a new firmware/driver for the Bluetooth chip and if that fixes the problem?

yetzt commented 1 month ago

i upgraded my vm to the latest debian and ensured all packages are current and the firmware binary blobs are the ones from the kernel git. i added a second bluetooth dongle just in case.

dmesg | grep -i bluetooth

[    2.036620] usb 3-1: Product: Bluetooth 5.3 Radio
[    2.355002] usb 3-2: Product: Bluetooth 5.4 Radio
[    4.474521] Bluetooth: Core ver 2.22
[    4.474584] NET: Registered PF_BLUETOOTH protocol family
[    4.474585] Bluetooth: HCI device and connection manager initialized
[    4.474956] Bluetooth: HCI socket layer initialized
[    4.474963] Bluetooth: L2CAP socket layer initialized
[    4.474975] Bluetooth: SCO socket layer initialized
[    4.541117] Bluetooth: hci0: RTL: examining hci_ver=0a hci_rev=000b lmp_ver=0a lmp_subver=8761
[    4.541182] Bluetooth: hci1: RTL: examining hci_ver=0a hci_rev=000b lmp_ver=0a lmp_subver=8761
[    4.544596] Bluetooth: hci0: RTL: rom_version status=0 version=1
[    4.544604] Bluetooth: hci0: RTL: loading rtl_bt/rtl8761bu_fw.bin
[    4.545414] Bluetooth: hci1: RTL: rom_version status=0 version=1
[    4.545420] Bluetooth: hci1: RTL: loading rtl_bt/rtl8761bu_fw.bin
[    4.559800] bluetooth hci0: firmware: direct-loading firmware rtl_bt/rtl8761bu_fw.bin
[    4.559869] Bluetooth: hci0: RTL: loading rtl_bt/rtl8761bu_config.bin
[    4.559988] Bluetooth: hci1: RTL: loading rtl_bt/rtl8761bu_config.bin
[    4.560075] bluetooth hci0: firmware: direct-loading firmware rtl_bt/rtl8761bu_config.bin
[    4.560129] Bluetooth: hci0: RTL: cfg_sz 6, total sz 27814
[    4.560204] Bluetooth: hci1: RTL: cfg_sz 6, total sz 27814
[    5.039005] Bluetooth: hci1: RTL: fw version 0x09a98a6b
[    5.042294] Bluetooth: hci0: RTL: fw version 0x09a98a6b
[    5.263225] Bluetooth: BNEP (Ethernet Emulation) ver 1.3
[    5.263231] Bluetooth: BNEP filters: protocol multicast
[    5.263237] Bluetooth: BNEP socket layer initialized
[    5.265195] Bluetooth: MGMT ver 1.22
[    5.299762] Bluetooth: MGMT ver 1.22

next i started btmon btmon -r flic.log so the log includes the complete run of flicd.

then i started flicd on the first bluetooth controller: ./flicd -f flicd.sqlite -s 0.0.0.0 -h hci0 full output:

Available HCI devices found:
hci1
hci0

Trying hci0
hci0 is busy, shutting down and retrying...
Successfully bound HCI socket
Flic server is now up and running!
Initialization of Bluetooth controller done!
Accepted new client
^CReceived signal 2
A client was disconnected
Exited

then i tried to pair my button via node newscanwizard.js, full output:

Welcome to the add new button wizard. Press and hold down your Flic button to add it.
Your button is private. Hold down for 7 seconds to make it public.
Found public button 80:e4:da:71:33:63 (F018cTNj). Now connecting...
Completed with result: WizardFailedTimeout

after that i stopped flicd and tried on the second controller ./flicd -f flicd.sqlite -s 0.0.0.0 -h hci1, full output:

# ./flicd -f flicd.sqlite -s 0.0.0.0 -h hci1
Available HCI devices found:
hci1
hci0

Trying hci1
hci1 is busy, shutting down and retrying...
Successfully bound HCI socket
Flic server is now up and running!
Initialization of Bluetooth controller done!
Accepted new client
Client EOF
A client was disconnected
Accepted new client
Client EOF
A client was disconnected
^CReceived signal 2
Exited

after that i tried again to pair the button with node newscanwizard.js, this time undertaking a second attempt after the first timeout. full output:

Welcome to the add new button wizard. Press and hold down your Flic button to add it.
Found public button 80:e4:da:71:33:63 (F018cTNj). Now connecting...
Completed with result: WizardFailedTimeout
$ node newscanwizard.js 
Welcome to the add new button wizard. Press and hold down your Flic button to add it.
^C

again, the first attempt ended with WizardFailedTimeout and with the second attempt nothing happened after pressing the button.

after that i stopped flicd again and ended btmon as well. i converted the logfile from btsnoop format to text btmon -r flic.log -P > flic.txt.log these are attached here:

flic.log flic.txt.log

Emill commented 1 month ago

Can you please contact the manufacturer of the Bluetooth dongle and file an issue? It is the exact same issue as observed in one of Espressif's chips: https://github.com/espressif/esp-idf/issues/6698.

Emill commented 1 month ago

In the meantime, please test this version with a workaround and let me know if it works.

flicd-le-event-mask-workaround.tar.gz

yetzt commented 1 month ago

i would give it a try, but the glibc version on debian bookworm is behind the one this was compiled with

./flicd-le-event-mask-workaround: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.38' not found (required by ./flicd-le-event-mask-workaround)
# ldd --version
ldd (Debian GLIBC 2.36-9+deb12u8) 2.36

i will be trying to run it in a chrooted environment

Emill commented 1 month ago

Can you try Ubuntu 24.04 or newer?

yetzt commented 1 month ago

i will try and report back.

yetzt commented 1 month ago

this did not work with the rtl8761b receiver:

output of newscanwizard.js:

Welcome to the add new button wizard. Press and hold down your Flic button to add it.
Your button is private. Hold down for 7 seconds to make it public.
Found public button 80:e4:da:71:33:63 (F018cTNj). Now connecting...
Button connected. Now verifying and pairing...
Completed with result: WizardFailedTimeout

output of flicd-le-event-mask-workaround:

Available HCI devices found:
hci0

Trying hci0
hci0 is busy, shutting down and retrying...
Successfully bound HCI socket
Flic server is now up and running!
Initialization of Bluetooth controller done!
Accepted new client
Client EOF
A client was disconnected

btmon logs: flicd.log flicd.txt.log

however it worked with the rtl8761bu receiver, although there was also a failure at first with the second button:

output of newscanwizard.js:

$ node newscanwizard.js 
Welcome to the add new button wizard. Press and hold down your Flic button to add it.
Your button is private. Hold down for 7 seconds to make it public.
Found public button 80:e4:da:71:33:63 (F018cTNj). Now connecting...
Button connected. Now verifying and pairing...
Completed with result: WizardSuccess
Your new button is 80:e4:da:71:33:63
$ node newscanwizard.js 
Welcome to the add new button wizard. Press and hold down your Flic button to add it.
Your button is private. Hold down for 7 seconds to make it public.
Completed with result: WizardFailedTimeout
$ node newscanwizard.js 
Welcome to the add new button wizard. Press and hold down your Flic button to add it.
Your button is private. Hold down for 7 seconds to make it public.
Found public button 80:e4:da:71:64:57 (F018cWRX). Now connecting...
Button connected. Now verifying and pairing...
Completed with result: WizardSuccess
Your new button is 80:e4:da:71:64:57

output of flicd-le-event-mask-workaround:

# ./flicd-le-event-mask-workaround -s 0.0.0.0 -f flicd.sqlite
Available HCI devices found:
hci0

Trying hci0
hci0 is busy, shutting down and retrying...
Successfully bound HCI socket
Flic server is now up and running!
Initialization of Bluetooth controller done!
Accepted new client
Client EOF
A client was disconnected
Accepted new client
Client EOF
A client was disconnected
Accepted new client
Client EOF
A client was disconnected
^CReceived signal 2
Exited

btmon logs: flicd2.log flicd2.txt.log

yetzt commented 1 month ago

after some struggle i got four out of six buttons to pair, with the fifth button not pairing after 10 attempts and the sixth button not being seen at all. this might be a limitation of the bluetooth chip though. the other buttons work fine with the flic home assistant integration.

thanks for bearing with me and investing your time to help me out. i would appreciate an official build including the workaround that i can run on my existing vm.

Emill commented 1 month ago

In the first log, if you open it in Wireshark and inspect packet number 290-293, you can see that it very much appears as the LE Connection Complete Event sent by the controller as packet 293 has been delayed until it was "poked" with packet 292. It is very unlikely that it connects immediately when we initiate a timeout based cancellation (in packet 292). Additionally, packet 301 tells us that the button initiated a disconnection, which in its firmware does 15 seconds after connection was established in case it does not receive anything from the central. Since there should thus be at least 15 seconds between the LE Connection Complete event and this event, it is clear that the packets from the Bluetooth controller is getting stuck somewhere. Possibly in the USB driver of the computer or, if you use a VM, somewhere between the VM and the host USB stack. Or in the Linux driver for the USB Bluetooth dongle.

For the second log, it appears simply that you didn't press and held the button for 7 seconds so that it started advertising in Public mode, as indicated by the first upper case F rather than lower case f in its advertised name. At least from the info in the log.

If there is a maximum connection limit for the Bluetooth chip that you are reaching, you should detect this with the "info" command, see https://github.com/50ButtonsEach/fliclib-linux-hci/blob/master/ProtocolDocumentation.md#evtgetinforesponse. If you try to pair using the simpleclient program, you can issue the "info" command after and see if it indeed has reached the maximum number of connections. Let me know in that case what the maximum number is. (I'm curious)

yetzt commented 1 month ago

I will investigate this further when i find the time.