abandonware / node-bluetooth-hci-socket

MIT License

0.5.3-8 issues with bleno #31

Open alecfriedman3 opened 2 years ago

alecfriedman3 commented 2 years ago

Hi, not sure where to file this issue, but this seems like the best place. We're using bleno in an IoT project at my company and noticed a problem connecting to our devices, specifically when using v0.5.3-8 of this project. Devices connect and pair, but when we try to actually communicate, the connection doesn't respond. In a Chrome browser (where we connect to the GATT server from), we get the error "No Services matching UUID <uuid> found" when trying to retrieve the primary service by UUID. The problem doesn't occur with earlier versions of this project (0.5.3-7 works as expected).

This commit seems like it could be the source of the problem (found by process of elimination, looking at the other commits between the two latest releases): https://github.com/abandonware/node-bluetooth-hci-socket/commit/3ebb846fcacfd94a36fbd842ac9b7e0fabe60ae7

We have a simple Node.js bleno project that exhibits the issue when switching between versions, if you need to see the problem; it runs on Debian Buster with Node 12. We don't have the resources to find an actual fix and are going to lock our versions to the previous ones, but thought someone should be made aware of what seems to be a bug.

rzr commented 2 years ago

Hi, thanks for the feedback. Maybe @splitice can help; in the worst case, you can propose a revert.

Zwimber commented 2 years ago

Same issue here; new problems have arisen since updating.

splitice commented 2 years ago

Perhaps someone who understands bleno can work out the behavioural difference?

I've never used bleno, but the change has a huge positive impact on noble (devices can actually be connected to reliably).

I'm not the original author of the patch; that's @sandeepmistry. I only fixed an observed issue, tested that it resolved our problems, and submitted it as a PR.

Unfortunately, I'm not sure documentation on the workaround from the original hci socket even exists.

The patch does multiple things. If you have a repeatable failure, perhaps work out which part of it causes the issue (e.g. the re-ordering of the workaround, the proc writes, etc.). Perhaps the proc control settings aren't suitable for an end device?

rzr commented 2 years ago

You can propose a revert as a fix until a better solution is found.

splitice commented 2 years ago

Honestly, I think someone needs to nut this out on the bleno side. There is no documentation on either the original workaround or the new one. It could well be that neither works completely, and that one works for noble and the other for bleno. I think we won't know why until people who use bleno troubleshoot it.

As I said, I have no experience with bleno, but this fixes noble.

splitice commented 2 years ago

We had our team meeting today. If someone on the bleno side can work out which element of the patch causes problems (or provide other useful information), we will integrate and test it in our current product iteration.

The more information that can be provided, the better. I'll also post any notes from our preliminary research in this issue.

We are going to look at both workarounds and see what we can figure out. Our interest lies with noble, but I'm happy to improve components for the good of the community (within the resources available).

splitice commented 2 years ago

My first guess regarding the bleno issues is that the connect/disconnect workaround originally did not apply to this use case (i.e. on connect) but now does (or vice versa). This can be tested by putting a printf in the connect if block and checking, for each version, whether the block is triggered.

My second guess is that bleno requires the L2CAP socket to be kept open throughout the connection (instead of just being used to force a flush, as was originally intended). This can be tested by removing the close().

splitice commented 2 years ago

Third guess:

The data[7] == 0x01 condition (connecting to a slave device, i.e. no workaround for bleno devices) was introduced in the imported patch and is worth removing for testing.
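For anyone testing the first and third guesses: assuming, as with a raw HCI socket, that byte 0 of each buffer is the packet type indicator, data[7] in an LE Connection Complete event (packet type 0x04, LE Meta event 0x3E, subevent 0x01) is the Role parameter. Per the Bluetooth Core spec, 0x00 means the local controller is the central (master) and 0x01 means it is the peripheral (slave), which is the role a bleno device plays. The TypeScript sketch below decodes those fields so they could be logged per version; it is not the addon's C++ code, and parseLeConnectionComplete and its types are hypothetical names.

```typescript
// Sketch: decode an HCI LE Connection Complete event read from a raw HCI
// socket, ASSUMING byte 0 is the HCI packet type indicator. Offsets follow
// the Bluetooth Core spec; names here are hypothetical, not the addon's API.

const HCI_EVENT_PKT = 0x04;        // packet type indicator: HCI event
const EVT_LE_META_EVENT = 0x3e;    // LE Meta event code
const EVT_LE_CONN_COMPLETE = 0x01; // LE Connection Complete subevent code

interface LeConnectionComplete {
  status: number;                  // 0x00 = success
  handle: number;                  // 12-bit connection handle
  role: "central" | "peripheral" | "reserved";
  peerAddress: string;             // formatted as AA:BB:CC:DD:EE:FF
}

function parseLeConnectionComplete(data: Uint8Array): LeConnectionComplete | null {
  // 3 header bytes (type, event code, parameter length) + 19 parameter bytes.
  if (
    data.length < 22 ||
    data[0] !== HCI_EVENT_PKT ||
    data[1] !== EVT_LE_META_EVENT ||
    data[3] !== EVT_LE_CONN_COMPLETE
  ) {
    return null;
  }
  const status = data[4];
  // Handle is little-endian in data[5..6]; only the low 12 bits are the handle.
  const handle = (data[5] | (data[6] << 8)) & 0x0fff;
  // data[7] is the Role parameter: 0x00 = local controller is central (master),
  // 0x01 = local controller is peripheral (slave), the role bleno plays.
  const role: LeConnectionComplete["role"] =
    data[7] === 0x00 ? "central" : data[7] === 0x01 ? "peripheral" : "reserved";
  // data[8] is the peer address type; data[9..14] is the address, LSB first.
  const peerAddress = Array.from(data.slice(9, 15))
    .reverse()
    .map((b) => b.toString(16).padStart(2, "0").toUpperCase())
    .join(":");
  return { status, handle, role, peerAddress };
}

// Example event: success, handle 0x0040, local role = peripheral.
const example = Uint8Array.from([
  0x04, 0x3e, 0x13, 0x01, 0x00, 0x40, 0x00, 0x01, 0x00,
  0x66, 0x55, 0x44, 0x33, 0x22, 0x11, 0x18, 0x00, 0x00, 0x00, 0xc8, 0x00, 0x00,
]);
console.log(parseLeConnectionComplete(example));
```

Logging the decoded role next to whether the workaround branch runs would show, for each version, which connections the data[7] check includes or excludes.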

goofiw commented 2 years ago

This might be an edge case, but I have bleno working on my Mac, yet I'm unable to discover services when running on a Raspberry Pi 4 in a Docker container (the Pi runs Ubuntu; the container uses Buster). Devices do scan, and noble works fine.

Apollon77 commented 2 years ago

Which macOS version? Apple broke discovery with macOS 12.3; it might be fixed in 12.3.1.

splitice commented 2 years ago

I am currently looking into what I suspect are issues when connections are terminated by supervision timeout, with one device (noble, not bleno). I don't know why yet.
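When chasing supervision-timeout terminations, the reason byte of the HCI Disconnection Complete event (event code 0x05) shows why a link ended; reason 0x08 is the HCI "Connection Timeout" error, which is what a supervision timeout produces. A minimal TypeScript sketch, assuming the same raw-socket framing (byte 0 = packet type 0x04); parseDisconnectionComplete is a hypothetical name and the reason table lists only a few common codes.

```typescript
// Sketch: decode an HCI Disconnection Complete event read from a raw HCI
// socket, ASSUMING byte 0 is the packet type indicator. Names are hypothetical.

const HCI_EVENT_PKT = 0x04;        // packet type indicator: HCI event
const EVT_DISCONN_COMPLETE = 0x05; // Disconnection Complete event code

// A few HCI error codes commonly seen as disconnect reasons (not exhaustive).
const DISCONNECT_REASONS = new Map<number, string>([
  [0x08, "Connection Timeout (supervision timeout)"],
  [0x13, "Remote User Terminated Connection"],
  [0x16, "Connection Terminated By Local Host"],
  [0x3e, "Connection Failed to be Established"],
]);

interface DisconnectionComplete {
  status: number;
  handle: number;
  reason: number;
  reasonText: string;
}

function parseDisconnectionComplete(data: Uint8Array): DisconnectionComplete | null {
  // 3 header bytes + 4 parameter bytes (status, handle low, handle high, reason).
  if (data.length < 7 || data[0] !== HCI_EVENT_PKT || data[1] !== EVT_DISCONN_COMPLETE) {
    return null;
  }
  const status = data[3];
  const handle = (data[4] | (data[5] << 8)) & 0x0fff;
  const reason = data[6];
  const reasonText = DISCONNECT_REASONS.get(reason) ?? `Unknown reason 0x${reason.toString(16)}`;
  return { status, handle, reason, reasonText };
}

// Example: a supervision timeout on connection handle 0x0040.
console.log(parseDisconnectionComplete(Uint8Array.from([0x04, 0x05, 0x04, 0x00, 0x40, 0x00, 0x08])));
```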

Honestly, this workaround is really bad, but fixing it would likely require at least an API change.

In a nutshell, the workaround ensures that a Linux socket is created for each connection (preventing the connection from being garbage-collected).

However, there are races in that cleanup, and there are multiple connection creation/disconnection methods, some of which are racy.
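The bookkeeping described above (one resource per connection, released on disconnect) can be sketched as a map keyed by connection handle. This is a hypothetical TypeScript illustration, not the addon's implementation: ConnectionTracker is an invented name, and `open` stands in for creating the per-connection Linux socket. Because JavaScript handles events one at a time, it does not reproduce the native races; it only shows two invariants any cleanup has to hold: a reused handle releases its stale resource first, and duplicate or late disconnects are harmless.

```typescript
// Hypothetical sketch of per-connection bookkeeping: one resource per HCI
// connection handle, released on disconnect. `open` stands in for creating
// the per-connection Linux socket; it is NOT the addon's actual code.

interface ConnectionResource {
  close(): void;
}

class ConnectionTracker {
  private readonly resources = new Map<number, ConnectionResource>();
  private readonly open: (handle: number) => ConnectionResource;

  constructor(open: (handle: number) => ConnectionResource) {
    this.open = open;
  }

  onConnect(handle: number): void {
    // Handles are reused after disconnects; if a disconnect was missed,
    // release the stale resource before creating a new one.
    this.release(handle);
    this.resources.set(handle, this.open(handle));
  }

  onDisconnect(handle: number): void {
    // Duplicate or late disconnects for unknown handles are a no-op.
    this.release(handle);
  }

  get size(): number {
    return this.resources.size;
  }

  private release(handle: number): void {
    const resource = this.resources.get(handle);
    if (resource !== undefined) {
      // Remove the entry before closing so a re-entrant call cannot close twice.
      this.resources.delete(handle);
      resource.close();
    }
  }
}

// Usage: record open/close calls for a reused handle and a duplicate disconnect.
const events: string[] = [];
const tracker = new ConnectionTracker((handle) => {
  events.push(`open ${handle}`);
  return { close: () => { events.push(`close ${handle}`); } };
});
tracker.onConnect(0x40);
tracker.onConnect(0x40);    // same handle reused, disconnect never seen
tracker.onDisconnect(0x40);
tracker.onDisconnect(0x40); // duplicate disconnect: ignored
console.log(events.join(", "));
```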

splitice commented 2 years ago

My efforts are limited to Linux, FYI. I can't speak to the macOS HCI stack.

splitice commented 1 year ago

FYI, we replaced this module with a fork of node-raw-socket that adds Bluetooth support.

It's limited to HCI_USER_CHANNEL, but honestly that's a lot better implemented in noble anyway (at least for Linux).

https://github.com/HalleyAssist/node-raw-socket

cwilling commented 9 months ago

On a Pi Zero W, I still see the originally reported problem: a Chrome-based external client sees the bleno advertisement and connects, but fails to find any services. However, switching to bluetooth-hci-socket v0.5.3-7 makes no difference compared with the latest v0.5.3-10.

I wonder whether the changes proposed so far are of limited wider use because all the testing seems to be against noble, which is itself related to bleno, isn't it? Doesn't this risk ending up in a closed loop where bleno only works with noble?

I've been using Google-provided sample code from here to show available services. There are several useful samples there; in this case I've been using the Discover Services & Characteristics sample. For an ESP32-based peripheral that I coded (using the NimBLE Arduino library), the Google discovery sample finds all of the peripheral's services.

However, the discovery sample finds no services when I run any of the @abandonware/bleno examples. For instance, with the pizza example, the discovery sample's Live Output shows (while browsing devices):

Requesting any Bluetooth Device...

Then, after browsing and pairing with the PizzaSquat that was discovered:

Connecting to GATT Server...
Getting Services...
Argh! NotFoundError: No Services found in device.

Sorry I'm not offering an actual solution, just a suggestion to test against something authoritative (assuming the Google sample code fits that description).
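The three log lines above correspond to the three Web Bluetooth steps a discovery page performs: requestDevice(), gatt.connect() and getPrimaryServices(). A minimal TypeScript sketch of that flow follows, assuming small structural stand-ins instead of the full Web Bluetooth typings so it can be exercised with a fake; discoverServices and the *Like types are hypothetical names, and in a browser you would pass navigator.bluetooth. Note that Web Bluetooth only grants access to services named in the request's filters or optionalServices.

```typescript
// Sketch of the discovery steps, typed against minimal structural stand-ins
// for the Web Bluetooth API (only the members used here).

interface ServiceLike { uuid: string; }
interface ServerLike { getPrimaryServices(): Promise<ServiceLike[]>; }
interface DeviceLike { gatt?: { connect(): Promise<ServerLike> }; }
interface BluetoothLike {
  requestDevice(options: { acceptAllDevices: boolean; optionalServices: string[] }): Promise<DeviceLike>;
}

// Hypothetical helper: returns the discovered service UUIDs, or [] on error.
async function discoverServices(
  bluetooth: BluetoothLike,
  optionalServices: string[],
  log: (message: string) => void,
): Promise<string[]> {
  try {
    log("Requesting any Bluetooth Device...");
    const device = await bluetooth.requestDevice({ acceptAllDevices: true, optionalServices });
    if (device.gatt === undefined) {
      throw new Error("Device has no GATT server");
    }
    log("Connecting to GATT Server...");
    const server = await device.gatt.connect();
    log("Getting Services...");
    const services = await server.getPrimaryServices();
    return services.map((service) => service.uuid);
  } catch (error) {
    // Mirrors the sample's error line in its Live Output.
    log("Argh! " + String(error));
    return [];
  }
}
```

In a browser this would be called as discoverServices(navigator.bluetooth, [serviceUuid], console.log), where serviceUuid is the peripheral's own service UUID.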

cwilling commented 9 months ago

I tried running the Google discovery sample on two other (slightly newer) machines, as well as on a phone, and it worked in all cases. Subsequently I found that the bleno-based app I'd started coding was also correctly discovered, as were its services. With that small victory, on one of the working machines I reset my app to use the latest @abandonware/bleno 0.6.1 (I'd downgraded to 0.5.1-4) and the latest @abandonware/bluetooth-hci-socket 0.5.3-10, and everything was still discovered correctly.

Conclusion: something about the first machine I'd started coding on wasn't working correctly; it had nothing to do with @abandonware/{bleno,bluetooth-hci-socket}.

I'm happy to have solved my problem, although I'm a bit surprised, since the versions of possibly relevant software on the original (faulty) machine are not very different from those on the machines that work. The older, main development machine has:

bluez-5.64-x86_64-1
bluez-firmware-1.2-x86_64-4
kernel-generic-5.15.118-x86_64-1
kernel-firmware-20230725_b6ea35f-noarch-1

whereas the newer machines have:

bluez-5.70-x86_64-1
bluez-firmware-1.2-x86_64-4
kernel-generic-6.1.59-x86_64-1
kernel-firmware-20231019_d983107-noarch-1

Looks like I'll have to upgrade my main machine.