custom-components / ble_monitor

BLE monitor for passive BLE sensors
https://community.home-assistant.io/t/passive-ble-monitor-integration/
MIT License
1.91k stars 247 forks source link

Error doing job: Fatal error: protocol.data_received() call failed #433

Closed LEJOUI closed 2 years ago

LEJOUI commented 3 years ago

Updated to 4.0 and it broke my sensors. I have 3 LYWSD03MMC with custom ATC1441 firmware. In version 3.5 they still work. See logs below. Am I missing some breaking change?

Logger: homeassistant
Source: /usr/src/homeassistant/homeassistant/runner.py:87 
First occurred: 18:34:47 (2 occurrences) 
Last logged: 18:42:18

Error doing job: Fatal error: protocol.data_received() call failed.
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/asyncio/selector_events.py", line 870, in _read_ready__data_received
    self._protocol.data_received(data)
  File "/usr/local/lib/python3.9/site-packages/aioblescan/aioblescan.py", line 1840, in data_received
    extra_data = ev.decode(packet)
  File "/usr/local/lib/python3.9/site-packages/aioblescan/aioblescan.py", line 1303, in decode
    data = ev.decode(data)
  File "/usr/local/lib/python3.9/site-packages/aioblescan/aioblescan.py", line 1350, in decode
    data = ev.decode(data)
  File "/usr/local/lib/python3.9/site-packages/aioblescan/aioblescan.py", line 833, in decode
    data = field.decode(data)
  File "/usr/local/lib/python3.9/site-packages/aioblescan/aioblescan.py", line 1379, in decode
    data = ad.decode(data)
  File "/usr/local/lib/python3.9/site-packages/aioblescan/aioblescan.py", line 1646, in decode
    val.decode(data[: length.val - len(type)])
  File "/usr/local/lib/python3.9/site-packages/aioblescan/aioblescan.py", line 664, in decode
    data = mynbyte.decode(data)
  File "/usr/local/lib/python3.9/site-packages/aioblescan/aioblescan.py", line 623, in decode
    self.val = unpack(">%ds" % self.length, data[: self.length])[0][::-1]
struct.error: unpack requires a buffer of 4 bytes
LEJOUI commented 3 years ago
Deze fout is ontstaan door een aangepaste integratie.

Logger: homeassistant
Source: custom_components/ble_monitor/__init__.py:515 
Integration: Passive BLE monitor (documentation, issues) 
First occurred: 18:32:00 (1 occurrences) 
Last logged: 18:32:00

Error doing job: Fatal error: protocol.data_received() call failed.
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/asyncio/selector_events.py", line 870, in _read_ready__data_received
    self._protocol.data_received(data)
  File "/usr/local/lib/python3.9/site-packages/aioblescan/aioblescan.py", line 1852, in data_received
    self.process(packet)
  File "/config/custom_components/ble_monitor/__init__.py", line 515, in process_hci_events
    sensor_msg, tracker_msg = ble_parser(self, data)
TypeError: cannot unpack non-iterable NoneType object
Ernst79 commented 3 years ago

Could you try to change the line 29 and 37 in /ble_monitor/ble_parser/__init__.py

from

return None

to

return None, None

LEJOUI commented 3 years ago

Hmm tried that and the sensors are working again. I'm still having some appearances of the first log in my logbook though.

LEJOUI commented 3 years ago

Maybe slightly different though

Logger: homeassistant
Source: /usr/src/homeassistant/homeassistant/runner.py:87 
First occurred: 12 juli 2021 19:56:20 (2 occurrences) 
Last logged: 12 juli 2021 23:54:50

Error doing job: Fatal error: protocol.data_received() call failed.
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/asyncio/selector_events.py", line 870, in _read_ready__data_received
    self._protocol.data_received(data)
  File "/usr/local/lib/python3.9/site-packages/aioblescan/aioblescan.py", line 1840, in data_received
    extra_data = ev.decode(packet)
  File "/usr/local/lib/python3.9/site-packages/aioblescan/aioblescan.py", line 1303, in decode
    data = ev.decode(data)
  File "/usr/local/lib/python3.9/site-packages/aioblescan/aioblescan.py", line 1350, in decode
    data = ev.decode(data)
  File "/usr/local/lib/python3.9/site-packages/aioblescan/aioblescan.py", line 833, in decode
    data = field.decode(data)
  File "/usr/local/lib/python3.9/site-packages/aioblescan/aioblescan.py", line 1375, in decode
    data = x.decode(data)
  File "/usr/local/lib/python3.9/site-packages/aioblescan/aioblescan.py", line 191, in decode
    self.val = unpack(">B", data[:1])[0]
struct.error: unpack requires a buffer of 1 bytes
Ernst79 commented 3 years ago

Fixed the TypeError: cannot unpack non-iterable NoneType object Error in 4.0.1.

About the other error, it looks like something is going wrong in the aioblescan package. @Magalex2x14 Do you have any idea what can be wrong?

Magalex2x14 commented 3 years ago

No, I have no idea... I am fairly superficial with aioblescan. We need to understand why such a situation arises.

@LEJOUI what's your host configuration and which BT adapter are you using? Is this an Intel NUC?

Magalex2x14 commented 3 years ago

@Ernst79 Let's see what LEJOUI has to say about the hardware. I suspect the issue may be related to BT5.0 extended advertisements.

In any case, I would look at what is in data at the time of the error (I would use the try except construct, catch the TypeError and log the contents of data). To do this, you need to modify aioblescan and import the modified local version ... My capabilities are now limited - I'm far from my development environment ...

LEJOUI commented 3 years ago

No, this isn’t an Intel NUC but a mini-pc. Brand unknown for me. Bought it off a marketplace. It has an internal bluetooth radio onboard. This is what HA shows about it. I’m running HAOS in a VM on Proxmox.

BUSNUM: '002'
DEVNAME: /dev/bus/usb/002/002
DEVNUM: '002'
DEVPATH: /devices/pci0000:00/0000:00:1e.0/0000:01:1b.0/usb2/2-1
DEVTYPE: usb_device
DRIVER: usb
ID_BUS: usb
ID_MODEL: Bluetooth_Radio
ID_MODEL_ENC: Bluetooth\x20Radio\x20
ID_MODEL_ID: '3410'
ID_PATH: pci-0000:01:1b.0-usb-0:1
ID_PATH_TAG: pci-0000_01_1b_0-usb-0_1
ID_REVISION: '0200'
ID_SERIAL: Realtek_Bluetooth_Radio_00e04c000001
ID_SERIAL_SHORT: 00e04c000001
ID_USB_INTERFACES: ':e00101:'
ID_VENDOR: Realtek
ID_VENDOR_ENC: Realtek\x20
ID_VENDOR_ID: 13d3
MAJOR: '189'
MINOR: '129'
PRODUCT: 13d3/3410/200
SUBSYSTEM: usb
TYPE: 224/1/1
USEC_INITIALIZED: '6065833'
Ernst79 commented 3 years ago

I've prepared a special version to do some debugging.

LEJOUI commented 3 years ago

Not getting a long data string though. Is it necessary to activate the debug mode for the component in HA as well??

2021-07-14 16:44:46 ERROR (Thread-5) [custom_components.ble_monitor.aioblescan_ext] Error in decoding NBytes: 9b0fbc
2021-07-14 16:44:46 ERROR (Thread-5) [custom_components.ble_monitor.aioblescan_ext] Error in decoding EnumByte: 
2021-07-14 16:44:46 ERROR (Thread-5) [custom_components.ble_monitor.aioblescan_ext] Error in decoding EnumByte: 
2021-07-14 16:44:46 ERROR (Thread-5) [homeassistant] Error doing job: Fatal error: protocol.data_received() call failed.
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/asyncio/selector_events.py", line 870, in _read_ready__data_received
    self._protocol.data_received(data)
  File "/config/custom_components/ble_monitor/aioblescan_ext.py", line 1848, in data_received
    extra_data = ev.decode(packet)
  File "/config/custom_components/ble_monitor/aioblescan_ext.py", line 1311, in decode
    data = ev.decode(data)
  File "/config/custom_components/ble_monitor/aioblescan_ext.py", line 1358, in decode
    data = ev.decode(data)
  File "/config/custom_components/ble_monitor/aioblescan_ext.py", line 841, in decode
    data = field.decode(data)
  File "/config/custom_components/ble_monitor/aioblescan_ext.py", line 1383, in decode
    data = x.decode(data)
  File "/config/custom_components/ble_monitor/aioblescan_ext.py", line 325, in decode
    self.val = unpack(">B", data[:1])[0]
Ernst79 commented 3 years ago

You did get one in the first message, Error in decoding NBytes: 9b0fbc

The last error is a new one, but let me first look into this first one. Most likely it is all caused by the same issue.

Some first thoughts (no conclusion yet, just to help myself understand the problem)

The first error occurs in NBytes, more specific in the decode part

class NBytes:
    """Class representing a byte string.

        :param name: The name of the instance
        :type name: str
        :param length: The length
        :type length: int
        :returns: NBytes instance.
        :rtype: NBytes

    """

    def __init__(self, name, length=2):
        self.name = name
        self.length = length
        self.val = b""

    def encode(self):
        return pack(">%ds" % len(self.length), self.val)

    def decode(self, data):
        try:
            self.val = unpack(">%ds" % self.length, data[: self.length])[0][::-1]
        except:
            _LOGGER.error("Error in decoding NBytes: %s", data.hex())
        return data[self.length :]

the only part of aioblescan that is using this part of the code is here

class Adv_Data(Packet):
    def __init__(self, name, length):
        self.name = name
        self.length = length
        self.payload = []

    def decode(self, data):
        myinfo = NBytes("Service Data uuid", self.length)
        data = myinfo.decode(data)
        ....

One thing that looks strange is the fact that the data is 3 bytes, while the default is 2 bytes.

Looking for Adv_Data also only shows, lengths of 2, 4 and 16, not 3

        elif type.val == 0x16:
            val = Adv_Data("Advertised Data", 2)
        elif type.val == 0x1F:
            val = NBytes_List("Service Solicitation uuid", 4)
        elif type.val == 0x20:
            val = Adv_Data("Advertised Data", 4)
        elif type.val == 0x21:
            val = Adv_Data("Advertised Data", 16)

These are the 3 service_id formats, which can't be 3 bytes long.

Ernst79 commented 3 years ago

It is failing immediately? Could you make a hcidump with some advertisements for a short while (few minutes). You can make a hcidump with the following command

sudo hcidump --raw hci > dump.txt
LEJOUI commented 3 years ago

Thanks for your troubleshooting so far! For this week I don’t have any time left to do this, next week I’ll give a dump. I’m not seeing this error all the time, but occasionally. It occurred 2 times the past 6 hours. If I remember this correct it happens a few minutes after a HA restart

Ernst79 commented 3 years ago

ah, ok, that makes it more difficult to catch the exact message that caused the error. I will think of a better solution to catch this data message.

LEJOUI commented 3 years ago

It is failing immediately? Could you make a hcidump with some advertisements for a short while (few minutes). You can make a hcidump with the following command

sudo hcidump --raw hci > dump.txt

I’m having issues running this command since hcidump can’t be installed on hassos (it’s also missing apt). Any pointer on how to achieve this?

Ernst79 commented 3 years ago

HassOs is difficult to get this running. There is a way, but it is quite complicated.

It's easier for you to add some code to get the error messages.

I've created an new aioblescan_ext.py, which should catch the full packet/data. Could you copy this file over the content of the current aioblescan_ext.py file and restart HA? After that, wait till it fails and check your log again.

LEJOUI commented 3 years ago

Thanks. I don’t have this file in the ble component. Should I create a new one or can this file be found in the packages of HA and should I docker in?

Ernst79 commented 3 years ago

No, just follow these instructions from my earlier post (https://github.com/custom-components/ble_monitor/issues/433#issuecomment-879900738) (I wrongly assumed you still had this code).

I've prepared a special version to do some debugging.

  • Please make sure you have 4.0.1 first.
  • Next, copy the content of the following file over the content of /config/custom_components/ble_monitor/__init__.py on your mini-pc.
  • In the same folder, create a file aioblescan_ext.py (so /config/custom_components/ble_monitor/aioblescan_ext.py) and copy the entire content of this file in it.
  • After that, restart your HA and wait till the error occurs.
  • An error should be in the HA log with a long data string. We need this error message to figure out what is wrong. It probably needs to be fixed in aioblescan, but we can take care of that later.
LEJOUI commented 3 years ago
Deze fout is ontstaan door een aangepaste integratie.

Logger: custom_components.ble_monitor.aioblescan_ext
Source: custom_components/ble_monitor/aioblescan_ext.py:1851 
Integration: Passive BLE monitor (documentation, issues) 
First occurred: 13:56:57 (1 occurrences) 
Last logged: 13:56:57

failing data: 043e4e02020300ae87bb59af011918ff4c000713010e2071aa37320010006164ff000000000000ae0001efa3f561b3761f02011a1bff4c000c0e00afe8e025efb044dbb3995e35da10065b1e5c501877bc
Ernst79 commented 3 years ago

Bedankt,

I'm going to puzzle on it to see if I can find out what is wrong. Will take some time, but I'll come back to you on it a.s.a.p.

LEJOUI commented 3 years ago

I’m having some more errors though. Not sure if they’re helpful.

Deze fout is ontstaan door een aangepaste integratie.

Logger: custom_components.ble_monitor.aioblescan_ext
Source: custom_components/ble_monitor/aioblescan_ext.py:1851 
Integration: Passive BLE monitor (documentation, issues) 
First occurred: 13:56:57 (4 occurrences) 
Last logged: 20:58:21

failing data: 043e4e02020300ae87bb59af011918ff4c000713010e2071aa37320010006164ff000000000000ae0001efa3f561b3761f02011a1bff4c000c0e00afe8e025efb044dbb3995e35da10065b1e5c501877bc
failing data: 043e4602020000eef0875da17c1109ff89004591624583320302fefe020104a80400eef0875da17c1f0309474e1aff89005504001f0111001a0d11064ad365128aac94fc68448160a8
failing data: 043e3c020202015b698bbad1471c03039ffe17169ffe0000000000000000000000000000000000000000bc04015b698bbad1470a09ffe000b213ca8d9b0fba
failing data: 043e2702020001962fda3146681102011a020a0c0aff4c0010050318f88918ba0401962fda31466800ba
Ernst79 commented 3 years ago

Thanks, I'll look into it. It will take some days, as the problem is in aioblescan, which is an external pypi package we use. This is not developed by ourselves, but I can try to have a look to find out what is wrong.

Ernst79 commented 3 years ago

I looked at the last advertisement, which seems to be a not-extended format. There is something strange at the end. Normally, after the mac, you would have something like the total remaining length excl RSSI (in this case 11 in hex = 17 bytes). In the payload, this is split up in subparts with length, type and value:

02 01 1a (length 2, type 01, value 0a) 02 0a 0c (length 2, type 0a, value 0c) 0a ff 4c 00 10 05 03 18 f8 89 18, length 10, type ff, value the remaining 9 bytes Next, you would get the RSSI value (ba). So far, so good.

But after that, there are additional bytes 04 01 96 2f da 31 46 68 00 ba

But I can't make sense of it, length isn't 4, I do see the same RSSI at the end (ba). But this does not make sense to me. I expect that aioblescan is failing on this part as is continues to decode data. I noticed that aioblescan is decoding empty data at the end.

--------------------------------------------------------------------------------------------------------------------------------------------------------------
HCI  Evt Len Sub Num Evt  Peer -------MAC-------   Len  Len Type Val    Len Type Val   Len Type -------Val---------------- RSSI--?????????????????????????????  
type code    evt rep type addr                              flag            flag           flag
--------------------------------------------------------------------------------------------------------------------------------------------------------------
 04   3e 27  02  02   00   01  96 2f da 31 46 68   11   02   01  1a     02   0a  0c     0a  ff  4c 00 10 05 03 18 f8 89 18   ba   04 01 96 2f da 31 46 68 00 ba
Ernst79 commented 3 years ago

I think we will have to open an issue on the aioblescan repository, I have no idea why these advertisements have this additional bytes, which seem tp be in the wrong format.

Ernst79 commented 3 years ago

@Magalex2x14 I think a possible solution would be to create a try except, but I'm a bit uncertain about the right place in ble_monitor.

I was thinking that the right place would be the run function in the HCIdump class, after line 560 in __init__.py, but I'm not convinced that this is the correct location and/or the right way to do it.

            try:
                self._event_loop.run_forever()
            except struct.error as error:
                _LOGGER.error("Skipping advertisement due to structure error: %s", error)
            finally:
                _LOGGER.debug("HCIdump thread: main event_loop stopped, finishing")
                for hci in self._interfaces:

Do you think this will work?

Magalex2x14 commented 3 years ago

Hmmm... I doubt it will work this way, since the error actually occurs inside the event loop itself (self._event_loop.run_forever() just starts it, all the work is done inside aioblescan), and this point is not in the backtrace... Even if it would work, then we will skip the rest of the period for all interfaces, which looks wrong...

Google suggests that in our case, it can be useful to reassign the exception handler for event_loop: loop.set_exception_handler(handler) - docs Inside the handler, you can look at a specific error and, in addition to logging, take some other action. This article has an example of using the handler (def handle_exception(loop, context)).

I have no experience of using exception handlers with asyncio, I cannot give more details. I think it's worth just giving it a try, starting with simple logging in the handler. We just use the original unmodified aioblescan, and try to reassign the handler before starting the event loop (before run_forever). Perhaps this will also help us with errors like Event loop stopped before future completed, but I'm not sure...

Magalex2x14 commented 3 years ago

Perhaps it would be more correct to add the handler immediately after creating the event loop:

            if self._event_loop is None:
                self._event_loop = asyncio.new_event_loop()
                asyncio.set_event_loop(self._event_loop)
               ### maybe here?  vvv
               self._event_loop.set_exception_handler(handler)

Then, if I understand everything correctly, with this handler we will catch all errors that arise even at the stage of connecting to hci-interfaces ...

Ernst79 commented 3 years ago

I’ll have to study this, it’s way out of my expertise.I’ll do some reading

Ernst79 commented 2 years ago

There has been an update in aioblescan (0.2.9), which is added in 5.8.0-beta. This might fix the issue.

LEJOUI commented 2 years ago

Thanks for the update. Unfortunately the issue still persists in the latest beta.

aletzi1 commented 2 years ago

I also face this bug. Is there anything i can do to help, like providing some logs or something ?

lululombard commented 2 years ago

Still an issue when I use any of my two USB Bluetooth dongles.

aletzi1 commented 2 years ago

With 2021.12.8, HACS 1.19.2 and BLE Monitor 6.9.0 beta (also with 6.8.0)

Error doing job: Fatal error: protocol.data_received() call failed. Traceback (most recent call last): File "/usr/local/lib/python3.9/asyncio/selector_events.py", line 870, in _read_readydata_received self._protocol.data_received(data) File "/usr/local/lib/python3.9/site-packages/aioblescan/aioblescan.py", line 1852, in data_received self.process(packet) File "/config/custom_components/ble_monitor/init__.py", line 563, in process_hci_events sensor_msg, tracker_msg = self.ble_parser.parse_data(data) File "/config/custom_components/ble_monitor/ble_parser/init.py", line 202, in parse_data sensor_data = parse_oral_b(self, man_spec_data, mac, rssi) File "/config/custom_components/ble_monitor/ble_parser/oral_b.py", line 76, in parse_oral_b "sector": SECTORS[sector], KeyError: 39

Ernst79 commented 2 years ago

Fixed the KeyError in 6.9.1-beta. Next time, please open a new issue, as this is a completely different error.

Ernst79 commented 2 years ago

Good news (hopefully).

There has been an update of aioblescan, which probably fixes the issue from @LEJOUI, @lululombard and @aletzi1. I have added the new aioblescan to 7.0.0, hopefully this will solve it. Let me know

lululombard commented 2 years ago

@Ernst79 Thanks for the update, I just updated my setup but unfortunately, I still don't get any values when I disable the internal RPi Bluetooth card and only tick the USB one. I don't see any errors in the logs This is not an issue for now, but I plan to move to a bigger server with no internal Bluetooth so I'm hoping to find a solution. I ordered an older Bluetooth 4.0 USB adapter, because right now I'm using 5.0 and 5.1 adapters and both don't return any errors, but my Bluetooth thermometers are just shown as "unavailable". Maybe my issue is no longer related to this thread? Thanks

Ernst79 commented 2 years ago

I think you have a different issue yes. If you want to figure out what is wrong, please open a new issue, preferably with some hcidump.

LEJOUI commented 2 years ago

Once again thanks for the update. There’s still a similar error in my logs.

Logger: homeassistant
Source: /usr/src/homeassistant/homeassistant/runner.py:98 
First occurred: 13 januari 2022 23:54:43 (2 occurrences) 
Last logged: 05:00:48

Error doing job: Fatal error: protocol.data_received() call failed.
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/asyncio/selector_events.py", line 870, in _read_ready__data_received
    self._protocol.data_received(data)
  File "/usr/local/lib/python3.9/site-packages/aioblescan/aioblescan.py", line 1841, in data_received
    extra_data = ev.decode(packet)
  File "/usr/local/lib/python3.9/site-packages/aioblescan/aioblescan.py", line 1303, in decode
    data = ev.decode(data)
  File "/usr/local/lib/python3.9/site-packages/aioblescan/aioblescan.py", line 1350, in decode
    data = ev.decode(data)
  File "/usr/local/lib/python3.9/site-packages/aioblescan/aioblescan.py", line 833, in decode
    data = field.decode(data)
  File "/usr/local/lib/python3.9/site-packages/aioblescan/aioblescan.py", line 1375, in decode
    data = x.decode(data)
  File "/usr/local/lib/python3.9/site-packages/aioblescan/aioblescan.py", line 191, in decode
    self.val = unpack(">B", data[:1])[0]
struct.error: unpack requires a buffer of 1 bytes
Ernst79 commented 2 years ago

@LEJOUI I have asked to have another look at your failing data again by the dev of aioblescan, see https://github.com/frawau/aioblescan/issues/44. Last time his conclusion was that it was a hardware related issue. But I asked him, if he still things this is hardware related, as the fix in aioblescan was applied to so called "extended BLE advertisements", while your error is in a normal BLE advertisement. But I don't see something wrong in that part of the code.

Ernst79 commented 2 years ago

There seems to be a fix on its way. I have tested the fix with some of the failing data you have given me in the past, and this seem to pass, with the fix. We have to wait till aioblescan is updated, after that we can hopefully fix your issue.

see https://github.com/frawau/aioblescan/issues/44 for more details

Ernst79 commented 2 years ago

@LEJOUI 7.0.1 has the fixed aioblescan, can you give it a try (again)?

LEJOUI commented 2 years ago

Yes! Updated last night and I this error seems to be gone now. Thanks!

Ernst79 commented 2 years ago

Excellent. Took a while, but finally solved!