Home-Is-Where-You-Hang-Your-Hack / sensor.goveetemp_bt_hci

Govee Temperature/Humidity BLE Home Assistant Component
MIT License
161 stars 29 forks source link

Data collection stops after invalid literal for int() with base 16: #7

Closed sfjes closed 4 years ago

sfjes commented 4 years ago

I've been having a few cases (average once per day) where data collection stops, with an error like the below in the logs (pardon the line numbers differing from the latest master, I'm still running the pre-multi-update-fix refactoring version).

I'm guessing these might be related to corrupted data in some of the BLE broadcasts, but I haven't had a chance to look more closely.

2020-02-20 16:10:45 DEBUG (SyncWorker_0) [custom_components.goveetemp_bt_hci.sensor] NEW DATA: {'rssi': -78, 'mac': 'MAC', 'temperature': 20.3492, 'humidity': 49.2, 'battery': 100.0, 'packet': 203492}
2020-02-20 16:10:45 ERROR (MainThread) [homeassistant.core] Error doing job: Future exception was never retrieved
Traceback (most recent call last):
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/concurrent/futures/thread.py", line 56, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/var/homeassistant/custom_components/goveetemp_bt_hci/sensor.py", line 551, in update_ble
    discover_ble_devices(config)
  File "/var/homeassistant/custom_components/goveetemp_bt_hci/sensor.py", line 370, in discover_ble_devices
    data = parse_raw_message_gvh5075(msg)
  File "/var/homeassistant/custom_components/goveetemp_bt_hci/sensor.py", line 186, in parse_raw_message_gvh5075
    govee_encoded_data = int(data[80:86], 16)
ValueError: invalid literal for int() with base 16: ''

I don't have the trace handy, but I've also see the same ValueError, but on the line battery = int(data[86:88], 16)

Also, once:

2020-02-22 17:20:51 DEBUG (SyncWorker_2) [custom_components.goveetemp_bt_hci.sensor] reading hcidump...
2020-02-22 17:20:51 DEBUG (SyncWorker_2) [custom_components.goveetemp_bt_hci.sensor] NEW DATA: {'rssi': -84, 'mac': 'MAC', 'temperature': 65.5363, 'humidity': 36.3, 'battery': 3.0, 'packet': 655363}
2020-02-22 17:20:51 ERROR (SyncWorker_2) [custom_components.goveetemp_bt_hci.sensor] Temperature spike: 65.5363 (MAC)
2020-02-22 17:20:51 ERROR (MainThread) [homeassistant.core] Error doing job: Future exception was never retrieved
Traceback (most recent call last):
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/concurrent/futures/thread.py", line 56, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/var/homeassistant/custom_components/goveetemp_bt_hci/sensor.py", line 551, in update_ble
    discover_ble_devices(config)
  File "/var/homeassistant/custom_components/goveetemp_bt_hci/sensor.py", line 451, in discover_ble_devices
    if m_temp:
UnboundLocalError: local variable 'm_temp' referenced before assignment

In all these cases, the component still goes through its per-period cycle, but no longer logs additional packet messages.

2020-02-26 15:24:29 DEBUG (SyncWorker_4) [custom_components.goveetemp_bt_hci.sensor] update_ble called
2020-02-26 15:24:29 DEBUG (SyncWorker_4) [custom_components.goveetemp_bt_hci.sensor] Discovering Bluetooth LE devices
2020-02-26 15:24:29 DEBUG (SyncWorker_4) [custom_components.goveetemp_bt_hci.sensor] Stopping
2020-02-26 15:24:29 DEBUG (SyncWorker_4) [custom_components.goveetemp_bt_hci.sensor] Analyzing
2020-02-26 15:24:29 DEBUG (SyncWorker_4) [custom_components.goveetemp_bt_hci.sensor] {'MAC': 'Govee 1'}
2020-02-26 15:24:29 DEBUG (SyncWorker_4) [custom_components.goveetemp_bt_hci.sensor] reading hcidump...
Thrilleratplay commented 4 years ago

Interesting. I have not see this before. I will add checks to prevent invalid data from crashing the plugin. I see that the battery is at 3%, I wonder if it sends out another signal when the battery is low that could be confusing the parser.

sfjes commented 4 years ago

In that case both the temperature (149F) and battery (3%) readings are bunk -- hence my suspicion of corrupted packets being received. Not sure about humidity, 36.3% is plausible. The packets in the minute before showed 100% battery.

Thrilleratplay commented 4 years ago

@sfjes I pushed a branch called add_h5075_parse_checks. When you have the time, could you test it and see if it corrected the issue?

sfjes commented 4 years ago

Thanks @Thrilleratplay. I've switched to that branch. Given that the error occurs somewhere between 0.25 - 1.0/times a day, it may take a few days to confirm if it resolves it. Will let you know.

sfjes commented 4 years ago

The component hasn't died since applying this branch. I added some debug messages to help see if the parse checks were catching anything, and they did, once. I expected to see more catches based on the previous rate of crashes, but hey, maybe less solar flares this week.

I also saw bad data a few times (100C type stuff), which got averaged out into a small blip.

Thrilleratplay commented 4 years ago

Where the 100C spikes being recorded in the data or just seeing the spikes in the error logs? The default max temp is 60c, it shouldn't record anything higher unless that was changed in the config.

sfjes commented 4 years ago

I lost track of where I wrote down the exact log message (and HA has since truncated the log), so it may well have been <60C. I just remember thinking that it was an absurdly high reading given that I'd be pleased to hit 20C.

Thrilleratplay commented 4 years ago

If you notice it again and it is logged in the sensor data, try lowering the upper temperature threshold. I am still curious how the signal is being altered enough to have bad data but still have a valid mac and device name. Possibly a faulty unit? shrugs I am going to merge this branch in if it at very least prevents crashing with bad data.