Integration instability and log errors

bigfeetneedbigboots commented 3 years ago

DESCRIPTION I am now getting failures in the logs, and entities and sensors are becoming unavailable, after using this integration successfully and largely without issue since about August 2020. The integration was installed via HACS and added to the HA integrations per instructions.

SCREENSHOTS

SYSTEMS

Device: HomeKit-Enabled Heatmiser neoHub Gen 2 link
Home Assistant Version: core-2021.3.4 / supervisor-2021.03.6
Hardware: NUC 10
Home Assistant Installation: Home Assistant Operating System (HassOS)
NUC and neoHub both connected to UniFi network via CAT6 ethernet

LOGS

1.

Logger: homeassistant.components.climate
Source: custom_components/heatmiserneo/climate.py:171
Integration: Climate (documentation, issues)
First occurred: 12:39:38 (1 occurrences)
Last logged: 12:39:38

heatmiserneo: Error on device update!
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/helpers/entity_platform.py", line 360, in _async_add_entity
    await entity.async_device_update(warning=False)
  File "/usr/src/homeassistant/homeassistant/helpers/entity.py", line 465, in async_device_update
    await task
  File "/config/custom_components/heatmiserneo/climate.py", line 171, in async_update
    _, devices = await self._hub.get_live_data()
  File "/usr/local/lib/python3.8/site-packages/neohubapi/neohub.py", line 370, in get_live_data
    hub_data = await self._send(message)
  File "/usr/local/lib/python3.8/site-packages/neohubapi/neohub.py", line 84, in _send
    raise(last_exception)
  File "/usr/local/lib/python3.8/site-packages/neohubapi/neohub.py", line 57, in _send
    data = await asyncio.wait_for(
  File "/usr/local/lib/python3.8/asyncio/tasks.py", line 501, in wait_for
    raise exceptions.TimeoutError()
asyncio.exceptions.TimeoutError

2.

Logger: neohub
Source: /usr/local/lib/python3.8/site-packages/neohubapi/neohub.py:73
First occurred: 12:39:38 (166 occurrences)
Last logged: 13:54:43

[1] Timed out while sending a message to 192.168.1.32

3.

Logger: homeassistant.helpers.entity
Source: custom_components/heatmiserneo/climate.py:171
First occurred: 12:40:44 (165 occurrences)
Last logged: 13:54:43

Update for climate.mas fails
Update for climate.off fails
Update for climate.men fails
Update for climate.lau fails
Update for climate.the fails
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/helpers/entity.py", line 277, in async_update_ha_state
    await self.async_device_update()
  File "/usr/src/homeassistant/homeassistant/helpers/entity.py", line 473, in async_device_update
    raise exc
  File "/config/custom_components/heatmiserneo/climate.py", line 171, in async_update
    _, devices = await self._hub.get_live_data()
  File "/usr/local/lib/python3.8/site-packages/neohubapi/neohub.py", line 370, in get_live_data
    hub_data = await self._send(message)
  File "/usr/local/lib/python3.8/site-packages/neohubapi/neohub.py", line 84, in _send
    raise(last_exception)
  File "/usr/local/lib/python3.8/site-packages/neohubapi/neohub.py", line 57, in _send
    data = await asyncio.wait_for(
  File "/usr/local/lib/python3.8/asyncio/tasks.py", line 501, in wait_for
    raise exceptions.TimeoutError()
asyncio.exceptions.TimeoutError

CONTEXT This integration has worked very well for me previously and I enjoy using it as it enables me to turn on the heat when there is ample solar power to run it and switch it back off when the solar drops down. This effectively allows me to heat my home very economically and I enjoy having the temperature sensor readings for each room.

I've been noticing problems in the log files and this integration's climate sensors going unavailable for the last few weeks. Searching the interwebs hasn't got me any closer to solving the issue.

Today I completely removed the integrations (HA and HACS) and re-added them in the hope it would solve the issue. It didn't. Log entries represent the past 1-1.5 hours since re-adding the integration. The integration added back into HA without issue and the neoHub connected very quickly (i.e. < 1 sec).

The network reports as fine and traffic seems normal. UniFi rates the "client experience" as 100 (out of 100), so I don't see an issue there.

There are 15 circuits connected to my system; however, the integration only added 14 back in today. The circuit known as "XAR" is missing. It was there previously (I have it in Node-RED, which now reports: could not find state with entity_id "climate.xar").

I tested from the Heatmiser iOS app to my system today. I was able to turn a circuit on and off and it responded quickly to both commands.

I waited until a circuit came up in the logs as failed (i.e. "Update for climate.spa fails"), then I went to that entity within HA and turned it on. The circuit came on and within a few seconds was reflected in the iOS app as on. I then switched it off and it updated in the iOS app accordingly. Interestingly, after running this test, the failure entry in the logs for that circuit disappeared. As such, you won't see "Update for climate.spa fails" in the log entries above, but it was there until I ran this test.

Not sure what else to tell you. Please let me know if further info is required or if I can help with testing.

richhalliwell commented 3 years ago

Same thing happened to me. It seems to have become very unstable, which may be due to an update that Heatmiser made to their API. I'm going to try rolling back to a commit before this repository was updated to use the API. I think that seemed more stable.

PhillyGilly commented 3 years ago

Similar problem here since upgrading. I have a Gen 1 hub with 17 Heatmisers (9 neostats, 3 neostat-e, and 5 neostat-hw). Under Heatmiser version 18c0096 they were all visible and functioning in Home Assistant with my YAML workarounds. Now I have only six visible, five of which don't work! The HA Core log gives the usual warnings before telling me that it is adding switches and then going into an endless update loop:

2021-03-29 18:20:12 WARNING (MainThread) [homeassistant.loader] You are using a custom integration hacs which has not been tested by Home Assistant. This component might cause stability problems, be sure to disable it if you experience issues with Home Assistant
2021-03-29 18:20:12 WARNING (MainThread) [homeassistant.loader] You are using a custom integration heatmiserneo which has not been tested by Home Assistant. This component might cause stability problems, be sure to disable it if you experience issues with Home Assistant
2021-03-29 18:20:12 WARNING (MainThread) [homeassistant.loader] No 'version' key in the manifest file for custom integration 'heatmiserneo'. This will not be allowed in a future version of Home Assistant. Please report this to the maintainer of 'heatmiserneo'
2021-03-29 18:20:22 INFO (MainThread) [custom_components.heatmiserneo] Adding Switches: [<Entity Towel Rail 1: off>, <Entity Hot Water 1: off>, <Entity Towel Rail 3: off>, <Entity Towel Rail 2: off>, <Entity Hot Water 2: off>, , ]
2021-03-29 18:20:22 DEBUG (MainThread) [custom_components.heatmiserneo] Entered update(self)
2021-03-29 18:20:22 DEBUG (MainThread) [custom_components.heatmiserneo] Entered update(self)
2021-03-29 18:20:22 DEBUG (MainThread) [custom_components.heatmiserneo] Entered update(self)

I am hoping that I can roll-back now.

PhillyGilly commented 3 years ago

I just attempted a "roll-back" by overwriting the contents of my config\custom_components\heatmiserneo directory with the files that I saved from the same directory with the previous version. After restarting HA, I get nearly a mirror image of the above situation. All the devices that weren't visible have reappeared, but the three towel rails and one hot water that were there are missing now.

vhornacek commented 3 years ago

Hi,

I had the same issue and found that the problem is not in the HA component but in the underlying library neohubapi. The timeout is too short to receive all responses within the limit. This quick fix worked for me:

Edit

/usr/local/lib/python3.8/site-packages/neohubapi/neohub.py

Search for request_timeout=

And extend it. E.g.:

request_timeout=15

Restart HA.
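To illustrate why a short limit produces the asyncio.exceptions.TimeoutError seen in the tracebacks, here is a minimal sketch. Note that slow_hub_reply and send are hypothetical stand-ins written for this example, not neohubapi's actual code, and the delays are scaled down so it runs quickly.

```python
import asyncio


async def slow_hub_reply(delay: float) -> str:
    """Hypothetical stand-in for a hub that takes `delay` seconds to answer."""
    await asyncio.sleep(delay)
    return "live data"


async def send(delay: float, request_timeout: float) -> str:
    """Wait for the reply, giving up after `request_timeout` seconds."""
    try:
        return await asyncio.wait_for(slow_hub_reply(delay), timeout=request_timeout)
    except asyncio.TimeoutError:
        return "timed out"


# The same 0.2 s reply is tried against a 0.05 s limit and a 0.5 s limit.
short = asyncio.run(send(delay=0.2, request_timeout=0.05))
long = asyncio.run(send(delay=0.2, request_timeout=0.5))
print(short, "/", long)
```

The reply itself is identical in both calls; only the limit differs, which is consistent with the hub answering correctly but more slowly than the library allowed.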

bigfeetneedbigboots commented 3 years ago

Thanks @vhornacek, I think I have your 'quick fix' applied and working.

Steps for HassOS:

Install/open 'SSH & Web Terminal' add-on (the one by Frenck)
docker container exec -it homeassistant /bin/bash
cd /usr/local/lib/python3.8/site-packages/neohubapi
vi neohub.py
Move cursor to location where it says request_timeout= (mine said request_timeout=5)
i (for insert)
Change value (I changed mine to request_timeout=15)
Esc, then : (for the command line)
wq (write file and quit)
exit (exit container)
exit (exit SSH)
Restart home assistant
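For anyone who would rather not edit in vi, the same change could be scripted. This is only a sketch: raise_timeout is a helper made up for this example, and the sample line is an assumption about what the file contains, so check your own copy of neohub.py first.

```python
import re


def raise_timeout(source: str, new_value: int) -> str:
    """Replace every `request_timeout=<number>` in `source` with the new value."""
    return re.sub(r"request_timeout\s*=\s*\d+", f"request_timeout={new_value}", source)


# Example line only; the real signature in neohub.py may differ.
sample = "def __init__(self, host, request_timeout=5):"
patched = raise_timeout(sample, 15)
print(patched)
```

To apply it to the real file you would read the text with pathlib.Path(...).read_text(), pass it through raise_timeout, and write it back, ideally after saving a backup copy.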

I've been running for about 15 minutes now and no errors in logs. Thank you!

I can't help but think that this is a temporary workaround that will be undone when HA is upgraded. Hopefully they put in a permanent fix i.e. set the timeout to a larger number or make it a persistent configurable item. In any case, this gets me (and others) out of trouble for now.

vhornacek commented 3 years ago

Yes, this is a quick fix and will be overwritten. The proper fix should be implemented for every NeoHub instantiation in the custom component, i.e. replace the default NeoHub timeout with a configurable variable.
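A rough sketch of what "configurable" could look like is below. HubSettings, settings_from_config and the 60 second default are all assumptions made for illustration; none of them are part of heatmiserneo or neohubapi.

```python
from dataclasses import dataclass

DEFAULT_REQUEST_TIMEOUT = 60  # seconds; an assumed default for this sketch


@dataclass
class HubSettings:
    """Hypothetical settings object; not part of heatmiserneo or neohubapi."""

    host: str
    request_timeout: int = DEFAULT_REQUEST_TIMEOUT


def settings_from_config(config: dict) -> HubSettings:
    """Build settings from a user config, falling back to the default timeout."""
    return HubSettings(
        host=config["host"],
        request_timeout=int(config.get("request_timeout", DEFAULT_REQUEST_TIMEOUT)),
    )


custom = settings_from_config({"host": "192.168.1.32", "request_timeout": 15})
default = settings_from_config({"host": "192.168.1.32"})
print(custom.request_timeout, default.request_timeout)
```

The point is that the value is read once from the user's configuration and passed to every place the hub client is created, so a library upgrade cannot silently undo it.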

The recent commit "Switch to using neohubapi." introduced this issue.

PhillyGilly commented 3 years ago

I have raised an issue on @stikonas's neohubapi site for this: http://gitlab.com/neohubapi/neohubapi/-/issues/7

PhillyGilly commented 3 years ago

I'm running on an RPi3b (it boots from the SD card but actually runs off an SSD) and I'm struggling with the file structure. Running Frenck's tool:

cd ..
find . -name "*neo*"

locates about ten files, including the heatmiserneo directory, but doesn't find the local copy of neohub.py.
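Another way to look for the file is to ask Python itself where it would import the module from. This is a sketch, with locate_module being a helper invented for this example; it only gives a useful answer when run with the same Python interpreter that Home Assistant uses (for instance inside the homeassistant container mentioned in the steps earlier in this thread).

```python
import importlib.util
from typing import Optional


def locate_module(name: str) -> Optional[str]:
    """Return the file path Python would import `name` from, or None if not found."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        return None
    if spec is None:
        return None
    return spec.origin


# For "neohubapi.neohub" a path is printed only where the package is installed.
print(locate_module("neohubapi.neohub"))
print(locate_module("json"))
```

If the first line prints None, the interpreter you are running in does not have neohubapi installed, which would also explain why a file search from that shell comes up empty.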

PhillyGilly commented 3 years ago

Good news. The default request_timeout will be changed to 60s. :-) See https://gitlab.com/neohubapi/neohubapi/-/issues/7. Let's see how we get on.

vhornacek commented 3 years ago

That's great, this would be a permanent fix. Thanks @PhillyGilly for the follow up!

PhillyGilly commented 3 years ago

Happy days! After uninstalling, power cycling, reinstalling and power cycling again, all my Heatmisers are showing. Big thanks to Dave on neohubapi and to @stikonas too. Now I need to work on monitoring my repeaters...

stikonas commented 3 years ago

I've now released neohubapi 0.7 to pypi.

bigfeetneedbigboots commented 3 years ago

For the uninitiated, does this mean that neohubapi is now at a version which fixed the issue and we are now waiting for pypi to be upgraded into this integration?

stikonas commented 3 years ago

You don't need to wait. Removing the deps folder from your home-assistant config would force a neohubapi upgrade. (Note that you might need to restart HA twice before the new neohubapi is picked up.) New installations will always install the latest version.

We can also bump the minimum version of neohubapi in this integration. That will also force upgrades.
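To check which neohubapi version your environment actually picked up, something like the sketch below may help. installed_version and at_least are helpers written for this example, and the comparison only handles simple dotted numeric versions such as "0.7".

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def at_least(found: str, minimum: str) -> bool:
    """Compare simple dotted numeric versions such as "0.7" and "0.10"."""

    def parse(text: str) -> Tuple[int, ...]:
        return tuple(int(part) for part in text.split("."))

    return parse(found) >= parse(minimum)


found = installed_version("neohubapi")
if found is None:
    print("neohubapi is not installed in this environment")
else:
    print(found, "ok" if at_least(found, "0.7") else "older than 0.7")
```

As with any check of this kind, it needs to run in the same Python environment that Home Assistant uses, otherwise it reports on a different set of packages.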

stikonas commented 3 years ago

@bigfeetneedbigboots Have you upgraded? Is stability better now?

bigfeetneedbigboots commented 3 years ago

Hi @stikonas, sorry for the slow response. Yes, I have just upgraded via HACS, and I put the temporary workaround in about a month ago. I have not had any problems in the past month and everything still looks OK after the HACS upgrade a couple of minutes ago. Thanks to everyone for their help.

MindrustUK / Heatmiser-for-home-assistant

Integration instability and log errors #69