gazoodle / gecko-home-assistant

Home Assistant integration for spas equipped with Gecko Alliance in.touch2 modules
MIT License
69 stars 24 forks source link

Keeps disconnecting #49

Closed hijamie32 closed 2 years ago

hijamie32 commented 2 years ago

Hi

Since this last update the integration keeps disconnecting and only reconnects once you press the reconnect button of the integration.

Never had this problem before and not sure if it’s something to do with my setup or a wider issue.

Many thanks for any help

Jamie

gazoodle commented 2 years ago

Can you take a look at the logs, there ought to be some reports from geckolib on why it is disconnecting?

These latest releases have attempted to be much more in tune with what the spa connection state is and its response to pings etc. If it is regularly struggling with this for whatever reason, it could result in the behaviour you are experiencing. It could be something normal not being handled correctly, or it could be something with your particular setup.

Either way, the logs should help us to decide what we can do to fix this.

RhinoRich commented 2 years ago

I have the same issue. After a period of some hours it doesn't communicate. Since installing 5 days ago it has failed to change setpoint overnight and is showing incorrect sensor values in the morning. This has required re-starting HA in my case. The log is huge. I could email it? It's too big to post practically, but there are no obvious error lines.

RhinoRich commented 2 years ago

`Logger: geckolib.driver.protocol.unhandled Source: /usr/local/lib/python3.9/site-packages/geckolib/driver/protocol/unhandled.py:37 First occurred: 26 March 2022, 09:40:03 (1953 occurrences) Last logged: 20:35:06

No handler for b'STATP\x01\x01s\x0e\x00' from ('192.168.1.30', 10022, b'SPA04:91:62:d3:2b:26', b'IOSeff95fbb-912d-4997-a05f-382ccc2262ca') found, message ignored No handler for b'STATP\x01\x01s\x0f\x00' from ('192.168.1.30', 10022, b'SPA04:91:62:d3:2b:26', b'IOSeff95fbb-912d-4997-a05f-382ccc2262ca') found, message ignored No handler for b'STATP\x05\x01s\x0f\x00\x01\x04\x84\x00\x01\x18\x04\x00\x01\x04\x84\x00\x01\x18\x04\x00' from ('192.168.1.30', 10022, b'SPA04:91:62:d3:2b:26', b'IOSeff95fbb-912d-4997-a05f-382ccc2262ca') found, message ignored No handler for b'STATP\x05\x01\x03\n\x84\x01\x05\x04\x05\x01/\x14\x00\x017\x03\x84\x018\x84\x00' from ('192.168.1.30', 10022, b'SPA04:91:62:d3:2b:26', b'IOSeff95fbb-912d-4997-a05f-382ccc2262ca') found, message ignored No handler for b'STATP\x07\x01\x03\n\x84\x01\x05\x05\x05\x01/\x14\x00\x017\x03\x84\x018\x84\x00\x01\x7f\x02\x00\x01\x05\x05\x05' from ('192.168.1.30', 10022, b'SPA04:91:62:d3:2b:26', b'IOSeff95fbb-912d-4997-a05f-382ccc2262ca') found, message ignored`

gazoodle commented 2 years ago

Thanks for the info @RhinoRich, that reminded me that when I changed the UUID that the new client uses (it’s now unique for each integration) that the EN in.touch2 module was initially confused because it had a different ID on the same IP address. In my environment if I powered it down and then restarted it it seemed to fix the issue. It’s the module connected to your LAN. Pull out the USB power, wait for a minute to be sure it has really shut down and then plug it back in again.

RhinoRich commented 2 years ago

Thanks Gazoodle. I'll try that, but wouldn't that result in a problem always rather than it developing after a period of hours?

hijamie32 commented 2 years ago

Hey,

Sorry didn't get any notifications come through and just seen your replies.

It appears to have settled down after re-adding the integration and leaving it unplugged for a few hours and then plugging back in.

I also have an automation that presses the button every few hours but I have now disabled it as don't think it is required any longer.

I will update in a few days to let you know

Cheers for your help.

Jamie

RhinoRich commented 2 years ago

Thanks for the info @RhinoRich, that reminded me that when I changed the UUID that the new client uses (it’s now unique for each integration) that the EN in.touch2 module was initially confused because it had a different ID on the same IP address. In my environment if I powered it down and then restarted it it seemed to fix the issue. It’s the module connected to your LAN. Pull out the USB power, wait for a minute to be sure it has really shut down and then plug it back in again.

I tried this but still with issue. It also seems to be defaulting to a setpoint of 37.5 prior to becoming disconnected. This happens each time. 37.5 never features in my automations. It goes to 40 at 0100, 36 at 0430, 38.5 at 1600 and 36 again at 2000.

I will try removing the component and reinstalling it.

RhinoRich commented 2 years ago

Sadly that did not work. image

RhinoRich commented 2 years ago

image

gazoodle commented 2 years ago

Thanks for that info. I’ll take a look and perhaps see if the library can be more tolerant of these issues. Fingers crossed I can take a look in the next couple of days.

Gaili commented 2 years ago

Hi, thank you for checking, since the last update I have the same problem.

jone9618 commented 2 years ago

FWIW I’m having the same problem too.

llamakarma commented 2 years ago

Hi,

First thanks very much for building this integration in the first place - I'm on a time of use electricity plan and its a god send to be able to turn the tub down a few degrees during the peak periods.

I was on an old release (0.1.1 I think) and upgraded to 0.1.5 and hit the issue described here as well as high CPU on HA described elsewhere.

Other than not remaining connected and therefore missing my scheduled temp changes (or performing the temp down but not turning it back up), I have also found that sometimes the tub waterfall seems to have been turned on and left on for no apparent reason. I'm not sure if this is an artefact of upgrading from 0.1.1 (i.e. maybe a clean install would help).

Unfortunately in the rush to fix the schedule / consumption, I have rolled back without grabbing the logs.

I'll take a look if there are any older logs hiding in HA and send if found

gazoodle commented 2 years ago

Please can you try v0.1.6

llamakarma commented 2 years ago

Clarification of my last post - I think I was probably originally running an earlier version the 0.1.1 as the downgrade to 0.1.1 didn't solve the issues for me.

As requested, I've moved to 0.1.6 using the following steps:

-Delete from Config -> Devices and Services -Remove from HACS -> Integrations -Restart HA -HACS -> Integrations -> find Gecko and install -Restart HA -Config -> Devices and Services - Add Gecko (Pool) (let it search, did not provide an IP) -Config -> Devices and Services - Add Gecko (Hot tub) (let it search, did not provide an IP)

Note: I retained my original scenes - they seem to work manually at least.

HA CPU usage is averaging at around 24% - compared to 2% before the upgrades.

UI was showing the waterfalls enabled - since the iOS Gecko app is mostly useless it is hard for me to tell whether this was triggered by HA/Gecko or was already in effect. I tried to toggle them off but it didn't initially seem to work (toggles popped back to green).

Here are some recent logs - the waterfall does seem to be showing an error:

No handler for b"STATV\t\n'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00" from ('192.168.10.3', 10022, b'SPA04:91:62:d3:b7:b6', b'IOS65414933-2779-4d7a-b74a-20db4fa364de') found, message ignored 1:15:58 PM – (WARNING) /usr/local/lib/python3.9/site-packages/geckolib/driver/protocol/unhandled.py - message first occurred at 1:12:12 PM and shows up 48 times

Exception handling setting UdWaterfall=HI 1:12:48 PM – (ERROR) /usr/local/lib/python3.9/site-packages/geckolib/automation/pump.py - message first occurred at 1:12:05 PM and shows up 2 times

I'll monitor the behavior over the next 24hrs to see if the schedule is working.

Hope this is useful!

llamakarma commented 2 years ago

A little more ...

Clarifying this comment: "HA CPU usage is averaging at around 24% - compared to 2% before the upgrades." I mean I was seeing around 2% util on whatever the very first version of Gecko I used was. 0.1.1, 0.1.5 and 0.1.6 all have high CPU.

I just tried using my scene to lower the temp of both pools - this seems to work but the waterfalls are turned on (it does not seem like they need to be - could this be configured?)

I also see this log message pop up again (+2 times, presumably once for each pool):

' Exception handling setting UdWaterfall=HI 1:27:29 PM – (ERROR) /usr/local/lib/python3.9/site-packages/geckolib/automation/pump.py - message first occurred at 1:12:05 PM and shows up 4 times ' More verbose: ' Logger: geckolib.automation.pump Source: /usr/local/lib/python3.9/site-packages/geckolib/automation/pump.py:58 First occurred: 1:12:05 PM (4 occurrences) Last logged: 1:27:29 PM

Exception handling setting UdWaterfall=HI Traceback (most recent call last): File "/usr/local/lib/python3.9/site-packages/geckolib/automation/pump.py", line 54, in async_set_mode await self.facade.spa.accessors[ File "/usr/local/lib/python3.9/site-packages/geckolib/driver/accessor.py", line 187, in async_set_value newvalue = self.items.index(newvalue) ValueError: 'HI' is not in list '

Turning the temp back up does not result in any more waterfall errors - it seems like when turning the temp up, the plugin doe not want to mess with the waterfalls..

Manually toggling the waterfalls to off results in:

No new log messages.

For completeness, the full text of the No Handler error message from before is here:

' Logger: geckolib.driver.protocol.unhandled Source: /usr/local/lib/python3.9/site-packages/geckolib/driver/protocol/unhandled.py:37 First occurred: 1:12:12 PM (48 occurrences) Last logged: 1:15:58 PM

No handler for b"STATV\t\n'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00" from ('192.168.10.3', 10022, b'SPA04:91:62:d3:b7:b6', b'IOS65414933-2779-4d7a-b74a-20db4fa364de') found, message ignored No handler for b"STATV\n\x0b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00" from ('192.168.10.3', 10022, b'SPA04:91:62:d3:b7:b6', b'IOS65414933-2779-4d7a-b74a-20db4fa364de') found, message ignored No handler for b"STATV\x0b\x0c'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00" from ('192.168.10.3', 10022, b'SPA04:91:62:d3:b7:b6', b'IOS65414933-2779-4d7a-b74a-20db4fa364de') found, message ignored No handler for b'STATV\x0c\x00\x0b\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00' from ('192.168.10.3', 10022, b'SPA04:91:62:d3:b7:b6', b'IOS65414933-2779-4d7a-b74a-20db4fa364de') found, message ignored No handler for b'WCGET\x01' from ('192.168.10.3', 10022, b'SPA04:91:62:d3:b7:b6', b'IOS65414933-2779-4d7a-b74a-20db4fa364de') found, message ignored '

HTH

RhinoRich commented 2 years ago

I have the same result. image

llamakarma commented 2 years ago

A little more follow up:

Accompanies by some more logs related to connection failure:

` Unable to find referenced entities fan.hot_tub_pump_1 4:00:43 PM – (WARNING) helpers/service.py - message first occurred at 4:00:19 PM and shows up 9 times

Logger: homeassistant.helpers.service
Source: helpers/service.py:130
First occurred: 4:00:19 PM (9 occurrences)
Last logged: 4:00:43 PM • Unable to find referenced entities fan.hot_tub_pump_1 • Unable to find referenced entities fan.hot_tub_waterfall • Unable to find referenced entities fan.hot_tub_pump_2 • Unable to find referenced entities climate.hot_tub_heater • Unable to find referenced entities light.hot_tub_lights

Exception during spa connection 3:58:33 PM – (ERROR) /usr/local/lib/python3.9/site-packages/geckolib/async_spa.py

Logger: geckolib.async_spa
Source: /usr/local/lib/python3.9/site-packages/geckolib/async_spa.py:368
First occurred: 3:58:33 PM (1 occurrences)
Last logged: 3:58:33 PM Exception during spa connection Traceback (most recent call last): File "/usr/local/lib/python3.9/site-packages/geckolib/async_spa.py", line 366, in connect await self._connect() File "/usr/local/lib/python3.9/site-packages/geckolib/async_spa.py", line 333, in _connect if not await self.struct.get( File "/usr/local/lib/python3.9/site-packages/geckolib/driver/async_spastruct.py", line 63, in get request = create_func() File "/usr/local/lib/python3.9/site-packages/geckolib/async_spa.py", line 336, in self._protocol.get_and_increment_sequence_counter(), AttributeError: 'NoneType' object has no attribute 'get_and_increment_sequence_counter' `

So it looks like the connection state machine is getting in a knot somewhere.

I'm still not sure if the auto-waterfall feature is intentional or a logic bug - it would be great to remove this since I'm turning the pools up/down to load shift my power draw and turning on pumps kind of defeats that.

Thanks

RhinoRich commented 2 years ago

It does appear to be sending and receiving from the In.Touch2, but many errors: image The main log is huge but I can send it if you want

gazoodle commented 2 years ago

Thanks for the excellent feedback. There seems to be two issues here, one the connection or lack of, and one where waterfall/circ pump is confused. I’m going to split the issues up as I don’t think they are related.

The connection stuff has sort of been around in various forms for a while, but the async releases of the library are less tolerant of this. Thinking hat needed here …

llamakarma commented 2 years ago

Thanks for looking into it - concurrency is not for the faint of heart!

On the waterfall topic - at least in my pools, the waterfall/circ seems to be activated automatically by the control pack when required. So I don't think there is any need to configure this when adjusting temperature (i.e. let the controller follow its own logic).

Obviously like the other pumps, having visibility of the state of the waterfall is still useful as is the option to enable/disable remotely.

gazoodle commented 2 years ago

The main log is huge but I can send it if you want

Yes please. It ought to compress nicely so if you can zip it up and email it to gazoodle at hash dot fyi that might be useful

gazoodle commented 2 years ago

Also, can you all post your EN/CO version data. It can be found in the device page for the integration. Mine is:

Device info inYT 380 v4.0 by Gecko Alliance Firmware: SpaPack:v33.00 Config:65 Log:66 Hardware: EN:88 v15.0 CO:89 v11.0

llamakarma commented 2 years ago

Here's mine:

HotTub: inYT 380 v4.0 by Gecko Alliance Firmware: SpaPack:v33.00 Config:65 Log:66 Hardware: EN:88 v15.0 CO:89 v11.0

Pool: inYT 321 v11.0 by Gecko Alliance Firmware: SpaPack:v33.00 Config:63 Log:63 Hardware: EN:88 v15.0 CO:89 v11.0

gazoodle commented 2 years ago

I've noticed that when there is more than one client talking to the in.touch2 EN module, it can get very confused. If I run the iOS/Android app at the same time as HA and one of my diag clients then it takes ages to connect and gets confused, even the Gecko app whinges.

It occurred to me most folk will probably have another (silent) client connected from Gecko which might be exacerbating this issue. Personally I've blocked intouch.geckoal.com on my PiHole which stops the EN module from calling home. This does break the "Remote" functionality in the iOS/Android app from but I don't care about since I use a VPN to control stuff when I'm away from home.

I've currently disabled the domain block in PiHole to see if my setup gets into the same confused state as other folk, and since we've recently replaced our aging spa with a new one (and I raided the old one for bits :-) ), I have the spa pack on my workbench pretending to be another, so I can semi-replicate @llamakarma's setup.

llamakarma commented 2 years ago

Thanks for your efforts!

I'm not sure where you are based, but for me (in Europe), the iOS app has always been slow and unreliable. I figured that either the wireless link between the pack and the wireless bridge to the packs was weak, the 'cloud' service was based in outer Mongolia, or the app was just crap. It's hard for me to determine whether the app got better or worse since setting up your plugin because it has always been so bad.

Looking more closely now, I searched my fw logs and can see no connections outbound at all from either inTouch2 box. I power cycled one of them to try to kick it into life and still nothing. In fact the only thing I do see is the phone reaching out to api-intouch2.geckoal.com and then later talking to a number of AWS instances.

I cross checked by shutting off the phone wifi and I do not get any remote access - so it appears the iOS client is talking directly to the spa packs after whatever it does with the public API. When time permits I may setup a sniffer to figure this out more.

tl;dr I don't think there is a sneaky cloud connection, but I agree the iOS app could be causing conflicts. I've made sure it is not running on the phone.

RhinoRich commented 2 years ago

The main log is huge but I can send it if you want

Yes please. It ought to compress nicely so if you can zip it up and email it to gazoodle at hash dot fyi that might be useful

I have reinstalled this morning. When it disconnects I will zip the log and send to you.

Device info inYT 338 v10.0 by Gecko Alliance Firmware: SpaPack:v33.00 Config:65 Log:66 Hardware: EN:88 v15.0 CO:89 v11.0

RhinoRich commented 2 years ago

I reinstalled the latest version at the weekend and also updated HA. Since then I have had errors but it has stayed connected and hasn't missed any instructions or given any kind of problem. This appears multiple times but doesn't appear to be affecting the operation: image

gazoodle commented 2 years ago

Those "No handler" warnings are really only for my benefit to understand how often the intouch2 module sends unsolicited packets to my driver and if its down to my misunderstanding - it usually is :smile:

I think I've made the connection stuff more robust so that ought to deal with the connection stability issues folk were seeing. I'll leave the warnings for a while just in case I figure out it's something I'm (not) doing.

Is anyone on this thread still getting the overnight disconnects, if so I'll keep this open and try harder :+1:

RhinoRich commented 2 years ago

Thanks for all your efforts with this. It's an excellent integration and more important than ever now that energy prices have risen so much.

Since re-installing, it hasn't missed a single instruction.

On Fri, 15 Apr 2022 at 13:40, gazoodle @.***> wrote:

Those "No handler" warnings are really only for my benefit to understand how often the intouch2 module sends unsolicited packets to my driver and if its down to my misunderstanding - it usually is 😄

I think I've made the connection stuff more robust so that ought to deal with the connection stability issues folk were seeing. I'll leave the warnings for a while just in case I figure out it's something I'm (not) doing.

Is anyone on this thread still getting the overnight disconnects, if so I'll keep this open and try harder 👍

— Reply to this email directly, view it on GitHub https://github.com/gazoodle/gecko-home-assistant/issues/49#issuecomment-1100081566, or unsubscribe https://github.com/notifications/unsubscribe-auth/ASKOHHMZEUMMKS2CTJT2BLDVFFPRZANCNFSM5RF3SEHQ . You are receiving this because you were mentioned.Message ID: @.***>

gazoodle commented 2 years ago

Closing, should be resolved now.