HomeAssistant crash caused by this integration

keteflips commented 6 months ago

First of all, I love this integration, and this is not hate; my theory has strong foundations.

I have my HomeAssistant monitored with Zabbix, and I started suffering random crashes. All I can see is that around every hour the CPU consumption goes crazy and HA restarts...

I checked and tested everything, ran a lot of tests, and finally I can see in the logs that every time the CPU usage peaks, this component is refreshing the token.

2024-04-26 10:37:44.678 INFO (MainThread) [custom_components.aquarite.aquarite] Token expired, refreshing...

I disabled this component and the issue disappears.

Thanks for your work on this integration.

Hans1205 commented 6 months ago

Hey,

In the meantime, I also believe that something is happening here with the CPU usage. After an HA restart, it is normal again at first and then the CPU load increases again. I have not yet tried to find out the cause.

keteflips commented 6 months ago

I re-enabled the integration with the 0.0.8.b1 upgrade and now it works flawlessly.

I think something changed in the token renewal in this version that solved my problem :)

Hans1205 commented 6 months ago

OK, you must be lucky.

The problem repeats itself for me. After a restart, the CPU load increases continuously to just under 40%. When I first noticed the problem, it even went up to 100%.

folke commented 6 months ago

What I see is that approximately every 55 minutes there's an increase in requests, and both CPU and network usage increase in a step-like pattern.

AndyRoc1 commented 6 months ago

Same problem here. CPU load and network usage increase a lot, eventually causing HA to crash and restart.

fdebrus commented 5 months ago

Thanks for the feedback. Every 55 min the token expires and needs to be refreshed. I do not know yet what can cause the CPU / network increase at that point in time. I will look into it.

fdebrus commented 5 months ago

I could not find a better way (yet); I'm open to ideas and suggestions.

When the token expires, I reconnect to Hayward and re-subscribe to the pool document in Firestore.

I suspect HomeAssistant is not so happy with the coordinator being re-created with the new data source. I could not find a way to "update" the data source in Home Assistant.

This is my current way of thinking; I may be wrong, and I keep reading the HA docs...

    async def get_token_and_expiry(self):
        """Fetch token and expiry using Google API."""
        url = f"{GOOGLE_IDENTITY_REST_API}:signInWithPassword?key={API_KEY}"
        headers = {"Content-Type": "application/json; charset=UTF-8"}
        data = json.dumps({
            "email": self.username,
            "password": self.password,
            "returnSecureToken": True
        })
        resp = await self.aiohttp_session.post(url, headers=headers, data=data)
        if resp.status == 400:
            raise UnauthorizedException("Failed to authenticate.")
        self.tokens = await resp.json()
        self.expiry = datetime.datetime.now() + datetime.timedelta(seconds=int(self.tokens["expiresIn"]))
        self.credentials = Credentials(token=self.tokens['idToken'])
        self.client = Client(project="hayward-europe", credentials=self.credentials)
        if hasattr(self, 'handlers') and self.handlers:
            for pool_id, handler in self.handlers:
                _LOGGER.debug(f"Resubscribing to pool {pool_id}")
                await self.subscribe(pool_id, handler)

    async def subscribe(self, pool_id, handler) -> None:
        doc_ref = self.client.collection("pools").document(pool_id)
        def on_snapshot(doc_snapshot, changes, read_time):
            """Handles document snapshots."""
            try:
                for change in changes:
                    _LOGGER.debug(f"Received change {change.type} in firestore")
                for doc in doc_snapshot:
                    try:
                        handler(doc)
                    except Exception as handler_error:
                        _LOGGER.error(f"Error executing handler: {handler_error}")
            except Exception as e:
                _LOGGER.error(f"Error in on_snapshot: {e}")
        doc_ref.on_snapshot(on_snapshot)
        self.handlers.append((pool_id, handler))
        max_size = 10
        self.handlers = self.handlers[-max_size:]
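
One possible reading of the snippet above, sketched as a small simulation: the refresh loop iterates over `self.handlers` while `subscribe` appends to it, and nothing closes the previous listener. The `LeakyClient` class and its `active_listeners` counter are stand-ins invented for illustration; they are not the integration's code, and the sketch assumes every `on_snapshot` call opens a stream that stays open until it is explicitly closed.

```python
class LeakyClient:
    """Hypothetical stand-in that mimics only the bookkeeping of the snippet."""

    def __init__(self):
        self.handlers = []
        self.active_listeners = 0  # assumed: one open stream per on_snapshot call

    def subscribe(self, pool_id, handler):
        # Stands in for doc_ref.on_snapshot(...): a new listener is opened
        # and the previous one is never detached.
        self.active_listeners += 1
        self.handlers.append((pool_id, handler))
        max_size = 10
        self.handlers = self.handlers[-max_size:]

    def refresh_token(self):
        # Same loop shape as get_token_and_expiry(): the list being iterated
        # is also the list that the first subscribe() call appends to.
        for pool_id, handler in self.handlers:
            self.subscribe(pool_id, handler)


def listener_counts(refreshes):
    """Return the listener count after the initial subscribe and each refresh."""
    client = LeakyClient()
    client.subscribe("pool-1", lambda doc: None)
    counts = [client.active_listeners]
    for _ in range(refreshes):
        client.refresh_token()
        counts.append(client.active_listeners)
    return counts


if __name__ == "__main__":
    print(listener_counts(5))
```

Under those assumptions the count goes 1, 3, 7, 15, 26, 37 over five refreshes, which would at least be consistent with a step-like increase at every token expiry.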

FRight80 commented 5 months ago

The Firestore documentation mentions detaching the listener so that it stops using bandwidth on the client: https://firebase.google.com/docs/firestore/query-data/listen#detach_a_listener. Maybe explicitly unsubscribing before the re-subscribe would do it?
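
A minimal sketch of that idea, not the integration's actual code: keep the object that `on_snapshot()` returns and call `unsubscribe()` on it before attaching a new listener. In the `google-cloud-firestore` Python client, `on_snapshot()` returns a watch object with an `unsubscribe()` method; `FakeDocRef`, `FakeWatch` and `PoolSubscriptions` below are hypothetical stand-ins so the example runs without a Firestore project.

```python
class FakeWatch:
    """Stand-in for the watch object returned by on_snapshot()."""

    def __init__(self, registry):
        self._registry = registry
        registry.add(self)

    def unsubscribe(self):
        self._registry.discard(self)


class FakeDocRef:
    """Stand-in for a Firestore document reference."""

    def __init__(self):
        self.open_listeners = set()

    def on_snapshot(self, callback):
        return FakeWatch(self.open_listeners)


class PoolSubscriptions:
    """Keeps at most one live listener per pool_id (hypothetical helper)."""

    def __init__(self, doc_ref_factory):
        self._doc_ref_factory = doc_ref_factory  # maps pool_id to a document reference
        self._watches = {}  # pool_id -> (handler, watch)

    def subscribe(self, pool_id, handler):
        previous = self._watches.pop(pool_id, None)
        if previous is not None:
            previous[1].unsubscribe()  # detach the old listener first

        def on_snapshot(doc_snapshot, changes, read_time):
            for doc in doc_snapshot:
                handler(doc)

        watch = self._doc_ref_factory(pool_id).on_snapshot(on_snapshot)
        self._watches[pool_id] = (handler, watch)

    def resubscribe_all(self):
        # Iterate over a copy, because subscribe() mutates the dict.
        for pool_id, (handler, _watch) in list(self._watches.items()):
            self.subscribe(pool_id, handler)


if __name__ == "__main__":
    doc_ref = FakeDocRef()
    subs = PoolSubscriptions(lambda pool_id: doc_ref)
    subs.subscribe("pool-1", print)
    for _ in range(5):
        subs.resubscribe_all()
    print(len(doc_ref.open_listeners))
```

Keying the bookkeeping by `pool_id` instead of appending to a list means a token refresh replaces the listener rather than adding one.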

fdebrus commented 5 months ago

I was under the assumption that with token expiration, the connection to the document would be broken and the listener detached. I will investigate further. Thanks!

fdebrus commented 5 months ago

Version 0.0.9 is out, give it a try. I could not fully test it, as we need to wait for token expiration (55 min).

fdebrus commented 5 months ago

I'm removing 0.0.9, it does not work as expected.

fdebrus commented 5 months ago

According to my reading, aiohttp_session manages session opening / closing / ... pretty well by itself. As I reuse the same session upon token refresh, it should be fine. So I turned my investigation towards Firestore. I have changed the routine that re-connects a document after token refresh. Validation is ongoing. I will post an update by the end of the day.

fdebrus commented 5 months ago

No luck, will need more time to troubleshoot and fix.

fdebrus commented 5 months ago

More testing is ongoing, but it has been stable and working for 24 hours... I will publish a new beta for you to validate on your setup.

2024-05-08 14:17:51.792 DEBUG (MainThread) [custom_components.aquarite.aquarite] Token expired, refreshing...
2024-05-08 14:17:52.002 DEBUG (MainThread) [custom_components.aquarite.aquarite] Unsubscribing old listener
2024-05-08 14:17:53.005 DEBUG (MainThread) [custom_components.aquarite.aquarite] Re-subscribing with new token
2024-05-08 14:17:53.012 DEBUG (MainThread) [custom_components.aquarite.aquarite] Subscribed with new listener for pool_id 05DE2D353837574E43180937
2024-05-08 14:17:53.185 DEBUG (Thread-ConsumeBidirectionalStream) [custom_components.aquarite.aquarite] Received change ChangeType.ADDED in firestore
2024-05-08 14:17:53.186 DEBUG (MainThread) [custom_components.aquarite.coordinator] Manually updated Aquarite data
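
The last two log lines show a change arriving on a Firestore background thread and the coordinator being updated on the main thread. A minimal sketch of that hand-off, assuming one long-lived coordinator that is fed new data instead of being re-created: `StubCoordinator` and `make_thread_safe_handler` are hypothetical names, and the stub is not Home Assistant's class (Home Assistant's `DataUpdateCoordinator` offers `async_set_updated_data` for pushing data in).

```python
import asyncio
import threading


class StubCoordinator:
    """Stand-in for a long-lived coordinator; not Home Assistant's class."""

    def __init__(self):
        self.data = None
        self.updates = 0

    def set_updated_data(self, data):
        self.data = data
        self.updates += 1


def make_thread_safe_handler(loop, coordinator):
    """Build a handler that a background thread can call safely."""

    def handler(doc):
        # Schedule the update on the event loop thread instead of
        # touching the coordinator from the listener thread.
        loop.call_soon_threadsafe(coordinator.set_updated_data, doc)

    return handler


async def demo():
    loop = asyncio.get_running_loop()
    coordinator = StubCoordinator()
    handler = make_thread_safe_handler(loop, coordinator)

    # Simulate a snapshot callback firing on another thread.
    worker = threading.Thread(target=handler, args=({"temperature": 27.5},))
    worker.start()
    worker.join()

    await asyncio.sleep(0.01)  # give the scheduled callback time to run
    return coordinator


if __name__ == "__main__":
    result = asyncio.run(demo())
    print(result.updates, result.data)
```

With this shape the same handler can be passed to every new listener after a token refresh, so the coordinator itself never has to be rebuilt.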

fdebrus commented 5 months ago

I do get a warning from google.api_core.bidi, but it does not have any impact. I'm checking on it.

Logger: google.api_core.bidi
Source: custom_components/aquarite/aquarite.py:110
Integration: Aquarite
First occurred: 3:55:42 PM (1 occurrences)
Last logged: 3:55:42 PM

Background thread did not exit.

FRight80 commented 5 months ago

Ran the 0.0.9 update overnight; it seems solid, with no errors and no runaway CPU usage. I am, however, also getting the google.api_core.bidi warnings at each refresh.

2024-05-09 03:34:23.187 WARNING (MainThread) [google.api_core.bidi] Background thread did not exit.
2024-05-09 04:31:23.487 WARNING (MainThread) [google.api_core.bidi] Background thread did not exit.
2024-05-09 05:28:23.778 WARNING (MainThread) [google.api_core.bidi] Background thread did not exit.
2024-05-09 06:25:24.138 WARNING (MainThread) [google.api_core.bidi] Background thread did not exit.
2024-05-09 07:22:24.407 WARNING (MainThread) [google.api_core.bidi] Background thread did not exit.
2024-05-09 08:19:24.697 WARNING (MainThread) [google.api_core.bidi] Background thread did not exit.

AndyRoc1 commented 5 months ago

Did the same and no issues with CPU usage so far.

fdebrus commented 5 months ago

The token refresh and CPU usage peak are resolved with 0.0.9. The Google API warning is tracked under https://github.com/fdebrus/hayward-ha/issues/10

fdebrus / hayward-ha

HomeAssistant crash caused by this integration #8