briancmpbll / home_assistant_custom_envoy


Integration stopped communicating with Envoy ending 1048 #166

Closed madbrain76 closed 9 months ago

madbrain76 commented 9 months ago

I have an IQ Envoy ending 1048. It's online on the network, and the device home page shows it's communicating with Enlighten. This particular Envoy has CT disabled. At 10am this morning, the Enphase integration stopped communicating with this Envoy, and it has been unable to for 3.5 hours. I tried a few DEVTEST and DEV versions, but none of that made any difference. I even reset the breaker on the Envoy, to no avail.

The integration is still getting data from my other two Envoys successfully - the ones ending 7059 and 6209.

I have attached a debug log.

I believe this is not the first time the communication stops for a period of time, but it is the first time I am home to witness it and take a log.

Edit: HA hasn't been able to communicate with that Envoy all day. I even restored a backup from a time when it could communicate, and it still couldn't. Seems like state has changed on the Envoy somehow, and that is throwing the integration off.

home-assistant_enphase_envoy_2023-09-29T20-13-18.334Z.log

catsmanac commented 9 months ago

Looking at the log, I see it starts communication using the cached token and authenticates successfully with the Envoy.

Checking Token value: yJraWQiOi (Only first 10 characters shown)
Token is populated: yJraWQiOi (Only first 10 characters shown)
Token expires at: 2024-08-01 11:38:33
Detect Model running
HTTP GET Attempt #1 of 2: https://192.168.100.143/production.json?details=1: use token: True: Header: <Blank Header>  Timeout: 60 Holdoff: 0
Received 401 from Envoy; refreshing cookies, in attempt 1 of 2:
HTTP GET Attempt #1 of 2: https://192.168.100.143/auth/check_jwt: use token: True: Header: <Token hidden>  Timeout: 60 Holdoff: 0
Fetched (1 of 2) in 0.2 sec from https://192.168.100.143/auth/check_jwt: <Response [200 OK]>: <!DOCTYPE html><h2>Valid token.</h2>

It then requests and successfully gets the production data page

HTTP GET Attempt #2 of 2: https://192.168.100.143/production.json?details=1: use token: True: Header: <Token hidden>  Timeout: 60 Holdoff: 0
Fetched (2 of 2) in 0.5 sec from https://192.168.100.143/production.json?details=1: <Response [200 OK]>: {"production":[{"type":"inverters","activeCount":12,

Then it continues on to get the Ensemble inventory page, and that fails each time (2 attempts in the log):

HTTP GET Attempt #1 of 2: https://192.168.100.143/ivp/ensemble/inventory: use token: True: Header: <Token hidden>  Timeout: 60 Holdoff: 0

No timeout or second attempt is logged for this page. HA is still starting the integration at that point, so it does not report the timeout itself, only this, the first time:

Config entry 'Envoy 202317171048' for enphase_envoy integration not ready yet; Retrying in background

So it seems to time out on the Ensemble page. Can you access https://192.168.100.143/ivp/ensemble/inventory from your browser?
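The same check can be scripted so it gives up quickly instead of waiting on the browser. This is only a rough sketch, not code from the integration: `probe` and its `opener` parameter are hypothetical names, and it assumes the Envoy accepts the token as an `Authorization: Bearer` header and serves a self-signed certificate (so verification is turned off).

```python
import ssl
import urllib.error
import urllib.request


def probe(url, token, timeout=10.0, opener=urllib.request.urlopen):
    """Do one GET and return a short status string (hypothetical helper)."""
    # Assumption: the Envoy certificate is self-signed, so skip verification.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    # Assumption: the token is passed as a Bearer token.
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    try:
        with opener(req, timeout=timeout, context=ctx) as resp:
            return f"HTTP {resp.status}"
    except urllib.error.HTTPError as err:
        # The Envoy answered, but with an error status such as 401 or 504.
        return f"HTTP {err.code}"
    except OSError as err:
        # Covers timeouts and connection failures (URLError is an OSError).
        return f"no response: {err}"
```

Calling `probe("https://192.168.100.143/ivp/ensemble/inventory", token)` with the real token, and then again with the `production.json?details=1` URL, would show whether only the Ensemble page hangs.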

One option to try is changing the timeouts in the Envoy settings: increase the overall timeout and shorten the individual page timeout so that at least 2 tries fit within the overall timeout (Settings / Integrations / Enphase Envoy / Configure for the 1048).

image
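To see how the two settings interact, here is a small back-of-the-envelope check. It is just arithmetic, not the integration's own logic: `attempts_that_fit` and its parameter names are made up for illustration, and it assumes each attempt may use the full page timeout, with the holdoff applied between attempts.

```python
def attempts_that_fit(overall_timeout, page_timeout, holdoff=0.0):
    """How many full page attempts fit inside the overall timeout (illustrative)."""
    if page_timeout <= 0:
        raise ValueError("page_timeout must be positive")
    # n attempts need n * page_timeout + (n - 1) * holdoff <= overall_timeout,
    # which rearranges to n <= (overall + holdoff) / (page + holdoff).
    return int((overall_timeout + holdoff) // (page_timeout + holdoff))


# Page timeout 60 and holdoff 0 as in the log; an overall timeout of 60 is assumed.
print(attempts_that_fit(60, 60))   # 1
# Shorter page timeout, longer overall timeout.
print(attempts_that_fit(120, 30))  # 4
```

With a page timeout equal to the overall timeout, a second try can never happen, which would match the log showing only attempt #1 for the Ensemble page.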

Another option is to skip fetching the Ensemble page to see if that helps. In config/custom_components/enphase_envoy, edit envoy_reader.py and find:

        await self._update_endpoint(
            "endpoint_ensemble_json_results", ENDPOINT_URL_ENSEMBLE_INVENTORY
        )

Disable this by changing to:

        # await self._update_endpoint(
        #     "endpoint_ensemble_json_results", ENDPOINT_URL_ENSEMBLE_INVENTORY
        # )

This might just lead to a failure on the next page; I'm not sure. What firmware is the 1048 running?

madbrain76 commented 9 months ago

@catsmanac, thank you very much for your response. I can't access the Ensemble page. Firefox never gets there; it times out after a couple of minutes.

image

I didn't try commenting out that Ensemble page fetch. Here is what I see for the home page when browsing to it:

image

And eventually, it returns that same 504 too.

The firmware is D7.3.466 from Dec 5, 2022 - it wasn't updated recently.

I'm going to try rebooting my Wifi APs to see if there is a difference. I may roll back the Wifi AP firmware as well, since I'm running a beta version of Unifi (6.3.38) on my NanoHD. But I had been running that same beta for a few days without problems on either of the Wifi-connected IQ Envoys.

madbrain76 commented 9 months ago

I spent a little bit over an hour with Enphase support on this. They said there was likely some database corruption on the device. They were able to fix it. The firmware was also updated to D7.3.517 in the process. HA and the integration are reading data properly from this Envoy again.

madbrain76 commented 9 months ago

Marking this closed.