briancmpbll / home_assistant_custom_envoy

176 stars 76 forks source link

Sensor entities stop receiving data #86

Open rdunbar17507 opened 1 year ago

rdunbar17507 commented 1 year ago

Once working this integrations seems to lose connection and needs restarting. Deletion and reinstallation does not help, nor rebooting. Error log below:

Error communicating with API: peer closed connection without sending complete message body (incomplete chunked read)

catsmanac commented 1 year ago

There are some more reports on this (69, 57). This error reflects that something on the envoy side closed the connection for some reason before all data was send. If it doesn't recover after time cycle the envoy to see if that helps and not tried yet.

rdunbar17507 commented 1 year ago

More error logs if this helps:

Logger: homeassistant.config_entries Source: custom_components/enphase_envoy/envoy_reader.py:197 Integration: Enphase Envoy First occurred: 08:12:58 (1 occurrences) Last logged: 08:12:58

Error setting up entry Envoy 122243060763 for enphase_envoy Traceback (most recent call last): File "/usr/local/lib/python3.10/site-packages/anyio/streams/tls.py", line 130, in _call_sslobject_method result = func(*args) File "/usr/local/lib/python3.10/ssl.py", line 917, in read v = self._sslobj.read(len) ssl.SSLWantReadError: The operation did not complete (read) (_ssl.c:2578)

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File "/usr/src/homeassistant/homeassistant/config_entries.py", line 387, in async_setup result = await component.async_setup_entry(hass, self) File "/config/custom_components/enphase_envoy/init.py", line 99, in async_setup_entry await coordinator.async_config_entry_first_refresh() File "/usr/src/homeassistant/homeassistant/helpers/update_coordinator.py", line 226, in async_config_entry_first_refresh await self._async_refresh( File "/usr/src/homeassistant/homeassistant/helpers/update_coordinator.py", line 258, in _async_refresh self.data = await self._async_update_data() File "/usr/src/homeassistant/homeassistant/helpers/update_coordinator.py", line 217, in _async_update_data return await self.update_method() File "/config/custom_components/enphase_envoy/init.py", line 50, in async_update_data await envoy_reader.getData() File "/config/custom_components/enphase_envoy/envoy_reader.py", line 387, in getData await self.detect_model() File "/config/custom_components/enphase_envoy/envoy_reader.py", line 422, in detect_model await self._update_from_pc_endpoint() File "/config/custom_components/enphase_envoy/envoy_reader.py", line 162, in _update_from_pc_endpoint await self._update_endpoint( File "/config/custom_components/enphase_envoy/envoy_reader.py", line 181, in _update_endpoint response = await self._async_fetch_with_retry( File "/config/custom_components/enphase_envoy/envoy_reader.py", line 197, in _async_fetch_with_retry resp = await client.get( File "/usr/local/lib/python3.10/site-packages/httpx/_client.py", line 1757, in get return await self.request( File "/usr/local/lib/python3.10/site-packages/httpx/_client.py", line 1530, in request return await self.send(request, auth=auth, follow_redirects=follow_redirects) File "/usr/local/lib/python3.10/site-packages/httpx/_client.py", line 1617, in send response = await self._send_handling_auth( File "/usr/local/lib/python3.10/site-packages/httpx/_client.py", line 1645, in _send_handling_auth response = await self._send_handling_redirects( File "/usr/local/lib/python3.10/site-packages/httpx/_client.py", line 1682, in _send_handling_redirects response = await self._send_single_request(request) File "/usr/local/lib/python3.10/site-packages/httpx/_client.py", line 1719, in _send_single_request response = await transport.handle_async_request(request) File "/usr/local/lib/python3.10/site-packages/httpx/_transports/default.py", line 353, in handle_async_request resp = await self._pool.handle_async_request(req) File "/usr/local/lib/python3.10/site-packages/httpcore/_async/connection_pool.py", line 253, in handle_async_request raise exc File "/usr/local/lib/python3.10/site-packages/httpcore/_async/connection_pool.py", line 237, in handle_async_request response = await connection.handle_async_request(request) File "/usr/local/lib/python3.10/site-packages/httpcore/_async/connection.py", line 90, in handle_async_request return await self._connection.handle_async_request(request) File "/usr/local/lib/python3.10/site-packages/httpcore/_async/http11.py", line 116, in handle_async_request raise exc File "/usr/local/lib/python3.10/site-packages/httpcore/_async/http11.py", line 95, in handle_async_request ) = await self._receive_response_headers(**kwargs) File "/usr/local/lib/python3.10/site-packages/httpcore/_async/http11.py", line 159, in _receive_response_headers event = await self._receive_event(timeout=timeout) File "/usr/local/lib/python3.10/site-packages/httpcore/_async/http11.py", line 195, in _receive_event data = await self._network_stream.read( File "/usr/local/lib/python3.10/site-packages/httpcore/backends/asyncio.py", line 34, in read return await self._stream.receive(max_bytes=max_bytes) File "/usr/local/lib/python3.10/site-packages/anyio/streams/tls.py", line 195, in receive data = await self._call_sslobject_method(self._ssl_object.read, max_bytes) File "/usr/local/lib/python3.10/site-packages/anyio/streams/tls.py", line 137, in _call_sslobject_method data = await self.transport_stream.receive() File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 1265, in receive await self._protocol.read_event.wait() File "/usr/local/lib/python3.10/asyncio/locks.py", line 214, in wait await fut asyncio.exceptions.CancelledError

rdunbar17507 commented 1 year ago

There are some more reports on this (69, 57). This error reflects that something on the envoy side closed the connection for some reason before all data was send. If it doesn't recover after time cycle the envoy to see if that helps and not tried yet.

Recycling it bring the data back but doing this repeatedly is a pain. Code needs updating such that this error does not crash the program, instead it waits and rescans for data.

catsmanac commented 1 year ago

Agree, but though it seems to crash the program you'll probably find in the log that 1 minute later it tries again and dumps the same log. The crash only terminates the current scan cycle but I expect you see the next scan will start but just hits the same problem. Can you confirm this? I learned that with some other of these crash dumps.

The real issue is understanding why the Envoy terminates the data flow until it restarts. I've had the envoy disconnect with status 503, service unavailable, when I was testing with a second home assistance instance in parallel to the same envoy. But that recovered after 5 minutes.

catsmanac commented 1 year ago

You can enable debug on the integration, that will add some debug output to the logfile and show more details on what it is doing.

rdunbar17507 commented 1 year ago

Agree, but though it seems to crash the program you'll probably find in the log that 1 minute later it tries again and dumps the same log. The crash only terminates the current scan cycle but I expect you see the next scan will start but just hits the same problem. Can you confirm this? I learned that with some other of these crash dumps.

The real issue is understanding why the Envoy terminates the data flow until it restarts. I've had the envoy disconnect with status 503, service unavailable, when I was testing with a second home assistance instance in parallel to the same envoy. But that recovered after 5 minutes.

If I leave the integration without restarting, it may fetch data after a period of minutes or hours. It is completely random. When checking the enphase app or logging into it over IP the envoy is connected and appears fine.

catsmanac commented 1 year ago

Yes that is consistent with remarks in the other issues mentioned. Just to understand it correctly, "If I leave the integration without restarting" is restarting the HA Envoy integration, HA completely or the Envoy?

rdunbar17507 commented 1 year ago

Yes that is consistent with remarks in the other issues mentioned. Just to understand it correctly, "If I leave the integration without restarting" is restarting the HA Envoy integration, HA completely or the Envoy?

Normally I restart the integration, but restarting HA or conducting a HA reboot makes no noticeable difference. The only thing I haven’t done is restart the enphase envoy unit.

catsmanac commented 1 year ago

Any idea if this is happening at the same point each time? The integration performs multiple gets to obtain data from the envoy. Is it always the same get that triggers this? (To find this you probably need to enable debug until it happens. Might generate a lengthy logfile.) What envoy are you using and do you have meters and/or batteries included and how is it connected to the network?

I see the code has not much error catching and logic to work on cookies when something fails. So changing it to detect and handle this situation will at least require quite some testing. Probably doable, but will take time.

Potential temporary work-around to automatically restart the integration when this happens (no root-cause fix at all) is to create a trigger detecting an unavailable status of one of the envoy items and then call the service _homeassistant.reload_configentry to reload the envoy integration. Would need to have a bit of smarts to not execute at HA start or during reload of the integration as status will be unavailable then too.

Other work-around may be to use the rest integration as described in 32. Will require quite some configuration work and testing. Or some other fork of this. Though many of these have the same core logic.

rdunbar17507 commented 1 year ago

My observations today so far is that this problem only happens when the envoy cannot connect with the cloud server. It can connect to the local network and internet but for some reason there is no connection with the enphase server. When this happens the envoy does not transmit complete data to HA. If you restart in integration, the data appears momentarily before it then displays data unavailable. Once the link with the server is restored, the envoy will then share the data with HA.

Any idea if this is happening at the same point each time? The integration performs multiple gets to obtain data from the envoy. Is it always the same get that triggers this? (To find this you probably need to enable debug until it happens. Might generate a lengthy logfile.) What envoy are you using and do you have meters and/or batteries included and how is it connected to the network?

I see the code has not much error catching and logic to work on cookies when something fails. So changing it to detect and handle this situation will at least require quite some testing. Probably doable, but will take time.

Potential temporary work-around to automatically restart the integration when this happens (no root-cause fix at all) is to create a trigger detecting an unavailable status of one of the envoy items and then call the service _homeassistant.reload_configentry to reload the envoy integration. Would need to have a bit of smarts to not execute at HA start or during reload of the integration as status will be unavailable then too.

Other work-around may be to use the rest integration as described in 32. Will require quite some configuration work and testing. Or some other fork of this. Though many of these have the same core logic.

My observations today so far is that this problem only happens when the envoy cannot connect with the cloud server. It can connect to the local network and internet but for some reason there is no connection with the enphase server. When this happens the envoy does not transmit complete data to HA. If you restart in integration, the data appears momentarily before it then displays data unavailable. Once the link with the server is restored, the envoy will then share the data with HA.

rdunbar17507 commented 1 year ago

Given my observations today, I believe this is linked to the following:

https://github.com/home-assistant/core/issues/51766

https://support.enphase.com/s/question/0D53m00006ySLuRCAW/unimpressed-with-loss-of-local-api-connectivity-to-envoys

Thoughts?

catsmanac commented 1 year ago

This integration by @briancmpbll and others is actually the solution to those 2 items. The Envoy firmware versions starting with D7.0 and newer have added a security mechanism that requires the use of a token to authenticate with the envoy.

When configuring the envoy you had to specify your enphase enlighten website username and password and check the 'use enlighten' option to activate that mechanism. That tells the integration to connect to the enphase website en obtain a token to use for it's connections to the envoy. If not you may have an older firmware. You can find your envoy firmware in the enphase website.

To see in the logfile if it is getting a token before connecting you can enable debug on the integration (go to the integrations screens, click the 3 dot menu in the lower right corner of the envoy and select enable debug). Then reload the envoy integration again to get startup debug as while. After couple of minutes disable debug again. That will download the log file to your computer and you are in for some detailed reading of what it all did. You should be able to find logs of it obtaining a token and making first contact with the envoy. IF that all works the issues you mentioned are no longer related.

forlornlawngnome commented 1 year ago

I am having the same problem where it goes unavailable. Super weird, some are 5 minutes long, one was a whole hour. I am happy to help debug or provide logs, though I'm pretty new to HA (Hubitat user mostly).

rdunbar17507 commented 1 year ago

This integration by @briancmpbll and others is actually the solution to those 2 items. The Envoy firmware versions starting with D7.0 and newer have added a security mechanism that requires the use of a token to authenticate with the envoy.

When configuring the envoy you had to specify your enphase enlighten website username and password and check the 'use enlighten' option to activate that mechanism. That tells the integration to connect to the enphase website en obtain a token to use for it's connections to the envoy. If not you may have an older firmware. You can find your envoy firmware in the enphase website.

To see in the logfile if it is getting a token before connecting you can enable debug on the integration (go to the integrations screens, click the 3 dot menu in the lower right corner of the envoy and select enable debug). Then reload the envoy integration again to get startup debug as while. After couple of minutes disable debug again. That will download the log file to your computer and you are in for some detailed reading of what it all did. You should be able to find logs of it obtaining a token and making first contact with the envoy. IF that all works the issues you mentioned are no longer related.

Extract from debugging log:

2023-06-06 20:28:19.227 DEBUG (MainThread) [custom_components.enphase_envoy.envoy_reader] TransportError: peer closed connection without sending complete message body (incomplete chunked read) 2023-06-06 20:28:19.228 DEBUG (MainThread) [custom_components.enphase_envoy.envoy_reader] HTTP GET Attempt #2: https://192.168.28.15/home.json: Header:{'Authorization': 'Bearer eyJraWQiOiI3ZDEwMDA1ZC03ODk5LTRkMGQtYmNiNC0yNDRmOThlZTE1NmIiLCJ0eXAiOiJKV1QiLCJhbGciOiJFUzI1NiJ9.eyJhdWQiOiIxMjIyNDMwNjA3NjMiLCJpc3MiOiJFbnRyZXoiLCJlbnBoYXNlVXNlciI6Im93bmVyIiwiZXhwIjoxNzE3NjE1NjEwLCJpYXQiOjE2ODYwNzk2MTAsImp0aSI6IjZjZTUwYjcyLWRmNjgtNGY3Ny1iZjQyLWJlYTgzZjBjY2JiOCIsInVzZXJuYW1lIjoicm9zc19hX2R1bmJhckBob3RtYWlsLmNvbSJ9.erQP5OE-fggDzoEEPwl5opxj_0TzMTXkVRWSRvqUThQKOi_MBQ0NIPl2EXSW97kWaky6ldqTAvqhbSRQrcj_BQ'} 2023-06-06 20:28:22.522 DEBUG (MainThread) [custom_components.enphase_envoy.envoy_reader] TransportError: peer closed connection without sending complete message body (incomplete chunked read) 2023-06-06 20:28:22.522 DEBUG (MainThread) [custom_components.enphase_envoy.envoy_reader] HTTP GET Attempt #3: https://192.168.28.15/home.json: Header:{'Authorization': 'Bearer eyJraWQiOiI3ZDEwMDA1ZC03ODk5LTRkMGQtYmNiNC0yNDRmOThlZTE1NmIiLCJ0eXAiOiJKV1QiLCJhbGciOiJFUzI1NiJ9.eyJhdWQiOiIxMjIyNDMwNjA3NjMiLCJpc3MiOiJFbnRyZXoiLCJlbnBoYXNlVXNlciI6Im93bmVyIiwiZXhwIjoxNzE3NjE1NjEwLCJpYXQiOjE2ODYwNzk2MTAsImp0aSI6IjZjZTUwYjcyLWRmNjgtNGY3Ny1iZjQyLWJlYTgzZjBjY2JiOCIsInVzZXJuYW1lIjoicm9zc19hX2R1bmJhckBob3RtYWlsLmNvbSJ9.erQP5OE-fggDzoEEPwl5opxj_0TzMTXkVRWSRvqUThQKOi_MBQ0NIPl2EXSW97kWaky6ldqTAvqhbSRQrcj_BQ'}

rdunbar17507 commented 1 year ago

This only happens when the envoy can’t connect to enlighten. Diagnostics in envoy show no connectivity issues yet problem with connection to enlighten with envoy persist.

catsmanac commented 1 year ago

@rdunbar17507, your logs shows attempts 2 and 3 to read home.json page in the envoy

TransportError: peer closed connection without sending complete message body (incomplete chunked read)
HTTP GET Attempt #2: https://192.168.28.15/home.json: Header:{'Authorization': 'Bearer <>'}
TransportError: peer closed connection without sending complete message body (incomplete chunked read)
HTTP GET Attempt #3: https://192.168.28.15/home.json: Header:{'Authorization': 'Bearer <>'}

This comes from the code section where it first reads /production.json then /ivp/ensemble/inventory, and lastly home.json.

Can you find the same HTTP Get attempts for /production.json and /ivp/ensemble/inventory or is it just having the issue with home.json? Home.json is (only) used to get grid status but is also reporting all kind of status of the envoy itself.

If issues only happens on home.json, a far shot try could be to disable the call to get home.json in the code and see if that is a workaround. You would loose grid status reported by the Envoy during that testing. As said it's a far shot to see if home.json is the actual culprit or not.

Edit envoy_reader.py in /config/custom_components/enphase_envoy and look for ENDPOINT_URL_HOME_JSON and comment the 3 code lines in _update_from_pc_endpoint as shown below.

        """Update from PC endpoint."""
        await self._update_endpoint(
            "endpoint_production_json_results", ENDPOINT_URL_PRODUCTION_JSON
        )
        await self._update_endpoint(
            "endpoint_ensemble_json_results", ENDPOINT_URL_ENSEMBLE_INVENTORY
        )
        #await self._update_endpoint(
        #    "endpoint_home_json_results", ENDPOINT_URL_HOME_JSON
        #)
DanaGoyette commented 1 year ago

If I log into the device's web UI, the browser dev tools show that even their own request to home.json is failing!

If I copy the request as a curl command and add -v, I can see that it's responding with a 301, redirecting to /error. I've pasted just the response:

< HTTP/1.1 301 Moved Permanently
< Server: openresty/1.17.8.1
< Date: Fri, 07 Jul 2023 23:18:45 GMT
< Content-Type: application/json
< Transfer-Encoding: chunked
< Connection: keep-alive
< Pragma: no-cache
< Expires: 1
< Cache-Control: no-cache
< Location: /error
< Strict-Transport-Security: max-age=63072000; includeSubdomains
< X-Frame-Options: DENY
< X-Content-Type-Options: nosniff
<
* TLSv1.2 (IN), TLS alert, close notify (256):
* transfer closed with outstanding read data remaining
* Closing connection 0
* TLSv1.2 (OUT), TLS alert, close notify (256):
curl: (18) transfer closed with outstanding read data remaining

If I make curl send an HTTP/1.0 request instead of HTTP/1.1, by adding -0, it gets rid of the chunked encoding error, but it still gives the 301. So the chunk error is a bug on their end, but likely a red herring for the actual issue.

< HTTP/1.1 301 Moved Permanently
< Server: openresty/1.17.8.1
< Date: Fri, 07 Jul 2023 23:19:25 GMT
< Content-Type: application/json
< Connection: close
< Pragma: no-cache
< Expires: 1
< Cache-Control: no-cache
< Location: /error
< Strict-Transport-Security: max-age=63072000; includeSubdomains
< X-Frame-Options: DENY
< X-Content-Type-Options: nosniff
<
* TLSv1.2 (IN), TLS alert, close notify (256):
* Closing connection 0
* TLSv1.2 (OUT), TLS alert, close notify (256):
Steve2017 commented 1 year ago

This only happens when the envoy can’t connect to enlighten.

My experience is that the connection between the Envoy and their website regularly experiences delays. It's meant to call home every 15 minutes, but I find an hour delay is not unusual.

The frustrating thing is that we can still see the data on the Envoy webpage - but apparently cannot harvest it locally.

Update: My Envoy was off-line from Enlighten for more than 3 hours today, so the HACS integration was dropping out after about a minute. Reloading the integration would make it reconnect for another minute. My Envoy reconnected with Enlighten half an hour ago and the integration came back up. Enphase is pointing at everyone but themselves. There must be something in the way they have constructed their API. If I can see the envoy locally, why can't I harvest json data from it? Why does it have to phone home to share once I have been authenticated?

Steve2017 commented 1 year ago

I have now gone nearly 40 hours without a drop-out, so maybe Enphase has improved the connection between the Envoy and Enlighten?????

I'm still inclined to ask for a firmware downgrade.

catsmanac commented 1 year ago

It's been a while on this one. Is this still an issue? We learned that some or many Envoy have a communication hickup at 23:00 and last releases allows for tuning time-outs.