Could be solved by setting an attribute on the binary_sensor when an API call fails, and then logging the problem. When the API is working again, the timestamp can be cleared and a reconnect logged. That would also make it easy to visualize when the API is down.
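Roughly something like this, as a minimal sketch (not the integration's actual code; the entity and attribute names are made up):

```python
from datetime import datetime, timezone

from homeassistant.components.binary_sensor import (
    BinarySensorDeviceClass,
    BinarySensorEntity,
)


class ZaptecApiConnectionSensor(BinarySensorEntity):
    """Hypothetical connectivity sensor for the Zaptec cloud API."""

    _attr_device_class = BinarySensorDeviceClass.CONNECTIVITY

    def __init__(self) -> None:
        self._last_api_error: datetime | None = None

    def record_api_failure(self) -> None:
        # Hypothetically called whenever an API request fails
        self._last_api_error = datetime.now(timezone.utc)

    def record_api_success(self) -> None:
        # Clear the error timestamp once the API works again
        self._last_api_error = None

    @property
    def is_on(self) -> bool:
        # "on" for a connectivity sensor means connected
        return self._last_api_error is None

    @property
    def extra_state_attributes(self):
        return {"last_api_error": self._last_api_error}
```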
I've had some issues the last two nights with charging stopping several times, and I suspect the API. It could have been detected with an error timestamp.
Agree. The Zaptec API has been unstable during the last few days, returning error 500, which has produced a lot of errors in the log.
Interestingly, there already exist entities for this: the "* Charger", "* Circuit" and "* Installation" entities. They exist to provide a place to store the legacy attributes and currently only read "Connected". But it would be a very useful extension if they indeed showed the status of the API connection. In hindsight it kinda seems obvious to have this feature.
Definitely something to put on the list! Thanks.
Still getting timeouts like:
This error originated from a custom integration.
Logger: custom_components.zaptec.api Source: custom_components/zaptec/api.py:855 Integration: Zaptec EV charger (documentation) First occurred: 16.52.35 (1 occurrences) Last logged: 16.52.35
Request to https://api.zaptec.com/api/installation/bf68a41d-d2f9-44f5-bfc6-20bf42e645a8 failed (attempt 1): TimeoutError:
Perhaps the Zaptec integration should be more patient before reporting an error. After a short while the connection is re-established.
Hi @sveinse
Did you get any further in your considerations on how to solve this issue? It's a pity that a very useful integration produces so many errors.
How do we get someone working on this? My logs are unusable...
Do you have API debug on? Anyway, I think the proper way here is to retry using a library.
I mostly get timeout errors, or the below, which is flooding the log:
custom_components.zaptec.api.RequestError: GET request to https://api.zaptec.com/api/circuits/KEYREMOVED failed with status 500: <ClientResponse(https://api.zaptec.com/api/circuits/KEYREMOVED) [500 Internal Server Error]> <CIMultiDictProxy('Date': 'Tue, 30 Jul 2024 08:32:55 GMT', 'Content-Length': '0', 'Connection': 'keep-alive', 'Request-Context': 'appId=cid-KEYREMOVED', 'Strict-Transport-Security': 'max-age=15724800; includeSubDomains')>
I'm sorry, I haven't had time to look at this.
Currently the design logs every time an API request to Zaptec cloud fails, so when the connection is unstable it will result in many log entries. I agree that this should be changed. The HA quality scale requires for Silver that "Handles device/service unavailable. Log a warning once when unavailable, log once when reconnected."
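For reference, that requirement boils down to tracking the availability state and logging only on transitions, along these lines (an illustrative sketch, not the current code):

```python
import logging

_LOGGER = logging.getLogger(__name__)


class ConnectionState:
    """Log once when the connection is lost, once when it is regained."""

    def __init__(self) -> None:
        self._available = True

    def set_available(self, available: bool) -> None:
        if not available and self._available:
            _LOGGER.warning("Connection to Zaptec cloud lost")
        elif available and not self._available:
            _LOGGER.info("Connection to Zaptec cloud re-established")
        self._available = available
```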
@Bluhme1, @JTT22 Thank you for your patience. I've merged a fix into master that should reduce the logging to a minimum. Can you please test the "master" version and see if it fixes the problem for you?
The best way to deal with this is to add a retry on 500 errors, so you swallow the exception and just retry later. Can someone post the entire blueprint?
@Hellowlol In my experience, when Zaptec cloud starts giving 500, it will do so for quite a while. Nevertheless, I created #111 which will retry up to 5 times before giving up. What do you think? If this makes sense I can merge it to master and we can have it tested by the users that are most prone to this problem.
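For illustration, such a bounded retry could look roughly like this (a sketch assuming a plain aiohttp session; not the actual #111 code):

```python
import asyncio

import aiohttp


async def get_with_retry(
    session: aiohttp.ClientSession, url: str, max_tries: int = 5
):
    """Retry a GET that returns HTTP 500, giving up after max_tries."""
    for attempt in range(1, max_tries + 1):
        async with session.get(url) as resp:
            if resp.status != 500:
                resp.raise_for_status()
                return await resp.json()
        if attempt < max_tries:
            await asyncio.sleep(1)  # fixed delay; see the backoff note below
    raise aiohttp.ClientError(f"{url} still returns 500 after {max_tries} tries")
```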
The main objective of this issue was to control the extensive logging that the API was producing when Zaptec cloud fails. We can't really do anything about the availability of the Zaptec cloud itself. With the fix in #110, the API will no longer do the logging itself, but rather leave it to the HA UpdateCoordinator to handle available/not-available messages.
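In practice that means the coordinator's update method raises UpdateFailed and lets HA do the once-only logging, along these lines (a sketch, not the exact #110 code; `self.api.poll()` is a placeholder):

```python
from homeassistant.helpers.update_coordinator import (
    DataUpdateCoordinator,
    UpdateFailed,
)


class ZaptecUpdateCoordinator(DataUpdateCoordinator):
    """Sketch of delegating error handling to the coordinator."""

    async def _async_update_data(self):
        try:
            # Placeholder for the real sequence of cloud API requests
            return await self.api.poll()
        except (TimeoutError, OSError) as err:
            # UpdateFailed marks the entities unavailable; the
            # coordinator logs once on failure and once on recovery.
            raise UpdateFailed(f"Zaptec cloud request failed: {err}") from err
```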
A nice thing to do on retries is an exponential delay until the next retry. Otherwise all clients will spam the API at the exact moment it least needs a flood of requests. It can be done with the backoff decorator, for example, or in plain code.
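With the backoff library, the decorator variant could look like this (a hedged example; it assumes aiohttp raises ClientResponseError via raise_for_status, and the limits are arbitrary):

```python
import aiohttp
import backoff


@backoff.on_exception(
    backoff.expo,  # exponential delays, roughly 1 s, 2 s, 4 s, ... with jitter
    aiohttp.ClientResponseError,
    max_tries=5,
    max_time=30,  # cap the total delay so the poll interval isn't blocked
)
async def fetch(session: aiohttp.ClientSession, url: str):
    async with session.get(url, raise_for_status=True) as resp:
        return await resp.json()
```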
Yes, that is a good idea. However, there are limits to how long these backoffs may be. A full poll from the HA Zaptec integration to the Zaptec cloud requires quite a few requests. By default this happens every 60 seconds, and any backoff that introduces delays of minutes will probably interfere with HA's update interval. I'm not sure how the HA UpdateCoordinator responds to that.
The PR has now been updated with an exponential backoff mechanism.
A fix for the issue has been pushed to master. Please install the "master" version and test whether it fixes the observed problems. @svenakela @Bluhme1 @JTT22
It seems HACS has removed the option of downloading "master", so I've created a beta/pre-release "0.7.2b1" available for download and test.
Discussion: What is the appropriate action when Zaptec cloud is not responding or returns an error?
Currently the Zaptec integration logs these errors to the HA system log. This can easily be interpreted as the Zaptec integration itself being at fault. Typical errors observed are timeouts and HTTP error code 500 (internal server error). When this happens, a great many errors and warnings are logged.
The Home Assistant integration quality scale states that an integration should (for Silver): "Handles device/service unavailable. Log a warning once when unavailable, log once when reconnected."
So what should we do? Do we need to implement some way to determine the Zaptec cloud's "healthiness" in order for our integration to reduce the error logging?