@Infern1 Just a heads up to you and anyone else using this plugin, I'm aware of the poor and intermittent operation of these scripts over the last few months and have been working behind the scenes to find a proper solution for this.
The underlying cause is that a few months before the end of 2018 Honeywell introduced aggressive rate limits on the authentication mechanisms for both the V1 and V2 APIs. Furthermore, those rate limits seem to be shared between the two APIs. That is an issue for this plugin, because the V1 API is polled for high resolution temperature readings but the V2 API must be polled for hot water on/off status (the V1 API doesn't provide that information). So for anyone using the hot water plugin, Honeywell sees two authentication attempts quite close together in each 5 minute period.
As a result these scripts intermittently fall foul of the rate limits and a polling session is missed, resulting in a gap in graph data and errors in the munin-node.log.
There is also a bug in the script, related to the way the zone data cache works, that greatly exacerbates the rate limiting. If the first zone instance of the plugin to be called fails because the rate limiting rejects its authentication, the on disk zone data cache is (reasonably enough) not updated, as there is no new data to put in it, so it remains expired.
Unfortunately that means the second zone instance will then also try to authenticate and fail, then the next one, and so on. In an 8 zone system like mine that means 8 failed authentication attempts are made in a row, and this serves only to increase the lockout period of the rate limiting.
With enough zones this can lead to a semi-permanent lockout, because the plugin retries so often that the rate limiting is not lifted for a long time. This can cause huge gaps in the graph data. It's already bad with 8 zones, and with 12 zones it would probably cause what is effectively a permanent ban the first time there was any error.
I have already fixed this zone cache bug locally - if the cache has expired and the query to the server fails, I truncate the zone data cache file to 0 bytes. Truncating also updates the file's last modified time, so the next instance to check the cache sees that it is not expired but also that it is zero size. It assumes the file was left that way by an authentication failure and does not attempt another authentication of its own.
That way if the first instance fails the remaining instances for that 5 minute period will make no further API call attempts, thus allowing any rate limiting to time out without further provocation from additional failed attempts.
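To make the idea concrete, here is a minimal sketch of that zero-size cache check. It is not the actual evohome-munin code: the function name `read_zone_data`, the `fetch` callback, the JSON cache format and the 300 second expiry are all assumptions made for illustration.

```python
import json
import os
import time

CACHE_TTL = 300  # seconds; assumed to match the 5 minute polling period


def read_zone_data(cache_path, fetch, now=None, ttl=CACHE_TTL):
    """Return zone data, or None if this polling period should be skipped.

    `fetch` is a hypothetical callable that authenticates, queries the API
    and returns the zone data, raising an exception on any failure.
    """
    now = time.time() if now is None else now
    try:
        st = os.stat(cache_path)
        fresh = (now - st.st_mtime) < ttl
        size = st.st_size
    except FileNotFoundError:
        fresh, size = False, 0

    if fresh:
        if size == 0:
            # An earlier instance failed during this period and left an
            # empty marker, so do not provoke the rate limiter again.
            return None
        with open(cache_path) as f:
            return json.load(f)

    try:
        data = fetch()
    except Exception:
        # Truncate to 0 bytes. This also refreshes the modification time,
        # so the other zone instances see "fresh but empty" and back off.
        open(cache_path, "w").close()
        return None

    with open(cache_path, "w") as f:
        json.dump(data, f)
    return data
```

With this shape, only the first instance in each 5 minute period ever reaches `fetch`, whether it succeeds or fails.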
This helps greatly but does not entirely fix the problem which is why I haven't pushed these changes yet.
The real solution to the problem is to save and restore the session id (V1 API) and access_token (V2 API) between 5 minute runs of the plugin and then re-use them, because only the authentication is rate limited, not the actual API calls made after authenticating. This avoids having to do two authentications per 5 minute period.
The V1 session id remains valid for 15 minutes and renews for another 15 minutes every time you make a call, so with 5 minute polling you would only have to authenticate once and then keep using the same session id indefinitely.
The V2 access_token lasts 30 minutes, so you can poll every 5 minutes for up to 30 minutes before re-authentication is required. In theory, once it gets going, only one authentication in total should be required per 30 minutes.
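As a rough illustration of the save and restore idea, the sketch below keeps a credential in a small JSON file together with its expiry time. This is not the evohome-client API: `save_credential`, `load_credential`, `get_credential`, the file format and the 30 second safety margin are all hypothetical, and `authenticate` stands in for whatever call actually logs in. Only the 15 and 30 minute lifetimes come from the description above.

```python
import json
import time

V1_SESSION_LIFETIME = 15 * 60  # seconds; renewed by every V1 call
V2_TOKEN_LIFETIME = 30 * 60    # seconds; fixed from the time of issue
SAFETY_MARGIN = 30             # assumed margin so we never use a stale value


def save_credential(path, value, lifetime, now=None):
    """Store a session id or access_token with its expected expiry time."""
    now = time.time() if now is None else now
    with open(path, "w") as f:
        json.dump({"value": value, "expires_at": now + lifetime}, f)


def load_credential(path, now=None):
    """Return the saved credential, or None if it is missing or expired."""
    now = time.time() if now is None else now
    try:
        with open(path) as f:
            saved = json.load(f)
        value, expires_at = saved["value"], saved["expires_at"]
    except (FileNotFoundError, ValueError, KeyError):
        return None
    if expires_at - SAFETY_MARGIN <= now:
        return None
    return value


def get_credential(path, authenticate, lifetime, now=None):
    """Reuse a saved credential, authenticating only when really needed."""
    value = load_credential(path, now)
    if value is None:
        value = authenticate()  # the only rate limited step
        save_credential(path, value, lifetime, now)
    return value
```

For the V2 access_token, calling `get_credential` once per run is enough. For the V1 session id, calling `save_credential` again after each successful API call would push the stored expiry forward by another 15 minutes, mirroring the way the server renews the session.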
The current evohome-client library does not support saving and restoring the session id and access_token like this across separate script instances, so work is under way right now to implement this for the next release of evohome-client:
https://github.com/watchforstock/evohome-client/issues/57
Progress is good and I think this will be ready quite soon. Once that is finalised I'll update evohome-munin to use the new library functionality to save and restore the session id and access_token, then give it a couple of weeks of testing. That should fix this intermittent problem once and for all!
One minor issue is that once those changes are made, the newer version of evohome-client will probably be a prerequisite of evohome-munin. However, as the plugin is working so poorly at the moment, it's probably best if people are just made aware that they need to update evohome-client at the same time, rather than me introducing a load of conditional code to stay compatible with older versions of the library, which would still suffer the same intermittent graphing performance.