BJReplay / ha-solcast-solar

Solcast Integration for Home Assistant
Apache License 2.0

Error 429 connecting to solcast on HA restart, zapped all solcast sensors #12

Closed gcoan closed 5 months ago

gcoan commented 5 months ago

Describe the bug: I restarted Home Assistant and when the Solcast integration started up it got an error 429, I believe from retrieving the rooftop sites info (screenshot IMG_1030).

This caused the rooftop sensor in HA to change to Unknown (screenshot IMG_1032).

And ALL the Solcast PV forecast sensors to change to zero (screenshot IMG_1031).

Fortunately I spotted this pretty quickly and was able to resolve it by reloading the Solcast integration, after which all the sensors populated correctly.

To Reproduce: Restart HA; if Solcast is busy it's likely that the same will occur.

Expected behavior: On HA start, if an error 429 occurs, use the previous Solcast sensor values. Do not set any sensors to Unknown or zero.

Desktop (please complete the following information): HA 2024.5.5, HAOS 12.3, Solcast v4.0.25

oleg-d commented 5 months ago

I don't know if this will help you, but it worked for me when I had the 429 error.

Wait around a day without making a call to Solcast, then refresh at a random time, not on the hour.

If you have an automation that refreshes data from Solcast exactly on the hour, add a delay to it. For example, I am using a 2 min 45 sec delay on my automation that updates Solcast data.

gcoan commented 5 months ago

Wait around a day without making a call to Solcast, then refresh at a random time, not on the hour.

Yes, I have this in my Solcast automation and I'm getting pretty reliable Solcast polls now.

This problem occurred when Home Assistant was restarted (at a random time): the integration reached out to Solcast to get the site data, got a 429 error, and that zeroed out all the Solcast sensors. The enhancement request is to either prevent the initial Solcast call, or trap the 429 error and not overwrite the sensors if it occurs.

autoSteve commented 5 months ago

The enhancement request is to either prevent the initial Solcast call, or trap the 429 error

I am looking into adding a configuration option to suppress the initial call to Solcast... With two PV arrays I run out of available calls real quick doing restarts.

The initial call is not easily suppressed, as from what I can tell it returns the 'sites' along with forecast data. So I'll need to cache the sites data to survive the restart, which will result in data still showing up on the Energy dashboard right after restart. Doable, but it'll take time.
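A minimal sketch of that fall-back idea (hypothetical cache location and function names, not the current integration code):

import json
from pathlib import Path

import aiohttp

SITES_CACHE = Path("/config/sites.json")  # hypothetical cache location

async def get_sites(session: aiohttp.ClientSession, api_key: str) -> dict:
    """Fetch rooftop sites, refreshing the cache on success and
    falling back to the cached copy if Solcast returns a 429."""
    url = "https://api.solcast.com.au/rooftop_sites"
    async with session.get(url, params={"format": "json", "api_key": api_key}) as resp:
        if resp.status == 200:
            sites = await resp.json()
            SITES_CACHE.write_text(json.dumps(sites))  # survive the next restart
            return sites
        status = resp.status
    if status == 429 and SITES_CACHE.exists():
        return json.loads(SITES_CACHE.read_text())  # Solcast busy: use last known sites
    raise RuntimeError(f"Sites fetch failed (HTTP {status}) and no cache is available")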

oleg-d commented 5 months ago

One other integration I use has a toggle to enable/disable cloud updates at integration startup and HA reboots. Obviously I don't know if this could be translated over to this integration, but it's worth mentioning.

Someone who knows the technical side of things and Python could check the PR here from the other integration, to see if something can be borrowed.

timverlander commented 5 months ago

So I'll need to cache the sites data to survive the restart

Just an alternative thought... as hobbyist accounts can only have two rooftops, it might be simpler to capture two additional user-entered resource_ids in the configuration flow; if either or both are present, skip the startup discovery call to Solcast and use the supplied resource_id(s) instead? Something along the lines of the sketch below.
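(An illustrative config-flow sketch only; the field names are hypothetical, not the integration's actual schema.)

import voluptuous as vol
from homeassistant import config_entries

class SolcastConfigFlow(config_entries.ConfigFlow, domain="solcast_solar"):
    async def async_step_user(self, user_input=None):
        if user_input is not None:
            # If resource IDs were supplied, startup discovery could be skipped
            return self.async_create_entry(title="Solcast", data=user_input)
        return self.async_show_form(
            step_id="user",
            data_schema=vol.Schema({
                vol.Required("api_key"): str,
                # Optional comma-separated rooftop resource IDs
                vol.Optional("resource_ids", default=""): str,
            }),
        )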

autoSteve commented 5 months ago

I did some code examination, only to find that the sites data is rigged to definitely be cached already by solcastapi.py. Then I did a few restarts of HA and checked the API use count at solcast.com.au. It did not go up, so getting the sites must not contribute to API calls, confirming the cache read of sites data.

However, earlier today I did notice a 429 from Solcast after restarting, with forecast data unavailable in HA. The solar production yellow bars were also missing. When I tried to check the log later there was no 429 present, because the log is cleared on restart, so I need to wait until it happens again to see if it's the same issue reported at line 144.

An immediate restart of HA fixed the forecast and solar display.

So in summary, this should never happen on HA start with /config/sites.json (not solcast.json) present and the class variable apiCacheEnabled set True, because that section of code in the try/except block should read the cache and not make a REST call. But apiCacheEnabled is always False.

Debug logging for individual components also seems to turn off on HA restart, so I might need to enable system-wide debug logging for a while. Ugh. I got logging sorted.

autoSteve commented 5 months ago

Delving further:

On startup, apiCacheEnabled is always False, hard-coded, so it's not caching the sites collection. It is caching forecasts only.

But it doesn't really need to cache the sites collection, because doing so doesn't impact the count of daily API calls. So no impact, you'd think? Kind of.

What I think is happening here is that Solcast limits API calls when things get too busy, probably only when you have a hobbyist account. Oziee has already said much on that, guiding folks not to gather exactly on the hour. Many folks don't listen, so if your restart occurred near the top of the hour it might fail for that reason.

But it didn't. It failed at a quarter to five.

I am wondering whether some form of retry is in order on failure. Caching is also probably a better option, so that the API call never happens. But what if sites change? Get added? This is probably why Oziee left it permanently disabled, most likely to cut down on the number of issues logged against his repository. Not everyone's a Python coder... And not everyone knows how to find and delete a file in Linux.

But he wrote the code to do so. It'd be amusing to try and make it work in the real world.

So I suspect that API limiting because Solcast is busy is behind this. I just did about twenty restarts in a row and none of them failed to get the sites from the API, despite me having had just one weird and seemingly random failure earlier today.

autoSteve commented 5 months ago

@gcoan, you could try out 490494c, which adds a retry mechanism, but testing my suspicion is going to be tough.

autoSteve commented 5 months ago

I am also going to expand on the retry concept, @gcoan. It seems sensible to me that if the rooftop sites collect fails, but there is a cache available to be read (because we restarted prior and all was good), then read it, log the circumstance with a warning, and move on.

I like that.

#26 just got merged. What could possibly go wrong? So it's another branch/PR for me to implement the belt/braces fall-back-on-cache.

gcoan commented 5 months ago

@autoSteve yes, that seems a sensible approach: use the cached sites data if we get a 429 error.

Do you want me to try the above release or is there another version coming with more changes?

autoSteve commented 5 months ago

Yeah, hold off @gcoan. I want the fall-back in there. Makes TOTAL sense.

BJReplay commented 5 months ago

I'm going to pull the pin for tonight (migraine kicking in to curl-up-and-hide levels), but will (or you can, @autoSteve - welcome as a collaborator) build a release tomorrow (based on your fallback as noted above) when ready. The intention is to tidy up info.md as well (rather than just linking to release notes), and to update the PR that requests this repo be included in HACS (link to the latest release and runs of the HACS and hassfest actions).

I don't think I'm going to get around to implementing randomised polling before I travel, but I might also update the readme on making the randomised polling the first automation.

autoSteve commented 5 months ago

Ugh. I hate the flashy-flashy-migraines, @BJReplay. Much spekky, but. Until the pain kicks in... Growing out of them, and thankfully haven't had one for two years. (Not to mozz meself, so touch wood...)

Thanks for the welcome.

gcoan commented 5 months ago

Would you believe it: looks like I had a power cut and HA restart last night, and a Solcast 429 error on startup! (screenshot attached)

Same error: the sites entities are no longer being provided by Solcast and all the other forecast entities are empty. It took two reloads before Solcast came back; the first reload, at 9:17:39 (which you would think would be a random enough time), failed with a 429.

Thanks @autoSteve and @BJReplay for looking into this. Hope you manage to hold off the migraine

autoSteve commented 5 months ago

Line 144. There she goes, @gcoan. That's what we're chasing...

autoSteve commented 5 months ago

https://github.com/BJReplay/ha-solcast-solar/pull/27/commits/b38c62a709be3e806a93a1e7968be696c23b7ec0 is good to test, @gcoan.

I'll hold off merging until someone other than me has tried it.

autoSteve commented 5 months ago

After a successful start you should find the cache (containing json) has been created, which will be used in subsequent failure scenarios.

(screenshot attached)

gcoan commented 5 months ago

@autoSteve do I need to manually copy the files from that github release into the appropriate HA folder ?

It's just that for other integrations I download beta versions from HACS, and I'm not seeing any for Solcast.


autoSteve commented 5 months ago

@autoSteve do I need to manually copy the files from that github release into the appropriate HA folder ?

Ummm... I've not got that clever. Do you know your way around GitHub well enough to get the modified solcastapi.py from the commit file contents and manually replace your current one with it?


I may need to get more clever with beta releases...

gcoan commented 5 months ago

Thanks @autoSteve, I figured it out: finding the updated solcastapi.py file in the GitHub release and copying that to /config/custom_components/solcast_solar (making a backup copy first!).

I restarted HA and the /config/sites.json file was created; it appears to contain my sites data. I have restarted HA 3 more times and on each restart I've not seen any Solcast messages in the HA Core logfile, so I'm not sure whether this means Solcast is not reporting 429 errors (and so no retry/use of the cached sites.json is required), or whether it's silently used the cached file. Should I see anything in the logfile so I can debug whether it is working or not?

Oh, and the Solcast entities remained OK throughout all these restarts.

autoSteve commented 5 months ago

If a rooftop sites gather has previously succeeded, the data from that gather will be used (with a warning issued in the log) should a 429 occur three times in a row, with five-second pauses between the calls.
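In outline, the behaviour is something like this (a sketch of the logic just described, with illustrative names, not the actual code):

import asyncio
import json
import logging
from pathlib import Path

_LOGGER = logging.getLogger(__name__)
SITES_CACHE = Path("/config/sites.json")

async def gather_sites(fetch, retries: int = 3, pause: float = 5.0) -> dict:
    """fetch() is an async callable returning (status, payload)."""
    status = None
    for attempt in range(retries):
        status, payload = await fetch()
        if status == 200:
            SITES_CACHE.write_text(json.dumps(payload))  # refresh the cache
            return payload
        if attempt < retries - 1:
            _LOGGER.debug("Will retry GET rooftop_sites, retry %d", attempt + 1)
            await asyncio.sleep(pause)
    _LOGGER.warning("Gathering sites failed (last status %s), using cached data if it exists", status)
    if SITES_CACHE.exists():
        return json.loads(SITES_CACHE.read_text())
    raise RuntimeError("No sites data and no cache; at least one successful call is needed first")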

autoSteve commented 5 months ago

I have good news, and I have bad news.

So... the good news first, @gcoan. By restarting almost exactly on the hour I was able to replicate a 429 for get sites. The retry mechanism for getting sites data worked perfectly, warned of the occurrence in the log, and the integration used the cache data. So that bit of code checks out as tested.

(log screenshot attached)

The not-so-good news is that more caching needs to be done, with the call to GetUserUsageAllowance subsequently failing with a 429. This call is also not subject to a 10-call limit, pretty well confirming that this is only occurring at generally busy times for Solcast, and probably for hobbyist accounts only. This also needs a retry mechanism and a separate cache to cope seamlessly with startup circumstances.

Back to some good news, though: should the call to GetUserUsageAllowance fail, it will not interfere and have sensors go AWOL, resulting in data gaps. It just makes my eye twitch, so it will be worked around...

It would also be interesting to build in a much longer retry interval for getting forecast data periodically, say a one-minute interval. If that coincidentally happens at a busy time, then I think that forecast update would be skipped until the next scheduled one occurs.

I'm on it. Problem number one first, then I'll move on to get forecast retries.

gcoan commented 5 months ago

Thanks @autoSteve. I had been planning on doing some more 'on the hour' restart testing myself; I might as well hold off since you've found a further issue.

Looking at the log fragment you shared, I noticed a typo in the error message, 'Responce' at the end. But moreover, I wonder whether the 429 as it stands should be logged as an Error, because the next line then logs a warning that the cache will be used. Could this be confusing: getting an error that isn't really an error, because it's being trapped and processed as a warning (that the cache is being used)?

autoSteve commented 5 months ago

Not my typo 😉 but fixed. Well spotted. There was another one in the code, too.

The log is in reverse time order. The warning about a 429 comes first, then the 429 error occurs that I'm about to work around.

Almost everyone knows HTTP status 404 means "not found". It does beg the question: what does "status 429" mean to a real person? Weirdo developers like me know that this is the HTTP status code for "go away, I'm busy, so try later...", but your average Joe needs to hit up Google to understand it, then raise an issue with Oziee, who would then facepalm for the twentieth time that month 😂. "Status 429, too busy try later" or "Status 429, Solcast too busy", or similar, might well be better.
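For illustration, a trivial lookup (hypothetical, not the integration's actual code) is all it would take to make the log friendlier:

# Map opaque HTTP statuses onto messages a non-developer can act on
FRIENDLY_STATUS = {
    401: "API key not valid",
    404: "rooftop site not found",
    429: "Solcast too busy, try again later",
}

def describe_status(status: int) -> str:
    return f"Status {status}, {FRIENDLY_STATUS.get(status, 'unexpected response')}"

# e.g. describe_status(429) -> "Status 429, Solcast too busy, try again later"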

gcoan commented 5 months ago

Thanks for the explanation, yes I was forgetting that the log file is in reverse order.

Agree, 429 is an obscure code that most people won't know about. Personally I would have used 503 Service Unavailable, which is a bit better known, but for most people it's a Google search either way. And as you say, they would then have queried it with Oziee when it's not his fault.

swests commented 5 months ago

429 is a rate limiting status code.

429 Too Many Requests

The HTTP 429 Too Many Requests response status code indicates the user has sent too many requests in a given amount of time ("rate limiting").

This is probably being thrown because the user has hit the API call limit. Although set at 10, multiple roofs would effectively halve that (and there are other calls that further reduce the API call count).

autoSteve commented 5 months ago

(I love Oziee, and he's built a great integration, but technically it is his fault, @gcoan. It's a 429. By its very definition as "rate limiting" he should code a retry after a back-off timer, or use cached data if it's not a biggie, not just error out...)

@swests, the Solcast API will return this at times even if the API call limit has not been reached. Some calls are not subject to the hobbyist 10-call limit, like the call for "How many calls can I make?".

For example, when I hit this 429 earlier I was at 6/10 daily calls according to the Solcast website.

This 429 is occurring at generally busy times, where the site will refuse any call from a hobbyist.

But thanks for the input.

gcoan commented 5 months ago

Yeah, I agree: the integration should be written defensively, trapping errors when it can, using caches, and certainly not zeroing the data if it can't make a connection.

Unfortunately the growth of Home Assistant and home solar, and therefore probably the growth of this integration, have been part of the cause of the server load Solcast is trying to limit. Personally I'd be happy to pay a small subscription, or be directed by Solcast as to the specific time to make my API calls so they can balance the traffic over the day. Too many automations making calls at x:00:00 tipped them over the edge. Solcast could also have reduced all hobbyist accounts to 10 API calls a day instead of leaving the older accounts still on 50.

But we make progress!

autoSteve commented 5 months ago

Totes. Take my (small amount of) money, @Solcast. I'd pay 5/10 bucks/quid/whatever per month for less restriction, but so would probably everyone else with HA using Solcast. = same problem for them, but some more dosh to solve the problem.

At least they haven't shut down to hobbyists altogether... their forecasts are actually better than our local Bureau of Met at predicting awful / slightly less-than-awful days.

autoSteve commented 5 months ago

Forgot to mention #12 in the commit description, @gcoan.

https://github.com/BJReplay/ha-solcast-solar/pull/27/commits/1b96dec6df61988513b46b77382d83073efd1585 expands retry/caching to the GetUserUsageAllowance call.

Should the call occur and receive a 429 without a cache existing, it gets real ugly and a bit confusing (logged errors, blah blah, resulting in GitHub issues raised and facepalms, but everything will work); however, if a cache exists there are not enough "o"s in smooth to describe the experience.

I even translate the 429 into "Solcast too busy".

Amusingly, the variables that this call sets do not seem to have any material effect on the operation of the integration, hence failure has no impact. It is my belief that Oziee probably intended to monitor the count of calls vs. the call limit to prevent call quota over-use, and therefore 429s. He never did so, but he laid the foundation. It was his assumption that a 429 is only ever associated with over-quota use that was his undoing.

429s can happen at any given moment and for several reasons, so this code is entirely redundant, and I now see why it is completely useless and abandoned. It almost certainly doesn't even need to be called by init. I just wasted an hour of my life that I'll never get back, but the log is cleaner. 🤣

Ah, well. That was fun. Sometimes you got to focus on the bigger picture...

autoSteve commented 5 months ago

So far confirmed, @gcoan. I removed the call to GetUserUsageAllowance from init (my possibly wasted hour), where it is only ever called, and restarting on the hour resulted in a 429 for the sites gather; the script then used the cache and moved on, and all was well.

Logic pretty well confirmed. I will continue to monitor when the sun comes up here, but I can't see any reason in code why not making the call could alter anything. (The GetUserUsageAllowance function remains for posterity, but might make a comeback.)

Tomorrow's task is to tackle actual forecast 429 status replies, should they occur, with a backed-off retry... Methinks one/two/four/eight/sixteen minutes perhaps. Then give up.

I may reinstate something like the GetUserUsageAllowance for this. Every call to get the forecast results in usage data calculated internally, which is currently kept. I'm thinking if usage is clearly over the limit for the next call, then don't bother with the retry/back-off, or even a call at all. That would require knowing the limit (could default to 10 on initial GetUserUsageAllowance call failure), and the remembered call count already resets to zero in code at UTC midnight.
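A rough sketch of that bookkeeping (illustrative names; the real integration internals may differ):

from datetime import datetime, timezone

class UsageTracker:
    """Remember the day's API call count and refuse calls past the limit."""

    def __init__(self, limit: int = 10):  # default to 10 if the allowance call fails
        self.limit = limit
        self.count = 0
        self.day = datetime.now(timezone.utc).date()

    def can_call(self) -> bool:
        today = datetime.now(timezone.utc).date()
        if today != self.day:  # UTC midnight has passed, so reset the counter
            self.day, self.count = today, 0
        return self.count < self.limit

    def record_call(self) -> None:
        self.count += 1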

I like it. I'll code it. Tomoz.

gcoan commented 5 months ago

Sounds like you are continuing to have fun @autoSteve, but more significantly I should hold off installing 1b96dec if it contains changes to GetUserUsageAllowance that you're already making further changes to.

Tomorrow's task is to tackle actual forecast 429 status replies, should they occur, with a backed-off retry... Methinks one/two/four/eight/sixteen minutes perhaps. Then give up.

Sounds like a great improvement to Solcast if it can gracefully handle 429 errors with auto-retry. I see failures every now and again in the Traces for my own Solcast update automation; some days every single update (at random trigger times) works, and some days I get a failure on at least one call. But can I suggest you delay 1+random seconds/2+random seconds/ etc.? I'm sure there are lots of people who still have automations that run at xx:00:00, and if Solcast pauses a precise minute we'll just keep moving the bottleneck forward in time. I'm also wondering how long Home Assistant will allow the Solcast service call to execute for. It looks like service API calls were changed to remove timeouts in 2023.7, so in theory it could just keep retrying forever... A 16-minute delay, especially after the preceding 1+2+4+8=15 minutes of tries and retries, feels a bit long to me.

I'm thinking if usage is clearly over the limit for the next call, then don't bother with the retry/back-off, or even a call at all. That would require knowing the limit (could default to 10 on initial GetUserUsageAllowance call failure), and the remembered call count already resets to zero in code at UTC midnight.

Agreed, there's no point in making service calls to Solcast that we know will fail, further overloading their servers. It should log a warning message when this happens, though.

One final thought: I suggest any cache files written clearly indicate which integration they come from. I'm worried about a proliferation of files that are not identifiable to the end user, and thus 'sites.json' doesn't sit well with me. Ideally everything would be kept inside the same solcast.json file, but if that's not practicable then can we prefix the filenames, e.g. 'solcast_sites.json'?

Cheers

autoSteve commented 5 months ago

I suggest any cache files written clearly indicate which integration they come from.

Agreed. solcast-sites.json it is. You can rename your cache file after moving to code.next.

1+random seconds/2+random seconds/ etc.

Agreed.

hastarin commented 5 months ago

Tomorrow's task is to tackle actual forecast 429 status replies, should they occur, with a backed-off retry... Methinks one/two/four/eight/sixteen minutes perhaps. Then give up.

❤️ I had been reading the thread and was going to chime in and suggest a back-off and retry with jitter is your best bet. Great to see you both come to that same conclusion.

https://aws.amazon.com/builders-library/timeouts-retries-and-backoff-with-jitter/

I don't know a lot about Python, but https://pypi.org/project/backoff/ seems like it might be put to good use here.
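For what it's worth, the decorator usage would look roughly like this (an untested sketch; it assumes a coroutine that raises on a 429):

import aiohttp
import backoff

@backoff.on_exception(
    backoff.expo,                 # exponential waits: roughly 1, 2, 4, 8... seconds
    aiohttp.ClientResponseError,  # raised below on any non-2xx status
    max_tries=5,
    jitter=backoff.full_jitter,   # randomise each wait to avoid thundering herds
)
async def fetch_forecast(session: aiohttp.ClientSession, url: str) -> dict:
    async with session.get(url) as resp:
        resp.raise_for_status()   # a 429 triggers the back-off retries
        return await resp.json()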

autoSteve commented 5 months ago

Righto.

Re. commit: https://github.com/BJReplay/ha-solcast-solar/commit/08b3b332657e13941228836245ccaa9c2db26014

The integration will now try up to five times to fetch forecast data. Between each try is a delay of (30 seconds * try count), plus a random number of seconds between zero and 30.
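That delay schedule amounts to something like this (a simplified sketch, not the commit's exact code):

import asyncio
import random

async def fetch_with_retries(fetch, max_tries: int = 5):
    """fetch() is an async callable returning data on success or None on a 429."""
    for attempt in range(1, max_tries + 1):
        data = await fetch()
        if data is not None:
            return data
        if attempt < max_tries:
            delay = 30 * attempt + random.randrange(0, 31)  # 30s * try count, plus 0-30s jitter
            await asyncio.sleep(delay)
    return None  # give up until the next scheduled update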

The number of API calls is also monitored, and if the limit is reached it will not attempt to gather the forecast.

This is quite untested code, so I have changed my automation to gather on the hour, which should result in some 429s being forthcoming.

If others want to test in a similar way to increase our collective chances then by all means do so. Save a backup copy of solcastapi.py and __init__.py.

I have logging configured in configuration.yaml like this so I can see everything that's going on, and am monitoring home-assistant.log.

logger:
  default: info
  logs:
    custom_components.solcast_solar: debug

@BJReplay, we should definitely not merge #27 until some testing checks out.

BJReplay commented 5 months ago

Ok, I will grab and run as well.

autoSteve commented 5 months ago

I have also changed the API attempts per day in my automation to be six instead of five for my dual-array setup, which will definitely cause the API limit to be busted at some point.

BJReplay commented 5 months ago

So, I assume to test I:

  1. Grab __init__.py and solcastapi.py and sensor.py from this branch and copy into my custom_components/solcast_solar directory
  2. Reconfigure Solcast limits as appropriate (I'm an early adopter with a 50 limit, and another service that runs hourly at 11 minutes past the hour from sunrise to sunset, so makes around 11 calls a day at the moment)
  3. Reconfigure my automation to run on the hour every hour
  4. Add logging to configuration yaml
  5. Restart HA
  6. Test :)

BJReplay commented 5 months ago

On restart...

homeassistant  | 2024-06-15 15:17:13.479 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - sites_data code http_session returned data type is <class 'NoneType'>
homeassistant  | 2024-06-15 15:17:13.479 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - sites_data code http_session returned status 429
homeassistant  | 2024-06-15 15:17:13.479 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - will retry GET rooftop_sites, retry 3
homeassistant  | 2024-06-15 15:17:18.480 WARNING (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Timeout gathering rooftop sites data, last call result: 429, using cached data if it exists
homeassistant  | 2024-06-15 15:17:18.480 WARNING (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - sites_data Solcast.com http status Error 404 - Gathering rooftop sites data
homeassistant  | 2024-06-15 15:17:18.482 ERROR (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - sites_data Exception error: Traceback (most recent call last):
homeassistant  |   File "/config/custom_components/solcast_solar/solcastapi.py", line 177, in sites_data
homeassistant  |     raise Exception(f"SOLCAST - HTTP sites_data error: Solcast Error gathering rooftop sites data")
homeassistant  | Exception: SOLCAST - HTTP sites_data error: Solcast Error gathering rooftop sites data
homeassistant  |
homeassistant  | 2024-06-15 15:17:18.482 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - getting API limit and usage from solcast
homeassistant  | 2024-06-15 15:17:18.524 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - writing usage cache
homeassistant  | 2024-06-15 15:17:18.527 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - API counter is 11/50
homeassistant  | 2024-06-15 15:17:18.527 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - load_saved_data site count is zero!
homeassistant  | 2024-06-15 15:17:18.527 INFO (MainThread) [custom_components.solcast_solar] Solcast Integration version number: v4.0.27
homeassistant  | 2024-06-15 15:17:18.527 DEBUG (MainThread) [custom_components.solcast_solar.coordinator] Finished fetching solcast_solar data in 0.000 seconds (success: True)
homeassistant  | 2024-06-15 15:17:18.537 INFO (MainThread) [homeassistant.components.sensor] Setting up solcast_solar.sensor
homeassistant  | 2024-06-15 15:17:18.547 INFO (MainThread) [homeassistant.components.select] Setting up solcast_solar.select
homeassistant  | 2024-06-15 15:17:18.547 INFO (MainThread) [custom_components.solcast_solar] SOLCAST - Solcast API data UTC times are converted to Australia/Sydney
homeassistant  | 2024-06-15 15:17:18.550 INFO (MainThread) [homeassistant.bootstrap] Home Assistant initialized in 26.65s
BJReplay commented 5 months ago

Ahh, I need to set up caching:

homeassistant | 2024-06-15 15:30:36.537 DEBUG (MainThread) [custom_components.solcast_solar] Solcast Migrating from version 7

homeassistant  | 2024-06-15 15:30:36.538 DEBUG (MainThread) [custom_components.solcast_solar] Solcast Migration to version 7 successful
homeassistant  | 2024-06-15 15:30:36.541 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST apiCacheEnabled=False, solcast-sites.json=False
homeassistant  | 2024-06-15 15:30:36.541 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - connecting to - https://api.solcast.com.au/rooftop_sites?format=json&api_key=REDACTED
homeassistant  | 2024-06-15 15:30:37.051 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - sites_data code http_session returned data type is <class 'NoneType'>
homeassistant  | 2024-06-15 15:30:37.051 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - sites_data code http_session returned status 429
homeassistant  | 2024-06-15 15:30:37.051 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - will retry GET rooftop_sites, retry 1
homeassistant  | 2024-06-15 15:30:42.064 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - sites_data code http_session returned data type is <class 'NoneType'>
homeassistant  | 2024-06-15 15:30:42.065 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - sites_data code http_session returned status 429
homeassistant  | 2024-06-15 15:30:42.065 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - will retry GET rooftop_sites, retry 2
homeassistant  | 2024-06-15 15:30:47.080 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - sites_data code http_session returned data type is <class 'NoneType'>
homeassistant  | 2024-06-15 15:30:47.080 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - sites_data code http_session returned status 429
homeassistant  | 2024-06-15 15:30:47.080 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - will retry GET rooftop_sites, retry 3
homeassistant  | 2024-06-15 15:30:52.082 WARNING (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Timeout gathering rooftop sites data, last call result: 429, using cached data if it exists
homeassistant  | 2024-06-15 15:30:52.083 WARNING (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - sites_data Solcast.com http status Error 404 - Gathering rooftop sites data
homeassistant  | 2024-06-15 15:30:52.087 ERROR (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - sites_data Exception error: Traceback (most recent call last):
homeassistant  |   File "/config/custom_components/solcast_solar/solcastapi.py", line 177, in sites_data
homeassistant  |     raise Exception(f"SOLCAST - HTTP sites_data error: Solcast Error gathering rooftop sites data")
homeassistant  | Exception: SOLCAST - HTTP sites_data error: Solcast Error gathering rooftop sites data
homeassistant  |
homeassistant  | 2024-06-15 15:30:52.087 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - getting API limit and usage from solcast
homeassistant  | 2024-06-15 15:30:52.099 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - will retry GET GetUserUsageAllowance, retry 1
homeassistant  | 2024-06-15 15:30:57.100 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - will retry GET GetUserUsageAllowance, retry 2
homeassistant  | 2024-06-15 15:31:02.102 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - will retry GET GetUserUsageAllowance, retry 3
homeassistant  | 2024-06-15 15:31:07.103 WARNING (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Timeout getting usage allowance, last call result: 429, using cached data if it exists
homeassistant  | 2024-06-15 15:31:07.104 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - loading cached usage
homeassistant  | 2024-06-15 15:31:07.110 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - API counter is 11/50
homeassistant  | 2024-06-15 15:31:07.111 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - load_saved_data site count is zero!
homeassistant  | 2024-06-15 15:31:07.111 INFO (MainThread) [custom_components.solcast_solar] Solcast Integration version number: v4.0.27
homeassistant  | 2024-06-15 15:31:07.111 DEBUG (MainThread) [custom_components.solcast_solar.coordinator] Finished fetching solcast_solar data in 0.000 seconds (success: True)
homeassistant  | 2024-06-15 15:31:07.119 INFO (MainThread) [homeassistant.components.sensor] Setting up solcast_solar.sensor
homeassistant  | 2024-06-15 15:31:07.129 INFO (MainThread) [homeassistant.components.select] Setting up solcast_solar.select
homeassistant  | 2024-06-15 15:31:07.130 INFO (MainThread) [custom_components.solcast_solar] SOLCAST - Solcast API data UTC times are converted to Australia/Sydney

Nonetheless, it seems to be working well. (I restarted as the bremor/bureau_of_meteorology integration had a fix for the blocking warnings, so I downloaded that.)

BJReplay commented 5 months ago

We have a bug, Houston:

(screenshot attached)

Excuse the long screenshot, but it shows the normal pattern of no polling overnight, and hourly polling during the day.

sensor.solcast_pv_forecast_api_last_polled is set to a null unix datetime

Also, I'm not sure how to set up caching.

BJReplay commented 5 months ago

Hmmm, third restart lucky?

homeassistant  | 2024-06-15 15:56:14.305 INFO (MainThread) [homeassistant.setup] Setting up solcast_solar
homeassistant  | 2024-06-15 15:56:14.305 INFO (MainThread) [homeassistant.setup] Setup of domain solcast_solar took 0.00 seconds
homeassistant  | 2024-06-15 15:56:14.305 DEBUG (MainThread) [custom_components.solcast_solar] Solcast Migrating from version 7
homeassistant  | 2024-06-15 15:56:14.306 DEBUG (MainThread) [custom_components.solcast_solar] Solcast Migration to version 7 successful
homeassistant  | 2024-06-15 15:56:14.309 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST apiCacheEnabled=False, solcast-sites.json=False
homeassistant  | 2024-06-15 15:56:14.309 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - connecting to - https://api.solcast.com.au/rooftop_sites?format=json&api_key=REDACTED
homeassistant  | 2024-06-15 15:56:14.594 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - sites_data code http_session returned data type is <class 'dict'>
homeassistant  | 2024-06-15 15:56:14.594 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - sites_data code http_session returned status 200
homeassistant  | 2024-06-15 15:56:14.594 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - writing sites data cache
homeassistant  | 2024-06-15 15:56:14.608 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - sites_data: {'sites': [{'name': 'redacated', 'resource_id': 'redacted', 'capacity': 5, 'capacity_dc': 4.85, 'longitude': redacted, 'latitude': redacted, 'azimuth': -123, 'tilt': 5, 'install_date': '2012-04-18T14:00:00.0000000Z', 'loss_factor': 0.9, 'tags': ['Home ', 'redacted']}], 'page_count': 1, 'current_page': 1, 'total_records': 1}
homeassistant  | 2024-06-15 15:56:14.608 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - getting API limit and usage from solcast
homeassistant  | 2024-06-15 15:56:14.654 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - writing usage cache
homeassistant  | 2024-06-15 15:56:14.658 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - API counter is 12/50
homeassistant  | 2024-06-15 15:56:14.668 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - load_saved_data file exists.. file type is <class 'dict'>
homeassistant  | 2024-06-15 15:56:14.682 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Data for 2024-06-15 contains all 48 records
homeassistant  | 2024-06-15 15:56:14.698 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Data for 2024-06-16 contains all 48 records
homeassistant  | 2024-06-15 15:56:14.704 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Data for 2024-06-17 contains all 48 records
homeassistant  | 2024-06-15 15:56:14.705 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Data for 2024-06-18 contains all 48 records
homeassistant  | 2024-06-15 15:56:14.707 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Data for 2024-06-19 contains all 48 records
homeassistant  | 2024-06-15 15:56:14.712 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Data for 2024-06-20 contains all 48 records
homeassistant  | 2024-06-15 15:56:14.754 INFO (MainThread) [custom_components.solcast_solar] Solcast Integration version number: v4.0.27
homeassistant  | 2024-06-15 15:56:14.756 DEBUG (MainThread) [custom_components.solcast_solar.coordinator] Finished fetching solcast_solar data in 0.000 seconds (success: True)
autoSteve commented 5 months ago

The ugly exception is saying that the sites data could not be acquired, and that you did not have a cache created yet, @BJReplay. The sequence is a 429 fail for all three tries at the API, then a 404 to say the cache is not found. Honestly, it could be a little prettier.

Now that you have a solcast-sites.json file, that will not reoccur. However, for someone first setting up the integration, hitting that would be a head-scratcher. Unlikely, but Murphy is not an optimist...

Maybe a readme note is advisable.

autoSteve commented 5 months ago

sensor.solcast_pv_forecast_api_last_polled is set to a null unix datetime

Did this go away after successful restart?

BJReplay commented 5 months ago

Did this go away after successful restart?

Yes, it did.

Seems to be working. Will monitor over the next day or so.

autoSteve commented 5 months ago

Honestly, it could be a little prettier.

I have added a further error message, which will be included in the next commit: SOLCAST - Solcast integration did not start correctly, as rooftop sites data is needed. Suggestion: Restart the integration

Further, if the cache is not available: SOLCAST - cached sites data is not yet available to cope with Solcast API being too busy - at least one successful API call is needed

A readme note is definitely advisable to explain what is happening should this be hit.

autoSteve commented 5 months ago

This makes me well pleased...

2024-06-15 17:00:28.059 INFO (MainThread) [custom_components.solcast_solar] SOLCAST - Service call: update_forecasts
2024-06-15 17:00:28.059 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - API polling for rooftop b68d-c05a-c2b3-2cf9
2024-06-15 17:00:28.059 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Polling API for rooftop_id b68d-c05a-c2b3-2cf9
2024-06-15 17:00:28.059 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - fetch_data code url - https://api.solcast.com.au/rooftop_sites/b68d-c05a-c2b3-2cf9/forecasts
2024-06-15 17:00:28.059 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Fetching forecast
2024-06-15 17:00:28.259 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Solcast API is busy, pausing 40 seconds before retry
2024-06-15 17:00:46.571 INFO (MainThread) [homeassistant.components.recorder.backup] Backup start notification, locking database for writes
2024-06-15 17:00:47.095 INFO (MainThread) [homeassistant.components.recorder.backup] Backup end notification, releasing write lock
2024-06-15 17:00:47.098 INFO (Recorder) [homeassistant.components.recorder.core] Database queue backlog reached 12 entries during backup
2024-06-15 17:01:08.261 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Fetching forecast
2024-06-15 17:01:08.352 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Solcast API is busy, pausing 67 seconds before retry
2024-06-15 17:02:15.354 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Fetching forecast
2024-06-15 17:02:15.848 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - API returned data. API Counter incremented from 6 to 7
2024-06-15 17:02:15.855 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - fetch_data code http_session returned data type is <class 'dict'>
2024-06-15 17:02:15.855 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - fetch_data code http_session status is 200
2024-06-15 17:02:15.857 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - fetch_data Returned: {'forecasts': [{'pv_estimate': 0, 'pv_estimate10': 0, 'pv_estimate90': 0, 'period_end':
 '2024-06-15T07:30:00.0000000Z', 'period': 'PT30M'}, {'pv_estimate': 0, 'pv_estimate10': 0, 'pv_estimate90': 0, 'period_end': '2024-06-15T08:00:00.0000000Z', 'period': 'PT30M'}, {'pv_estimate': 0, 'pv_estim
autoSteve commented 5 months ago

Exhausting the API usage quota does not make me well pleased. Much error. More work required...

BJReplay commented 5 months ago

A readme note is definitely advisable to explain what is happening should this be hit.

Agreed, but the logging that I've captured (and I assume you have as well) gives me a fair bit to work from to explain how it should work.

autoSteve commented 5 months ago

I added a note to the readme already.