BJReplay / ha-solcast-solar

Solcast Integration for Home Assistant
Apache License 2.0

Error 429 connecting to solcast on HA restart, zapped all solcast sensors #12

Closed gcoan closed 2 months ago

gcoan commented 3 months ago

Describe the bug: I restarted Home Assistant, and when the Solcast integration started up it got an error 429, I believe from retrieving the rooftop sites info (screenshot IMG_1030).

This caused the rooftop sensor in HA to change to Unknown (screenshot IMG_1032).

And ALL the Solcast PV forecast sensors changed to zero (screenshot IMG_1031).

Fortunately I spotted this pretty quickly and was able to resolve it by reloading the Solcast integration, after which all the sensors populated correctly.

To Reproduce: Restart HA; if Solcast is busy it's likely that the same will occur.

Expected behavior: On HA start, if an error 429 occurs, use the previous Solcast sensor values. Do not set any sensors to Unknown or zero.

Desktop (please complete the following information): HA 2024.5.5, HAOS 12.3, Solcast v4.0.25

autoSteve commented 2 months ago

And this makes me well pleased once quota runs out...

Will commit. Stand by.

2024-06-15 18:02:00.149 INFO (MainThread) [custom_components.solcast_solar] SOLCAST - Service call: update_forecasts
2024-06-15 18:02:00.149 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - API polling for rooftop b68d-c05a-c2b3-2cf9
2024-06-15 18:02:00.149 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Polling API for rooftop_id b68d-c05a-c2b3-2cf9
2024-06-15 18:02:00.149 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - fetch_data code url - https://api.solcast.com.au/rooftop_sites/b68d-c05a-c2b3-2cf9/forecasts
2024-06-15 18:02:00.149 WARNING (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - API limit exceeded, not getting forecast
2024-06-15 18:02:12.714 INFO (MainThread) [hass_nabucasa.google_report_state] Connected
autoSteve commented 2 months ago

Changes to coordinator.py are purely superficial.

autoSteve commented 2 months ago

Looking back on your logs, @BJReplay, it becomes super apparent how often 429s are hit. I can accept the on-the-hour, and on-the-half-hour 429s, but 17 minutes past the hour??? I thought this would be way harder to test, but catching a 429 seems to be easier than fishing in a barrel.

I'm calling this retry/caching code essential!
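
A rough sketch of the sort of retry/caching being described, for anyone following along. The function name, intervals and retry count below are illustrative assumptions, not the actual integration code:

    import asyncio
    import random

    import aiohttp

    async def fetch_with_backoff(session: aiohttp.ClientSession, url: str, retries: int = 4):
        """Illustrative only: retry a GET that gets HTTP 429, pausing a
        jittered interval between attempts."""
        for attempt in range(retries):
            async with session.get(url) as resp:
                if resp.status == 200:
                    return await resp.json()
                if resp.status != 429:
                    resp.raise_for_status()
            # Solcast is busy: wait a randomised delay, similar to the
            # "pausing NN seconds before retry" entries seen later in the thread.
            await asyncio.sleep(random.randrange(30, 90) + attempt * 15)
        return None  # signal the caller to use its cached forecast instead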

autoSteve commented 2 months ago

Folks, I reckon PR #27 is good for a healthy test.

Looking forward to feedback before this is released.

autoSteve commented 2 months ago

This has been released by @BJReplay as 4.0.28. Check it out, and please report any issues whatsoever. Cheers.

gcoan commented 2 months ago

I saw HACS was promoting 4.0.28 as the new release, with a well-justified shout-out to @autoSteve, so we're going to get a lot of in-the-field testing of it!

Renamed my sites file, installed 4.0.28, restarted, no messages at all in the HA log but I see solcast-usage.json was created.

Restarted again at 13:00 to try to trigger the timeout errors; the resulting log entries are in the attached screenshot.

Restarted again, in the full log got a 429 on the sites data:

2024-06-15 13:11:21.854 INFO (MainThread) [homeassistant.setup] Setting up solcast_solar
2024-06-15 13:11:21.854 INFO (MainThread) [homeassistant.setup] Setup of domain solcast_solar took 0.00 seconds
2024-06-15 13:11:21.854 DEBUG (MainThread) [custom_components.solcast_solar] Solcast Migrating from version 7
2024-06-15 13:11:21.855 DEBUG (MainThread) [custom_components.solcast_solar] Solcast Migration to version 7 successful
2024-06-15 13:11:21.856 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST apiCacheEnabled=False, solcast-sites.json=True
2024-06-15 13:11:21.856 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - connecting to - https://api.solcast.com.au/rooftop_sites?format=json&api_key=REDACTED
2024-06-15 13:11:21.957 INFO (MainThread) [custom_components.tapo.hass_tapo] Detected model of 192.168.1.146: P110
2024-06-15 13:11:22.079 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - sites_data code http_session returned data type is <class 'NoneType'>
2024-06-15 13:11:22.080 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - sites_data code http_session returned status 429
2024-06-15 13:11:22.080 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - will retry GET rooftop_sites, retry 1
...
2024-06-15 13:11:28.430 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - sites_data code http_session returned data type is <class 'NoneType'>
2024-06-15 13:11:28.430 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - sites_data code http_session returned status 429
2024-06-15 13:11:28.430 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - will retry GET rooftop_sites, retry 2
...
2024-06-15 13:11:38.473 WARNING (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Timeout gathering rooftop sites data, last call result: 429, using cached data if it exists
2024-06-15 13:11:38.473 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - loading cached sites data
2024-06-15 13:11:38.479 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - sites_data: {'sites': [{'name': '87 High Street (E)', 'resource_id': 'bc31-4d4c-e844-6a4e', 'capacity': 10, 'capacity_dc': 8.58, 'longitude': -0.163655, 'latitude': 52.114921, 'azimuth': -100, 'tilt': 37, 'install_date': '2023-01-07T00:00:00.0000000Z', 'loss_factor': 0.99, 'tags': ['UK', 'Bedfordshire', 'GivEnergy']}, {'name': '87 High Street (W)', 'resource_id': '23bb-0216-85c6-fd94', 'capacity': 9, 'capacity_dc': 6.34, 'longitude': -0.163704, 'latitude': 52.114912, 'azimuth': 80, 'tilt': 37, 'install_date': '2023-01-07T00:00:00.0000000Z', 'loss_factor': 0.98, 'tags': ['UK', ' Bedfordshire', ' GivEnergy']}], 'page_count': 1, 'current_page': 1, 'total_records': 2}
2024-06-15 13:11:38.480 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - getting API limit and usage from solcast
2024-06-15 13:11:38.810 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - writing usage cache
2024-06-15 13:11:38.829 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - API counter is 6/10
2024-06-15 13:11:38.844 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - load_saved_data file exists.. file type is <class 'dict'>
2024-06-15 13:11:38.862 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Data for 2024-06-15 contains all 48 records
2024-06-15 13:11:38.863 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Data for 2024-06-16 contains all 48 records
2024-06-15 13:11:38.864 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Data for 2024-06-17 contains all 48 records
2024-06-15 13:11:38.864 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Data for 2024-06-18 contains all 48 records
2024-06-15 13:11:38.865 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Data for 2024-06-19 contains all 48 records
2024-06-15 13:11:38.867 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Data for 2024-06-20 contains all 48 records
2024-06-15 13:11:38.877 INFO (MainThread) [custom_components.solcast_solar] Solcast Integration version number: v4.0.28
2024-06-15 13:11:38.877 DEBUG (MainThread) [custom_components.solcast_solar.coordinator] Finished fetching solcast_solar data in 0.000 seconds (success: True)

So two retries for the sites file and then it used the cached version successfully.

I'll change my automation for the regular Solcast updates to run at 00:00 and add a few extra polls that will go over the API limit, but it's all looking great so far

gcoan commented 2 months ago

The two site downloads at 14:00:00 both worked successfully, code 200 first time!

Will have a look at the next scheduled poll.

Did see this message in the debug for both site polls which is a bit strange: 2024-06-15 14:00:06.311 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Solcast returned 336 records (should be 168)

Did find one bug I think, the solcast-usage.json file isn't being updated correctly.

In the debug log:

2024-06-15 14:00:05.777 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - API returned data. API Counter incremented from 7 to 8
2024-06-15 14:00:06.305 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - fetch_data code http_session returned data type is <class 'dict'>
2024-06-15 14:00:06.305 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - fetch_data code http_session status is 200

The JSON file contains: {"daily_limit": 10, "daily_limit_consumed": 6}

And was only updated when HA was last restarted

autoSteve commented 2 months ago

Good to know @gcoan. Technically the cache not updating is not a bug; it's by design. But it's a stupid design, I guess...

I'll fix it.

gcoan commented 2 months ago

Thanks @autoSteve. I called it out because I assumed that the "run out of API calls" detection would be using this JSON file, and that it wouldn't detect when to stop polling if the file wasn't updated?

autoSteve commented 2 months ago

The file is only used on restart, should the API call get a 429 repeatedly. So it definitely should update every time there is a successful API forecast call. That way, should a restart occur and a 429 be seen at the moment it's asking Solcast "how many calls have I got left?", it will know where it was up to without actually receiving an answer.
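
A rough sketch of that behaviour (hypothetical helper names and path, not the integration's actual functions; the JSON keys match the cache contents quoted above):

    import json

    USAGE_CACHE = "/config/solcast-usage.json"  # path assumed for illustration

    def save_usage(daily_limit: int, daily_limit_consumed: int) -> None:
        """Write the counters after every successful forecast call."""
        with open(USAGE_CACHE, "w") as f:
            json.dump({"daily_limit": daily_limit,
                       "daily_limit_consumed": daily_limit_consumed}, f)

    def load_usage() -> dict:
        """On restart, if Solcast answers 429 when asked for current usage,
        pick up where we left off from the cached counters."""
        try:
            with open(USAGE_CACHE) as f:
                return json.load(f)
        except FileNotFoundError:
            return {}  # no cache yet; counters unknown until Solcast answers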

gcoan commented 2 months ago

Couple of quick further updates

I added extra solcast polls tonight after my API calls were exhausted, they all worked as expected, logging that the call couldn't be made.

At 01:00:00 the next update ran. I had deliberately moved it to the on-the-hour (xx:00:00) time to check the retry logic, which worked flawlessly:


2024-06-16 01:00:00.509 INFO (MainThread) [custom_components.solcast_solar] SOLCAST - Service call: update_forecasts
2024-06-16 01:00:00.510 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - API polling for rooftop bc31-4d4c-e844-6a4e
2024-06-16 01:00:00.510 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Polling API for rooftop_id bc31-4d4c-e844-6a4e
2024-06-16 01:00:00.510 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - fetch_data code url - https://api.solcast.com.au/rooftop_sites/bc31-4d4c-e844-6a4e/forecasts
2024-06-16 01:00:00.510 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Fetching forecast
2024-06-16 01:00:00.579 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Solcast API is busy, pausing 31 seconds before retry
2024-06-16 01:00:31.583 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Fetching forecast
2024-06-16 01:00:31.637 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Solcast API is busy, pausing 85 seconds before retry
2024-06-16 01:01:56.639 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Fetching forecast
2024-06-16 01:01:56.702 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Solcast API is busy, pausing 99 seconds before retry
2024-06-16 01:03:35.703 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Fetching forecast
2024-06-16 01:03:36.691 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - API returned data. API Counter incremented from 0 to 1
2024-06-16 01:03:37.214 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - fetch_data code http_session returned data type is <class 'dict'>
2024-06-16 01:03:37.214 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - fetch_data code http_session status is 200
2024-06-16 01:03:37.216 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - fetch_data Returned: {'forecasts': [{'pv_estimate': 0, 'pv_estimate10': 0, 'pv_estimate90': 0, 'period_end': '2024-06-16T00:30:00.0000000Z', 

All good 👍

autoSteve commented 2 months ago

f73c4ad

autoSteve commented 2 months ago

Did see this message in the debug for both site polls which is a bit strange: 2024-06-15 14:00:06.311 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Solcast returned 336 records (should be 168)

I've been noticing that, too. I've not delved into why, but things seem to work.

autoSteve commented 2 months ago

v4.0.29 released, which addresses the bugs you uncovered, @gcoan.

autoSteve commented 2 months ago

Interesting occurrence: issue #31. A restart after the 4.0.29 upgrade got log entries that say "you gotta restart", yet an issue report was raised... Hmmm. The integration was reporting exactly the right thing, and yet, could it do better? (The bloke subsequently restarted several times and all was good.)

So could it do better?

Probably.

I am investigating a back-off retry for the sites data, but then folks would probably report "Ummm. Gah.... My Solcast integration hasn't started as quickly as I'd want, and there are all these retry log entries..."

gcoan commented 2 months ago

@autoSteve What you've added to Solcast (caching of sites and retrying solar forecasts) should fix a lot of the problems, but if there isn't any retry on the initial sites get then I guess we are going to occasionally get issues such as #31 until every Solcast integration user has a cached copy of the sites file.

I think it would be worthwhile to add the retry to the sites file. There is a risk that some might complain about the startup time but most people don't monitor individual integration startups and they're all asynchronous starts anyway.

Maybe in the logged message make it clear that it's a Solcast error.

eg

'Solcast busy, unable to retrieve sites data and no local cached copy, please reload the Solcast integration to retry'

I believe you can just reload the integration rather than having to do a full HA restart

ProphetOfDoom commented 2 months ago

Updated to V4.0.29, deleted the sites.json and solcast.json files and restarted HA and all sensors are zapped. Log shows the following: solcast.log

Reverted to my copy of V4.0.23.1, deleted solcast related files and restarted HA and all sensors back again (2 roofs).

Also, I noticed the Solcast System Option "Enable polling for updates" was enabled after I upgraded to V4.0.29, but I can't be sure how/when it got enabled. Should this option be deleted going forward? I can raise an issue for it if needs be.

gcoan commented 2 months ago

Updated to V4.0.29, deleted the sites.json and solcast.json files and restarted HA and all sensors are zapped. Log shows the following: solcast.log

Reverted to my copy of V4.0.23.1, deleted solcast related files and restarted HA and all sensors back again (2 roofs).

4.0.29 is working fine for me. I don't think your failure to get it working was anything to do with the new version; it was just a Solcast error.

Older versions of the integration make a call to Solcast on initial startup to get the sites data; if that fails (e.g. error 429 because the Solcast site is busy) they zap all the sensors and the integration fails. That's what I raised this issue about.

In 4.0.28 and 4.0.29 the logic has been changed: the integration now tries a few times to get the sites data, and if that fails it uses the cached copy of the sites (now called solcast-sites.json). If you look at the logfile you will see the repeated failures, then the attempt to use the cached file, which then also fails. If you had just reloaded the Solcast integration, chances are it would have worked that time. When you reverted back to 4.0.23, you were just lucky that the Solcast server was able to provide your sites data.

Once the sites file has been successfully downloaded and cached you should never see this startup issue again.
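
In rough terms, the startup flow described above looks something like the sketch below (illustrative only; the function, cache path and retry counts are assumptions, not the actual solcastapi.py code):

    import asyncio
    import json

    import aiohttp

    SITES_CACHE = "/config/solcast-sites.json"  # cache name from the thread

    async def get_sites(session: aiohttp.ClientSession, url: str, retries: int = 3):
        """Try the rooftop_sites endpoint a few times; on repeated 429s fall
        back to the cached copy written by a previous successful call."""
        for attempt in range(1, retries + 1):
            async with session.get(url) as resp:
                if resp.status == 200:
                    sites = await resp.json()
                    with open(SITES_CACHE, "w") as f:
                        json.dump(sites, f)  # refresh the cache for next time
                    return sites
            await asyncio.sleep(5 * attempt)  # busy: back off, then retry
        # All attempts throttled: use the cache if one exists.
        try:
            with open(SITES_CACHE) as f:
                return json.load(f)
        except FileNotFoundError:
            # No cache either (fresh install, or the file was deleted):
            # the integration cannot start until a later reload succeeds.
            raise RuntimeError("Sites data unavailable and no cached copy")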

gcoan commented 2 months ago

v4.0.29 released, which addresses the bugs you uncovered, @gcoan.

Installed v4.0.29 yesterday morning but somehow I had exhausted my 10 API calls so wasn't able to test it all day. This morning with the API count reset I could see the integration polling successfully (no 429 errors) and the solcast-usage.json file is being updated each time. Nice work @autoSteve 👍

I assume that restarting the integration/HA causes an API poll to occur on startup? This is, I think, why I went from 6 to 10 API calls yesterday morning. I should have been on 4, but I think I must have had some extra test API calls in the schedule which used me up.

autoSteve commented 2 months ago

I assume that restarting the integration/HA causes an API poll to occur on startup?

That's not consistent with my testing. I have restarted countless times and the API count is not increased.

Yes, the integration does REST calls on startup, but these calls do not increment the API use count.

ProphetOfDoom commented 2 months ago

Yes, I appreciate I might have just been unlucky, then lucky, but I was in danger of running out of api calls (I only have 10), so thought I'd better bail out for today and make a report to HQ. I don't actually have Solar PV yet, so I'm not dependent on it working (like Predbat users). I'll try again with V4.0.29 (or latest) when my api limit resets and let you know how I get on. I'm also not convinced that a reload is as clean as a restart, so I'll opt for the latter when I give it a go, and I'll repeat what I did earlier and delete all the solcast/sites files before I do a HA restart. Keep up the good work folks, it's nice to see the enthusiasm to improve this integration.

autoSteve commented 2 months ago

v4.0.29 has an issue.

Suggest installing v4.0.30 sooner rather than later.

gcoan commented 2 months ago

I assume that restarting the integration/HA causes an API poll to occur on startup? That's not consistent with my testing. I have restarted countless times and the API count is not increased.

Yes, the integration does REST calls on startup, but these calls do not increment the API use count.

Strange. Can't explain how I went from having 2 to 4 API calls around 2am on Sunday morning and then skipped from 6 to 10 API calls in the morning.

Today's pattern is how I expect the API count to increase, using up my 10 API call limit with calls at 1am, 6am, 10am, 2pm and 6:30pm (2 sites) (screenshot attached).

Anyway, never mind, it's in the past. I've installed v4.0.31 and it's working fine

autoSteve commented 2 months ago

Bizarre @gcoan.

I had a super odd incident yesterday (today?) where I coincidentally restarted the integration at exactly the same time that the automation to update forecast fired.

Weird stuff happened: I saw three consecutive forecast gathers for my two arrays in the log, all one second apart. So I went from two calls for the day to eight, on the brink in a blink.

Methinks: Is some kind of HA automation retry going on?

Only the gods know. 😂 It broke my head, and like you I moved on.

autoSteve commented 2 months ago

But it did get me thinking.

If this is a thing, then surely building a belt/braces into the integration might not be a bad thing. Logic like:

"I've been asked to update forecast. Fine. ... But now I am being asked to update forecast again so soon? Ah, yeah, nah, I'm not doing that as the request is implausible..."

Within five minutes is aggressive, but would have saved me four precious calls... I'm thinking within ten minutes from last successful.

gcoan commented 2 months ago

Makes sense, there's no point in making another forecast download call (and consuming an API call/two calls) so soon after the first one.

Arguably even 30 minutes would be reasonable. Those on 50 API call limits with one array would be calling at most every 30 minutes, or hourly with two arrays. And to be honest I'd say they shouldn't be making so many calls anyway; that's part of why Solcast is overloaded. I would imagine there is some sort of periodic update to the forecast.

Addendum: I just read that Solcast have a global system of satellites that track cloud coverage (which I didn't know), and the forecast is updated every 5-15 minutes (https://solcast.com/forecast-accuracy). So maybe 20 minutes?

autoSteve commented 2 months ago

I'll haggle ya down to fifteen, guv'na.

The night is but young, for me, so I'll take a look into it.

gcoan commented 2 months ago

Deal !

autoSteve commented 2 months ago

You're on. Super experimental code, @gcoan.

72276b4

This has not had any testing at all. I am at 10/10 calls for today, given the earlier mishap (it was today 🤦🏻), so cannot test until GMT 'flies' past here in many more hours. Us down here in Oz who do most dev at night, being from the future and all, need to patiently wait until the globe gets with the programme.

My plan for testing: Create an automation that just fires the solcast_solar.update_forecasts super frequently, or just do it from a service launch manually a few times. Observe logs. If within fifteen, should get SOLCAST - not requesting forecast because time is within fifteen minutes of {get_last_updated_datetime()}, and if outside the fifteen minute window, BAU.

If you don't want to waste calls, then wait until your next refresh time, and then fire it up again within fifteen.

gcoan commented 2 months ago

You're on. Super experimental code, @gcoan.

72276b4

This has not had any testing at all.

@autoSteve It doesn't work, nice series of crashes though:

2024-06-17 12:50:00.321 INFO (MainThread) [custom_components.solcast_solar] SOLCAST - Service call: update_forecasts
2024-06-17 12:50:00.321 ERROR (MainThread) [homeassistant.components.automation.solcast_forecast_update] Solcast solar forecast update: Error executing script. Unexpected error for call_service at pos 1: name 'get_last_updated_datetime' is not defined
Traceback (most recent call last):
File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 526, in _async_step
await getattr(self, handler)()
File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 763, in _async_call_service_step
response_data = await self._async_run_long_action(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 726, in _async_run_long_action
return await long_task
^^^^^^^^^^^^^^^
File "/usr/src/homeassistant/homeassistant/core.py", line 2741, in async_call
response_data = await coro
^^^^^^^^^^
File "/usr/src/homeassistant/homeassistant/core.py", line 2784, in _execute_service
return await target(service_call)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/config/custom_components/solcast_solar/__init__.py", line 153, in handle_service_update_forecast
await coordinator.service_event_update()
File "/config/custom_components/solcast_solar/coordinator.py", line 67, in service_event_update
await self.solcast.http_data(dopast=False)
File "/config/custom_components/solcast_solar/solcastapi.py", line 589, in http_data
if get_last_updated_datetime() + timedelta(minutes=15) < dt.now(timezone.utc):
^^^^^^^^^^^^^^^^^^^^^^^^^
NameError: name 'get_last_updated_datetime' is not defined. Did you mean: 'self.get_last_updated_datetime'?
2024-06-17 12:50:00.325 ERROR (MainThread) [homeassistant.components.automation.solcast_forecast_update] While executing automation automation.solcast_forecast_update
Traceback (most recent call last):
File "/usr/src/homeassistant/homeassistant/components/automation/__init__.py", line 723, in async_trigger
return await self.action_script.async_run(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 1769, in async_run
return await asyncio.shield(create_eager_task(run.async_run()))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 464, in async_run
await self._async_step(log_exceptions=False)
File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 528, in _async_step
self._handle_exception(
File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 558, in _handle_exception
raise exception
File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 526, in _async_step
await getattr(self, handler)()
File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 763, in _async_call_service_step
response_data = await self._async_run_long_action(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 726, in _async_run_long_action
return await long_task
^^^^^^^^^^^^^^^
File "/usr/src/homeassistant/homeassistant/core.py", line 2741, in async_call
response_data = await coro
^^^^^^^^^^
File "/usr/src/homeassistant/homeassistant/core.py", line 2784, in _execute_service
return await target(service_call)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/config/custom_components/solcast_solar/__init__.py", line 153, in handle_service_update_forecast
await coordinator.service_event_update()
File "/config/custom_components/solcast_solar/coordinator.py", line 67, in service_event_update
await self.solcast.http_data(dopast=False)
File "/config/custom_components/solcast_solar/solcastapi.py", line 589, in http_data
if get_last_updated_datetime() + timedelta(minutes=15) < dt.now(timezone.utc):
^^^^^^^^^^^^^^^^^^^^^^^^^
NameError: name 'get_last_updated_datetime' is not defined. Did you mean: 'self.get_last_updated_datetime'?
2024-06-17 12:52:00.111 INFO (MainThread) [custom_components.solcast_solar] SOLCAST - Service call: update_forecasts
2024-06-17 12:52:00.111 ERROR (MainThread) [homeassistant.components.automation.solcast_forecast_update] Solcast solar forecast update: Error executing script. Unexpected error for call_service at pos 1: name 'get_last_updated_datetime' is not defined
Traceback (most recent call last):
File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 526, in _async_step
await getattr(self, handler)()
File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 763, in _async_call_service_step
response_data = await self._async_run_long_action(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 726, in _async_run_long_action
return await long_task
^^^^^^^^^^^^^^^
File "/usr/src/homeassistant/homeassistant/core.py", line 2741, in async_call
response_data = await coro
^^^^^^^^^^
File "/usr/src/homeassistant/homeassistant/core.py", line 2784, in _execute_service
return await target(service_call)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/config/custom_components/solcast_solar/__init__.py", line 153, in handle_service_update_forecast
await coordinator.service_event_update()
File "/config/custom_components/solcast_solar/coordinator.py", line 67, in service_event_update
await self.solcast.http_data(dopast=False)
File "/config/custom_components/solcast_solar/solcastapi.py", line 589, in http_data
if get_last_updated_datetime() + timedelta(minutes=15) < dt.now(timezone.utc):
^^^^^^^^^^^^^^^^^^^^^^^^^
NameError: name 'get_last_updated_datetime' is not defined. Did you mean: 'self.get_last_updated_datetime'?
2024-06-17 12:52:00.112 ERROR (MainThread) [homeassistant.components.automation.solcast_forecast_update] While executing automation automation.solcast_forecast_update
Traceback (most recent call last):
File "/usr/src/homeassistant/homeassistant/components/automation/__init__.py", line 723, in async_trigger
return await self.action_script.async_run(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 1769, in async_run
return await asyncio.shield(create_eager_task(run.async_run()))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 464, in async_run
await self._async_step(log_exceptions=False)
File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 528, in _async_step
self._handle_exception(
File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 558, in _handle_exception
raise exception
File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 526, in _async_step
await getattr(self, handler)()
File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 763, in _async_call_service_step
response_data = await self._async_run_long_action(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 726, in _async_run_long_action
return await long_task
^^^^^^^^^^^^^^^
File "/usr/src/homeassistant/homeassistant/core.py", line 2741, in async_call
response_data = await coro
^^^^^^^^^^
File "/usr/src/homeassistant/homeassistant/core.py", line 2784, in _execute_service
return await target(service_call)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/config/custom_components/solcast_solar/__init__.py", line 153, in handle_service_update_forecast
await coordinator.service_event_update()
File "/config/custom_components/solcast_solar/coordinator.py", line 67, in service_event_update
await self.solcast.http_data(dopast=False)
File "/config/custom_components/solcast_solar/solcastapi.py", line 589, in http_data
if get_last_updated_datetime() + timedelta(minutes=15) < dt.now(timezone.utc):
^^^^^^^^^^^^^^^^^^^^^^^^^
NameError: name 'get_last_updated_datetime' is not defined. Did you mean: 'self.get_last_updated_datetime'?

gcoan commented 2 months ago

@autoSteve I fixed the crash in solcastapi.py, changing the code to:

        if self.get_last_updated_datetime() + timedelta(minutes=15) < dt.now(timezone.utc):
            _LOGGER.warning(f"SOLCAST - not requesting forecast because time is within fifteen minutes of {self.get_last_updated_datetime()}")

(added "self." to each of the get_last_updated_datetime() calls)

Did a manual run; it decided not to update Solcast. Not sure that is correct, as I haven't updated (successfully) since 10:15 (screenshot attached).

2024-06-17 13:38:09.712 INFO (MainThread) [custom_components.solcast_solar] SOLCAST - Service call: update_forecasts
2024-06-17 13:38:09.712 WARNING (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - not requesting forecast because time is within fifteen minutes of 2024-06-17 09:15:55.606974+00:00

Looking at the code I see it gets the 'last_updated' value from itself. When I restarted HA to put this new code in, was that value reset, so the "has 15 minutes elapsed" calculation isn't working?
Either code around this, or use sensor.solcast_pv_forecast_api_last_polled?

gcoan commented 2 months ago

Belay that idea; the comparison isn't right somehow. It's 13:50, or 12:50 UTC. Last update was 10:15 (09:15 UTC). It appears to be checking against the correct last update time in UTC. I tried again and then left it, so HA has been running 23 minutes, but the comparison still fails:

2024-06-17 13:50:50.550 INFO (MainThread) [custom_components.solcast_solar] SOLCAST - Service call: update_forecasts
2024-06-17 13:50:50.550 WARNING (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - not requesting forecast because time is within fifteen minutes of 2024-06-17 09:15:55.606974+00:00
2024-06-17 13:59:58.227 INFO (MainThread) [custom_components.solcast_solar] SOLCAST - Service call: update_forecasts
2024-06-17 13:59:58.228 WARNING (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - not requesting forecast because time is within fifteen minutes of 2024-06-17 09:15:55.606974+00:00

autoSteve commented 2 months ago

Much. need. focus.

I've just been working around an issue for version= that oziee got us into...

Epic failure logs. I should frame that as my biggest Python fail. self. is "duh"...

Yeah, something is odd. The comparison is to dt.now(timezone.utc), so what the? I might need to log both the last updated time and the comparison time to see what is happening.

autoSteve commented 2 months ago

Replace line 590 with this. What do you get?

            _LOGGER.warning(f"SOLCAST - not requesting forecast because time is within fifteen minutes of {self.get_last_updated_datetime()}, time now is {dt.now(timezone.utc)}")

autoSteve commented 2 months ago

("self." ... Gah.)

autoSteve commented 2 months ago

OMG. It must be late.

Line 589, change < to >...

        if self.get_last_updated_datetime() + timedelta(minutes=15) > dt.now(timezone.utc):

gcoan commented 2 months ago

@autoSteve so why did the Python interpreter not understand what you meant? Too literal, these computers. <, >, it's all very similar really...

I changed the two lines as above, restarted HA, got 3 successive fails at reading the sites file so it fell back to using the cache and started up OK.

Ran my automation to pull the forecast, got a 429 error, waited 59 seconds, then worked fine this time updating my forecast ;-)

Ran it again a few minutes later, it correctly decided it didn't want to run:

2024-06-17 20:02:16.315 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Data for 2024-06-21 contains all 48 records
2024-06-17 20:02:16.316 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Data for 2024-06-22 contains all 48 records
2024-06-17 20:04:17.383 WARNING (MainThread) [homeassistant.components.websocket_api.http.connection] [139689409521136] from 195.166.158.163 (Home Assistant/2024.5.1 (io.robbie.HomeAssistant; build:2024.688; iPadOS 17.5.1)): Disconnected: Did not receive auth message within 10 seconds
2024-06-17 20:05:38.856 INFO (MainThread) [custom_components.solcast_solar] SOLCAST - Service call: update_forecasts
2024-06-17 20:05:38.896 WARNING (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - not requesting forecast because time is within fifteen minutes of 2024-06-17 19:02:16.296654+00:00, time now is 2024-06-17 19:05:38.896497+00:00

Couple of suggestions from seeing the above behaviours:

ProphetOfDoom commented 2 months ago

Just in case it helps:

Solcast started sweet as a nut.

System Info: Core 2024.6.3 Supervisor 2024.06.0 Operating System 12.3 Frontend 20240610.1

Happy days

autoSteve commented 2 months ago

last run time is in UTC

This should localise it.

_LOGGER.warning(f"SOLCAST - not requesting forecast because time is within fifteen minutes of {self.get_last_updated_datetime().astimezone(self._tz)}")
gcoan commented 2 months ago

@autoSteve I messed about with the log messages a bit more to make them readable. Here's my final version of the code fragment which works perfectly, thanks

    async def http_data(self, dopast = False):
        """Request forecast data via the Solcast API."""
        if self.get_last_updated_datetime() + timedelta(minutes=15) > dt.now(timezone.utc):
            _LOGGER.warning(f"SOLCAST - not requesting forecast because time now ({dt.now(timezone.utc).astimezone(self._tz)}) is within fifteen minutes of last forecast update at {self.get_last_updated_datetime().astimezone(self._tz)}")
            return

autoSteve commented 2 months ago

I had made almost exactly the same change, @gcoan. Just dropped the time now because that is already referenced on the log entry.

autoSteve commented 2 months ago

My last forecast poll was very close to timing out... 366 seconds total, so very near the total async timeout of 480 seconds.

I have another idea for retries to further improve reliability.

Should the successive back-offs eventually fail, then possibly wait a further delay and try the back-off sequence again. No idea of the best way to actually implement that at the moment, because it is an HA automation that triggers Solcast collection.

I wonder whether somebody way smarter than me at automations could think about that with me.

The API Last Polled is exposed as a diagnostic, but no diagnostic info about the last failure to poll. If the last failure datetime were exposed, then I think writing an automation to trigger a retry after, say 20 minutes might be trivial. Randomness would not be required, because the prior attempt used randomness.

Logic like: if last failure to poll is greater than last successful poll, and twenty minutes has elapsed since last failure, then fire a call.

Thoughts?

2024-06-19 13:57:00.500 INFO (MainThread) [custom_components.solcast_solar] SOLCAST - Service call: update_forecasts
2024-06-19 13:57:00.500 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - API polling for rooftop b68d-c05a-c2b3-2cf9
2024-06-19 13:57:00.500 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Polling API for rooftop_id b68d-c05a-c2b3-2cf9
2024-06-19 13:57:00.500 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - fetch_data code url - https://api.solcast.com.au/rooftop_sites/b68d-c05a-c2b3-2cf9/forecasts
2024-06-19 13:57:00.500 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Fetching forecast
2024-06-19 13:57:00.652 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Solcast API is busy, pausing 47 seconds before retry
2024-06-19 13:57:07.790 INFO (MainThread) [hass_nabucasa.google_report_state] Timeout while waiting to receive message
2024-06-19 13:57:47.654 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Fetching forecast
2024-06-19 13:57:47.691 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Solcast API is busy, pausing 86 seconds before retry
2024-06-19 13:58:10.258 INFO (MainThread) [hass_nabucasa.google_report_state] Timeout while waiting to receive message
2024-06-19 13:59:12.693 INFO (MainThread) [homeassistant.components.automation.state_of_charge_to_mqtt] State of charge to MQTT: Running automation actions
2024-06-19 13:59:12.693 INFO (MainThread) [homeassistant.components.automation.state_of_charge_to_mqtt] State of charge to MQTT: Executing step call service
2024-06-19 13:59:13.693 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Fetching forecast
2024-06-19 13:59:13.787 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Solcast API is busy, pausing 96 seconds before retry
2024-06-19 13:59:50.048 INFO (MainThread) [hass_nabucasa.google_report_state] Timeout while waiting to receive message
2024-06-19 14:00:49.788 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Fetching forecast
2024-06-19 14:00:49.874 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Solcast API is busy, pausing 138 seconds before retry
2024-06-19 14:00:59.326 INFO (MainThread) [homeassistant.components.recorder.backup] Backup start notification, locking database for writes
2024-06-19 14:00:59.879 INFO (MainThread) [homeassistant.components.recorder.backup] Backup end notification, releasing write lock
2024-06-19 14:00:59.881 INFO (Recorder) [homeassistant.components.recorder.core] Database queue backlog reached 2 entries during backup
2024-06-19 14:01:12.854 INFO (MainThread) [hass_nabucasa.google_report_state] Timeout while waiting to receive message
2024-06-19 14:02:57.602 INFO (MainThread) [hass_nabucasa.google_report_state] Timeout while waiting to receive message
2024-06-19 14:03:07.875 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Fetching forecast
2024-06-19 14:03:08.382 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - API returned data. API Counter incremented from 4 to 5
2024-06-19 14:03:08.382 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - writing usage cache
2024-06-19 14:03:08.405 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - fetch_data code http_session returned data type is <class 'dict'>
2024-06-19 14:03:08.405 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - fetch_data code http_session status is 200
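
As a minimal sketch of the retry-trigger logic above, assuming a hypothetical "last failed poll" timestamp were exposed alongside API Last Polled (none of these names exist in the integration today):

    from datetime import datetime, timedelta, timezone

    def should_retry_poll(last_success: datetime, last_failure: datetime,
                          wait: timedelta = timedelta(minutes=20)) -> bool:
        """Fire another update only if the most recent attempt failed and at
        least `wait` has elapsed since that failure."""
        now = datetime.now(timezone.utc)
        return last_failure > last_success and now - last_failure >= wait

An automation could evaluate the same comparison against the two diagnostic timestamps and call solcast_solar.update_forecasts when it comes up true.
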
BJReplay commented 2 months ago

Thoughts?

Trying not to over-think it (as I spin up Oracle Cloud instances now that Ampere availability is back again - woot); my inclination is to just skip a poll.

Here's why; (other than on first start up of the integration), you've got a forecast; it might be seven days out of date, but it's there.

It might not be up to date, but it's there.

Now, in my use case, for my actual battery control, I'm looking for near term forecasts to work out whether or not to fine tune my battery charging behaviour as I approach the start of expensive power - but really, that's just (as my old boss used to say) pencil f_#%ing: I know what my load is going to be, I know what my insolation is going to be ± cloud cover changes in the next hour or so, and any updated forecast is just going to slightly change my already planned charging activity. Whether or not I get a new forecast won't change the fact that the wind might blow in a slightly different direction and make that updated forecast wrong in any case - so if I don't get it, I still might make the wrong decision.

Which is a really long way of saying: Try a few times, and if you can't get a result, skip that cycle.

Which is what my PV control algo has been doing for the last few years (blindly, mind you - it always uses the latest available forecast, always tries to update the forecast, and doesn't know whether or not it got an updated forecast - it just always re-requests a forecast regularly through the day, and always re-adjusts charging targets based on remaining forecast insolation, load and state of charge).

autoSteve commented 2 months ago

From a practical point of view, I absolutely agree.

From an "I'm getting 429 errors, so I'll raise an issue" point of view, I think it has merit...

All the logging of retries is usually silent, at debug level, so it happens in the background.

wimdebruyn commented 2 months ago

I poll roughly once per hour (but not on the hour, to avoid 429s). Not every hour though, as the evening is when it's more important for me to get regular updates. I have an "Allowed API use" sensor for specific times of the day. Here is the template:

    - name: "Solcast API usage max allowed"
      state: >
        {% set hours = [[0,48],[300,6],[400,10],[500,12],[600,16],[700,20],[1200,24],[1500,32],[2000,40]] %}
        {% set t = states('sensor.time').replace(':','')|int %}
        {% set ns = namespace(level = 0) %}
        {% for h in hours if t >= h[0] %}
          {% set ns.level = h[1] %}
        {% endfor %}
        {{ ns.level }}

It returns the maximum number of API calls that should have been consumed by the current time of day. I then opportunistically check in another automation, offset 30 minutes from my normal polling automation, whether I am allowed to do another poll (i.e. if API calls used < sensor.solcast_api_usage_max_allowed), and if so I poll again (whilst making sure I stay within my allowed usage). This will help if polling soon after 429s is important.

BJReplay commented 2 months ago

Yeah, thinking about it as I was just mowing our little strip of nature strip, I can see a change to the readme, which I'll add to your draft PR along the lines of:

As noted below, we suggest you automate calling solcast_solar.update_forecasts regularly through the day. Internally, solcast_solar.update_forecasts will attempt several retries if it gets a 429 busy response from the Solcast servers. However, if, after a number of attempts, it is unable to retrieve a forecast, it will log this in the logs (which you will see if you enable debug logging) but will otherwise fail silently. You will still have a forecast for the period (if one had been retrieved earlier) rather than a gap - but it will not be as up to date as you might have hoped.

It is up to you to build a belt and braces approach if it is absolutely essential for you to retrieve updated forecasts on a regular basis, bearing in mind that the integration already caches the forecasts that it has retrieved and these forecasts - even if out of date - are available for display and analysis.

autoSteve commented 2 months ago

but will otherwise fail silently

It retries silently, but fails loudly. 😂

gcoan commented 2 months ago

but will otherwise fail silently

It retries silently, but fails loudly. 😂

I think we should make it clear that sensor.solcast_pv_forecast_api_last_polled contains the last successful poll time, so if it fails to poll due to 429's then this sensor time won't be updated.

I agree with @BJReplay's thoughts about there being limited criticality in having an absolutely up-to-date forecast anyway; at best it's just a forecast, and clouds and stuff happen, which means it's never going to be 100% accurate. I update the forecast at 6am, 10am, 1pm, 6pm and 11pm to give Predbat (my battery automation software) the best chance to include a more refined view for the start of the solar day/afternoon peak/what tomorrow's solar will be etc, but if I miss an update it's not the end of the world. [As an aside, I find I almost never hit the PV 50% estimate; it's over-optimistic and I usually achieve somewhere around 30% of the way between the PV 10% and PV 50% estimates. Having 3 different arrays on East/West sides of the roof is probably a key factor in this, as the forecast is thus a composite sum.]

But failing silently to do something that was requested doesn't feel like good practice to me, and I do feel we ought to log the failure in an accessible way so that those who feel so inclined can automate using the information. We have sensor.solcast_pv_forecast_api_last_polled, which is the last successful poll time; how about adding two attributes: last_poll_successful (True/False to indicate whether it did retrieve OK or not) and last_poll_time (time of the actual last poll, regardless of whether it was successful or not)?
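
Shape-wise that could be as simple as two extra attributes on the existing entity, e.g. (hypothetical sketch only; the coordinator fields are assumed and this is not the integration's actual sensor class):

    from homeassistant.components.sensor import SensorEntity

    class ApiLastPolledSensor(SensorEntity):
        """Hypothetical sketch of the 'API Last Polled' sensor carrying the
        two extra attributes suggested above."""

        def __init__(self, coordinator):
            self._coordinator = coordinator

        @property
        def extra_state_attributes(self):
            return {
                "last_poll_successful": self._coordinator.last_poll_ok,   # assumed flag
                "last_poll_time": self._coordinator.last_attempted_poll,  # assumed datetime
            }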

autoSteve commented 2 months ago

But failing silently to do something that was requested

Like I said, it fails loudly with an error, but retries at debug log level. Folks would have no idea that it had actually thoughtfully retried before the failure occurred unless debug logging is on. I could log the retries about Solcast being busy as warnings if you both feel that that's better.

Last Poll Time confuses me. We have one that's already called API Last Polled, and we can't rename it (it should be called something like Forecast Last Received), so another name for Last Poll Time I think. API Last Attempted Poll???

The logic would then go something like: if the last poll was not successful, and the current time equals the last attempted poll time plus some minutes, then trigger the poll.

swests commented 2 months ago

Why not add another sensor to the integration that returns a status, e.g. last_result {OK, CACHE, ERROR}? OK = got fresh data, CACHE = using the cached data, ERROR = no cache, or another API error.

This way it isn't a silent fail where further (debug) investigation is needed…
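
The three states could be modelled as a simple enum that the fetch code sets after each attempt (just a sketch of the suggestion, not existing code):

    from enum import Enum

    class LastResult(Enum):
        """Possible values for the proposed last_result sensor."""
        OK = "ok"        # fresh data retrieved from Solcast
        CACHE = "cache"  # retries exhausted by 429s; cached data used instead
        ERROR = "error"  # no cache available, or another API error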