Estimate of capacity - GitHub issues

peterdudfield commented 1 month ago

I think you update the capacity every 3 months or so. Do you estimate the capacity continuously anyway? This might help estimate the current PV capacity without big steps.

This is the data we have collected, where you can see some big steps. We have used your effective_capacity_mwp variable here.

JamieTaylor-TUOS commented 1 month ago

@peterdudfield the capacity dataset is updated quarterly, to align with the release cycles of the underlying datasets (REPD, FiT, Solar Media Ground Mount Report, MCS). The dataset includes actual install dates for all systems though, so you should see a steady increase in the historical capacity and a flat line since the latest update:

The PV_Live calculation at a given time uses the cumulative capacity installed to that date, and so the installedcapacity_mwp is our best estimate of how much capacity was installed at a given point in time.
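
To illustrate the "cumulative capacity installed to that date" idea, here is a minimal sketch using pandas. The install records, the column names and the helper `installed_capacity_at` are all made up for illustration; this is not the PV_Live implementation, just the general shape of the calculation.

```python
import pandas as pd

# Hypothetical install records, one row per system (made-up values).
installs = pd.DataFrame(
    {
        "install_date": pd.to_datetime(
            ["2023-01-10", "2023-01-10", "2023-02-03", "2023-03-21"]
        ),
        "capacity_mwp": [0.004, 5.0, 0.0035, 12.0],
    }
)

# Sum the capacity installed on each day, then accumulate over time.
daily_additions = installs.groupby("install_date")["capacity_mwp"].sum()
cumulative_mwp = daily_additions.sort_index().cumsum()


def installed_capacity_at(when: str) -> float:
    """Return the cumulative capacity installed on or before `when`."""
    up_to = cumulative_mwp.loc[: pd.Timestamp(when)]
    return float(up_to.iloc[-1]) if not up_to.empty else 0.0


print(installed_capacity_at("2023-02-15"))
```

Because every system carries its own install date, the resulting series rises on each install date rather than only at each quarterly release.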

The effective_capacity_mwp is only really used for modelling purposes to correct for the age of the GB fleet relative to the age of the sample fleet (it applies some performance degradation to the installedcapacity_mwp based on previous work).

Not sure why your capacity data looks like that - it should look like the above plot 🤔

More info about our installed capacity estimates here: https://api.solar.shef.ac.uk/pvlive/capacity (gets automatically updated after each new release of capacity data)

peterdudfield commented 1 month ago

Thanks @JamieTaylor-TUOS for getting back so quickly. Yeah, I agree that when you look back at 10 years of data you get the graph you showed. However, if you pull the live data every 30 minutes for 1.5 years, you get the graph we get, which is much more stepped. So might there be a case for increasing the capacity between quarterly updates, as a slightly better estimate?

JamieTaylor-TUOS commented 1 month ago

Hmm, that's not what I see...

from datetime import datetime, timedelta
import pytz

from pvlive_api import PVLive

# Request roughly the last 1.5 years in a single call (timezone-aware UTC).
end = datetime.now(pytz.utc)
start = end - timedelta(days=550)

pvl = PVLive()
# National data (PES 0) at 30-minute resolution, including capacity fields.
pvlive_data = pvl.between(
    start,
    end,
    entity_type="pes",
    entity_id=0,
    extra_fields="installedcapacity_mwp,capacity_mwp",
    period=30,
    dataframe=True,
)

# Plot installed capacity against time (the plotly backend must be installed).
pvlive_data.sort_values("datetime_gmt").plot(
    backend="plotly",
    x="datetime_gmt",
    y="installedcapacity_mwp",
)

peterdudfield commented 1 month ago

Yeah, I agree that's what you get.

But our data comes from pulling the data live every 30 minutes. So we are not pulling it all in one go, but every 30 minutes in real time. Does that explain why we get that shape?

JamieTaylor-TUOS commented 1 month ago

Ahhhhhh - are you not retrospectively re-downloading after capacity updates? If not, then yes, what you see is what we would expect.
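
For anyone keeping their own archive, a rough sketch of what retrospective re-downloading could look like on the client side. All values are made up and exaggerated, and the names `archive`, `refreshed` and `corrected` are illustrative only. In practice `refreshed` would come from a fresh call such as `PVLive().between(...)` over the archived period.

```python
import pandas as pd

# Hypothetical archive built by polling in real time (one sample per week
# shown for brevity). Each row holds the capacity published when it was pulled.
timestamps = pd.date_range("2024-01-01", periods=4, freq="7D", tz="UTC")
archive = pd.DataFrame(
    {"installedcapacity_mwp": [15000.0, 15000.0, 15000.0, 15400.0]},
    index=timestamps,
)

# Hypothetical re-download of the same period after a capacity update.
# Earlier periods now carry the retrospectively revised capacity.
refreshed = pd.DataFrame(
    {"installedcapacity_mwp": [15390.0, 15393.0, 15397.0, 15400.0]},
    index=timestamps,
)

# Prefer refreshed values where they exist; fall back to archived rows.
corrected = refreshed.combine_first(archive).sort_index()

largest_step_before = archive["installedcapacity_mwp"].diff().abs().max()
largest_step_after = corrected["installedcapacity_mwp"].diff().abs().max()
print(largest_step_before, largest_step_after)
```

The stepped shape comes from the archive freezing whatever capacity was published at pull time, so overwriting it with the re-downloaded history removes the large jump.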

peterdudfield commented 1 month ago

> Ahhhhhh - are you not retrospectively re-downloading after capacity updates? If not, then yes, what you see is what we would expect.

Yeah, so I'm just thinking: is it worth Sheffield Solar using an estimated capacity? This could be made by taking the last quarterly update and then, day by day, estimating how much new solar has gone online. Of course, when the new estimate comes in it will suddenly jump up (or down), but these jumps could potentially be a lot smaller.

JamieTaylor-TUOS commented 1 month ago

We used to apply some forecasting of the PV growth so that the PV capacity changed when viewed in real-time, but it was hopelessly inaccurate as it needed to be geographically resolved (by GSP) to ensure our regional capacities were consistent with our national capacity. Instead we opted for a retrospective correction approach as per this.

If the jumps are creating an issue in your training, it might be better to train the model to predict yield (MW generation per MWp installed capacity). This won't show the step behaviour because the yield we estimate before and after a capacity update will be consistent
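
A small sketch of why the yield target hides the step. The numbers are made up, and it assumes the outturn estimate scales with the capacity in use at the time (generation = yield x capacity), so that both step together at a capacity update. The helper `to_yield` is a hypothetical name.

```python
# Made-up records straddling a capacity update, as seen by a client that
# polls in real time and never re-downloads.
records = [
    # (generation_mw, installedcapacity_mwp)
    (3000.0, 15000.0),
    (3000.0, 15000.0),
    (3080.0, 15400.0),  # capacity update: outturn and capacity both step up
    (3080.0, 15400.0),
]


def to_yield(generation_mw: float, installedcapacity_mwp: float) -> float:
    """Return yield: MW generated per MWp of installed capacity."""
    return generation_mw / installedcapacity_mwp


yields = [to_yield(generation, capacity) for generation, capacity in records]
print(yields)
```

A model trained on `yields` can be scaled back to MW afterwards by multiplying its prediction by whichever capacity estimate is current.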

JamieTaylor-TUOS commented 1 month ago

That is to say - forecasting national PV growth for 3-6 months right now would be fairly straightforward, but forecasting where in the country it would be installed is rather more difficult. If we apply a correction to the national outturn estimate, we'd want to apply the same correction to the GSP outturns, but the latter can lead to huge errors (a 50 MWp solar farm installed in one GSP vs 12,000 domestic systems installed across GB).
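
A toy example of that regional problem. The three GSPs, their capacities and the proportional allocation rule are all assumptions made up for illustration; this is not the forecasting method that was previously used.

```python
# Made-up capacities (MWp) for three hypothetical GSPs before new installs.
gsp_capacity = {"GSP_A": 100.0, "GSP_B": 300.0, "GSP_C": 600.0}

# Suppose 50 MWp of growth is forecast correctly at national level.
national_growth = 50.0

# Naive regional forecast: share growth in proportion to existing capacity.
total = sum(gsp_capacity.values())
forecast_growth = {
    gsp: national_growth * capacity / total
    for gsp, capacity in gsp_capacity.items()
}

# In reality the growth is a single 50 MWp farm connected to GSP_A.
actual_growth = {"GSP_A": 50.0, "GSP_B": 0.0, "GSP_C": 0.0}

# Relative error in each GSP's updated capacity estimate.
relative_error = {
    gsp: (forecast_growth[gsp] - actual_growth[gsp])
    / (gsp_capacity[gsp] + actual_growth[gsp])
    for gsp in gsp_capacity
}
print(relative_error)
```

The national total is exactly right here, yet GSP_A is underestimated by 30% of its true capacity while the other two are overestimated by 5% each.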

peterdudfield commented 1 month ago

Thanks @JamieTaylor-TUOS that makes total sense

SheffieldSolar / PV_Live-API

Estimate of capacity #36