SheffieldSolar / PV_Live-API

A Python implementation of the PV_Live web API.

Sum(GSP) != National #16

Closed peterdudfield closed 1 year ago

peterdudfield commented 1 year ago

I noticed a small behaviour that means the sum of the GSP data doesn't add up to the national amount. I thought it would. It seems to have a bigger effect in the afternoon.

Here's some code I tried to see what was happening:

from pvlive_api import PVLive
from datetime import datetime, timezone
import pandas as pd

dt = datetime(2022, 7, 26, 15, tzinfo=timezone.utc)

pv = PVLive()

# entity_id=0 is the national total
national = pv.at_time(dt=dt, entity_type='gsp', entity_id=0, dataframe=True)

# fetch each individual GSP (ids 1 to 317) for the same timestamp
gsps = []
for i in range(0, 317):
    print(i)
    gsps.append(pv.at_time(dt=dt, entity_type='gsp', entity_id=i + 1, dataframe=True))

gsp = pd.concat(gsps)
# GSPs with no estimate come back as NaN; treat them as zero before summing
gsp.fillna(0.0, inplace=True)
print(f"National generation is {national['generation_mw'].iloc[0]}")
print(f"GSP total generation is {gsp['generation_mw'].sum()}")

Output:

National generation is 4777.02
GSP total generation is 4167.8645557
peterdudfield commented 1 year ago
from pvlive_api import PVLive
from datetime import datetime, timezone
import pandas as pd
import plotly.graph_objects as go

start = datetime(2022, 7, 26, 0, tzinfo=timezone.utc)
end = datetime(2022, 7, 27, 0, tzinfo=timezone.utc)

pv = PVLive()

# entity_id=0 is the national total
national = pv.between(start=start, end=end, entity_type='gsp', entity_id=0, dataframe=True)

# build one wide dataframe with a column per GSP (ids 1 to 317)
for i in range(0, 317):

    print(i)

    df_on_gsp = pv.between(start=start, end=end, entity_type='gsp', entity_id=i + 1, dataframe=True)
    df_on_gsp.rename(columns={'generation_mw': i + 1}, inplace=True)
    df_on_gsp = df_on_gsp[['datetime_gmt', i + 1]]

    if i == 0:
        df_all_gsp = df_on_gsp
    else:
        df_all_gsp = df_all_gsp.merge(df_on_gsp, on='datetime_gmt')

df_all_gsp.set_index('datetime_gmt', drop=True, inplace=True)
# row-wise sum across all GSP columns (NaNs are skipped by default)
df_all_gsp['sum'] = df_all_gsp.sum(axis=1)

# plot
trace_1 = go.Scatter(x=df_all_gsp.index, y=df_all_gsp['sum'], name='sum(gsps)')
trace_2 = go.Scatter(x=national['datetime_gmt'], y=national['generation_mw'], name='national')
fig = go.Figure(data=[trace_1, trace_2])
fig.show(renderer='browser')
[Screenshot, 2022-07-27 07:29:58: plot of sum(gsps) against national]
peterdudfield commented 1 year ago

A short answer from @JamieTaylor-TUOS is that they are made using different methods. The normalized values (MW / installed MWp) do line up better. The gap is due to about 49 GSPs whose data is null (NaN).
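
One way to check how many GSPs are affected is to count the null rows in the concatenated per-GSP frame, before any fillna is applied. This is only a minimal sketch with made-up numbers: the `gsp_id` and `generation_mw` column names are assumed to match what the API returns, and the values are illustrative, not PV_Live data.

```python
import numpy as np
import pandas as pd

# Illustrative stand-in for the concatenated per-GSP frame; values are made up.
gsp = pd.DataFrame(
    {
        "gsp_id": [1, 2, 3, 4, 5],
        "generation_mw": [12.5, np.nan, 8.0, np.nan, 3.5],
    }
)

# Rows where no generation estimate is available for the GSP.
missing = gsp[gsp["generation_mw"].isna()]

n_missing = len(missing)
missing_ids = missing["gsp_id"].tolist()

print(f"{n_missing} of {len(gsp)} GSPs have null generation: {missing_ids}")
```

With real data, `gsp` would be the frame built by `pd.concat(gsps)` in the first snippet.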

JamieTaylor-TUOS commented 1 year ago

Hi Peter, I think this is related to an issue you flagged up (offline) before the upgrade - what happens when there are insufficient sample systems to estimate the outturn for a given GSP?

Previously, you were seeing zeroes, which was not the correct way to handle this and was down to a bug in the code/database/API which led to NULL/NaN being treated as zero when uploading to the API database.

As part of the recent upgrades, I fixed this, so the NULL/NaN values are now stored and served up correctly by the API.

When running intraday regional PV_Live calculations, the intraday PV sample is much smaller and far more geographically sparse than on day+1, so it's quite common for a whole bunch of GSPs to have insufficient sample data and end up with NULL generation. This is what you're seeing. When you sum the GSP generation, the NaNs are ignored, so the sum of GSP outturns is much lower than the (independently-modelled) national outturn.
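
To illustrate the NaN-skipping behaviour, here is a small sketch using pandas with made-up outturns (not PV_Live data). By default `Series.sum()` skips NaNs, while `skipna=False` propagates them, which makes the missing GSPs visible instead of silently shrinking the total.

```python
import numpy as np
import pandas as pd

# Made-up GSP outturns (MW); NaN marks a GSP with insufficient sample data.
generation_mw = pd.Series([100.0, np.nan, 50.0, np.nan, 25.0])

# Default behaviour: NaNs are skipped, so the total only covers 3 of 5 GSPs.
total_skipping_nans = generation_mw.sum()

# skipna=False propagates the NaN, so the result itself is NaN.
total_strict = generation_mw.sum(skipna=False)

# Fraction of GSPs that actually contributed to the default total.
coverage = generation_mw.notna().mean()

print(f"sum (NaNs skipped): {total_skipping_nans}")
print(f"sum (skipna=False): {total_strict}")
print(f"share of GSPs with data: {coverage:.0%}")
```

The same applies to `DataFrame.sum(axis=1)` as used in the plotting snippet above.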

We're looking at ways we can model the GSPs with insufficient sample systems in near-real-time (i.e. intraday), but until then, your best bet is to convert the GSP outturns into yields (i.e. generation per MWp installed).
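
As a rough sketch of the yield approach, the snippet below divides generation by installed capacity and compares a capacity-weighted regional yield (over GSPs that have an estimate) with the national yield. All numbers are made up, and the `installedcapacity_mwp` column name is an assumption about how capacity is made available alongside `generation_mw`; adapt it to whatever your data actually contains.

```python
import numpy as np
import pandas as pd

# Made-up per-GSP data for a single timestamp; column names are assumptions.
gsp = pd.DataFrame(
    {
        "gsp_id": [1, 2, 3, 4],
        "generation_mw": [30.0, np.nan, 10.0, 20.0],
        "installedcapacity_mwp": [100.0, 50.0, 40.0, 60.0],
    }
)
# Made-up national figures for the same timestamp.
national_generation_mw = 80.0
national_capacity_mwp = 250.0

# Yield = generation per MWp installed; a NaN generation gives a NaN yield.
gsp["yield_mw_per_mwp"] = gsp["generation_mw"] / gsp["installedcapacity_mwp"]

# Capacity-weighted mean yield, restricted to GSPs that have an estimate,
# so the missing GSPs drop out of both numerator and denominator.
has_data = gsp["generation_mw"].notna()
regional_yield = (
    gsp.loc[has_data, "generation_mw"].sum()
    / gsp.loc[has_data, "installedcapacity_mwp"].sum()
)
national_yield = national_generation_mw / national_capacity_mwp

print(f"regional yield: {regional_yield:.3f} MW/MWp")
print(f"national yield: {national_yield:.3f} MW/MWp")
```

Because the GSPs without data are excluded from the capacity total as well, the regional yield is not dragged down by them in the way the raw MW sum is.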

Alternatively, only use GSP data on day+1 (i.e. once we have incorporated the day+1 sample data). There might still be some GSPs in Scotland with insufficient sample systems though.

I'll leave this issue here until we have a solution in place for modelling all GSPs, even those without sample data.