SheffieldSolar / PV_Live-API

A Python implementation of the PV_Live web API.

When is the "updated" PV_Live made available? #20

Closed JackKelly closed 1 year ago

JackKelly commented 1 year ago

Thanks so much for all your work on this @JamieTaylor-TUOS and @ejones18!

Please may I ask: What time is the "updated" PV_Live available via the API? Is it 10:30am UTC?

(I've found that, for some reason, Open Climate Fix's code successfully pulls the "updated" PV_Live during wintertime, but I think OCF's code mistakenly gets the "intraday" PV_Live data during British Summer Time instead of the "updated" PV_Live)

peterdudfield commented 1 year ago

Just to throw out my understanding, but @JamieTaylor-TUOS and @ejones18 please correct.

Intraday

+1

Further refinement is made at 22:00, but normally the results don't change too much from earlier that day.

JamieTaylor-TUOS commented 1 year ago

Hi @JackKelly,

@peterdudfield is right... for completeness (and the benefit of anyone else who stumbles across this issue) here's the response I typically give to other PV_Live users:

In general, we tell users to expect continuous retrospective updates to PV_Live outturn estimates - we regularly re-calculate them retrospectively and these updated estimates are reflected immediately in the data delivered via the API.

As of 2023-05-02, the first estimate of the PV outturn for a given settlement period is computed ~5 minutes after the end of the half hour in question and is typically available via the API within 6 minutes.

e.g. the initial estimate of the GB PV outturn for the period 14:30 - 15:00 UTC today (i.e. 15:30 - 16:00 BST) will become available at ~16:06 BST today and will be labelled using the timestamp at the end of the interval (in UTC): '2023-05-02 15:00:00'
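The labelling convention in that example can be reproduced with a few lines of Python. This is only a sketch: the helper name `label_and_availability` is made up for illustration, the 6-minute delay is taken from the "typically available within 6 minutes" remark above rather than from any guarantee, and BST is hard-coded as UTC+1 because the example date falls in British Summer Time.

```python
from datetime import datetime, timedelta, timezone

SETTLEMENT_PERIOD = timedelta(minutes=30)
# Assumption: "typically available via the API within 6 minutes" (not a guarantee).
TYPICAL_PUBLICATION_DELAY = timedelta(minutes=6)
# Assumption: fixed UTC+1 offset, valid for the example date only.
BST = timezone(timedelta(hours=1), "BST")


def label_and_availability(period_start_utc):
    """Return (label timestamp, approximate first availability), both in UTC.

    The label is the END of the half-hour settlement period.
    """
    label = period_start_utc + SETTLEMENT_PERIOD
    return label, label + TYPICAL_PUBLICATION_DELAY


period_start = datetime(2023, 5, 2, 14, 30, tzinfo=timezone.utc)
label, available = label_and_availability(period_start)

print(label.strftime("%Y-%m-%d %H:%M:%S"))               # 2023-05-02 15:00:00
print(available.astimezone(BST).strftime("%H:%M %Z"))    # 16:06 BST
```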

This outturn estimate is not final though - we will continue to make retrospective revisions as

Since the PV_Live model produces outturn estimates, they will never strictly speaking be final, as there will always be things we can do to refine the model and improve accuracy. That said, there are some notable/routine retrospective revisions to be aware of:

In order to maintain a local copy of the PV_Live GB national outturn estimates that is as in sync with our own latest/best estimates as possible, we recommend the following polling cycle:

peterdudfield commented 1 year ago

Thanks @JamieTaylor-TUOS

Is it worth adding this to the Readme.md? on this repo?

JackKelly commented 1 year ago

Thanks @JamieTaylor-TUOS - this is super-useful!

Please may I make a request: Would it be at all possible to add a column to your DB (and to the data returned from the API) to tell users when each value was last updated? e.g. a "datetime_of_computation" column? (This would be a little like the "initialisation time" dimension that's in numerical weather predictions.)

The reason I ask is because, at inference time, OCF's PV forecasting ML models need to know whether they're being given an "initial estimate" or one of Sheffield's "updated estimates". If the ML model thinks it's seeing an "updated estimate" but is actually seeing an "initial estimate" then the model will produce the wrong output because the ML model will put "too much faith" in the estimate :slightly_smiling_face: . (If that makes sense?)

JamieTaylor-TUOS commented 1 year ago

@JackKelly This makes sense. We do have such a field in the DB already, but it's not currently exposed as an "extra_field" in the PV_Live API. I see no reason why it can't be though. Indeed, if you wanted to store every revision of the PV_Live outturn estimates in your own DB you could include this field in the PK.
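For anyone wanting to try the "include this field in the PK" idea, here is a minimal sketch using an in-memory SQLite database. The table name `pvlive_outturn` and the helper `store_revision` are hypothetical, the last-updated column is named `updated_gmt` as proposed later in this thread, and the second revision's values are invented purely to show two revisions coexisting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE pvlive_outturn (
        gsp_id        INTEGER NOT NULL,
        datetime_gmt  TEXT    NOT NULL,  -- end of the settlement period (UTC)
        updated_gmt   TEXT    NOT NULL,  -- when this estimate was last computed
        generation_mw REAL,
        PRIMARY KEY (gsp_id, datetime_gmt, updated_gmt)
    )
    """
)


def store_revision(conn, gsp_id, datetime_gmt, generation_mw, updated_gmt):
    """Insert one revision; re-polling an unchanged revision is a no-op."""
    conn.execute(
        "INSERT OR IGNORE INTO pvlive_outturn "
        "(gsp_id, datetime_gmt, updated_gmt, generation_mw) VALUES (?, ?, ?, ?)",
        (gsp_id, datetime_gmt, updated_gmt, generation_mw),
    )


store_revision(conn, 0, "2023-05-02T15:30:00Z", 4138.69, "2023-05-02T15:50:26Z")
# Same revision polled again: ignored because the full key already exists.
store_revision(conn, 0, "2023-05-02T15:30:00Z", 4138.69, "2023-05-02T15:50:26Z")
# Invented later revision of the same settlement period (illustrative values).
store_revision(conn, 0, "2023-05-02T15:30:00Z", 4150.00, "2023-05-02T18:30:25Z")

revision_count = conn.execute("SELECT COUNT(*) FROM pvlive_outturn").fetchone()[0]
# ISO 8601 UTC strings in one consistent format sort chronologically as text.
latest_mw = conn.execute(
    "SELECT generation_mw FROM pvlive_outturn "
    "WHERE gsp_id = ? AND datetime_gmt = ? "
    "ORDER BY updated_gmt DESC LIMIT 1",
    (0, "2023-05-02T15:30:00Z"),
).fetchone()[0]
print(revision_count, latest_mw)
```

With the timestamp in the key, every revision is kept as its own row, and the latest estimate for a period is simply the row with the greatest `updated_gmt`.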

@JulianBriggs please could you add the updated_gmt field to the list of extra_fields that are supported by the PV_Live API and update this section in the docs accordingly?

e.g. the following query:

https://api.solar.sheffield.ac.uk/pvlive/api/v4/gsp/0?extra_fields=updated_gmt

should produce a response like:

{"data":[[0,"2023-05-02T15:30:00Z",4138.69,"2023-05-02T15:50:26Z"]],"meta":["gsp_id","datetime_gmt","generation_mw","updated_gmt"]}

(N.B. this request currently produces a 400 - Bad Request as per the PV_Live API documentation)
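As a rough illustration of how a client might consume a response of that shape, the sketch below pairs each row in `data` with the column names in `meta`. It works on the example payload quoted above rather than a live request, and the helper names `rows_as_dicts` and `parse_utc` are made up for this example.

```python
import json
from datetime import datetime, timezone

# Example payload copied from the comment above (not fetched from the API).
response_text = (
    '{"data":[[0,"2023-05-02T15:30:00Z",4138.69,"2023-05-02T15:50:26Z"]],'
    '"meta":["gsp_id","datetime_gmt","generation_mw","updated_gmt"]}'
)


def rows_as_dicts(payload):
    """Zip each data row with the column names listed under 'meta'."""
    return [dict(zip(payload["meta"], row)) for row in payload["data"]]


def parse_utc(timestamp):
    """Parse an ISO 8601 'Z'-suffixed timestamp into an aware UTC datetime."""
    return datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


records = rows_as_dicts(json.loads(response_text))
first = records[0]
# How long after the end of the settlement period the value was last computed.
lag = parse_utc(first["updated_gmt"]) - parse_utc(first["datetime_gmt"])
print(first["generation_mw"], lag)  # 4138.69 0:20:26
```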

JackKelly commented 1 year ago

Awesome, thank you @JamieTaylor-TUOS & @JulianBriggs!

JulianBriggs commented 1 year ago

Hi Jamie, I have implemented this on ssfweb2: https://staging.solar.sheffield.ac.uk/pvlive/api/v4/gsp/0?extra_fields=updated_gmt

Where the response is an aggregation, it is possible (though unlikely) that there may be several values of updated_gmt, e.g. https://staging.solar.sheffield.ac.uk/pvlive/api/v4/pes/21?extra_fields=updated_gmt (Most aggregations use SUM to aggregate, some use AVG. I have used MIN for updated_gmt.)

Pls confirm that this works as expected. Thanks kr J

JamieTaylor-TUOS commented 1 year ago

Thanks @JulianBriggs.

There is an issue with the format used for the updated_gmt field - please could you return in ISO8601 format as per the datetime_gmt field?

Please also use MAX as the agg function for updated_gmt when returning aggregate data at GSP/PES level.

JulianBriggs commented 1 year ago

Hi Jamie,

Have fixed the format. Could you explain why you prefer MAX to MIN for the agg function? My thinking is that the reliable updated_gmt timestamp of an aggregated value is the earliest timestamp of the values it is based on, not the latest.

Pls check and confirm https://staging.solar.sheffield.ac.uk/pvlive/api/v4/gsp/0?extra_fields=updated_gmt Thanks kr J

JamieTaylor-TUOS commented 1 year ago

@JulianBriggs We want the updated_gmt field to mark the time of the latest change to the outturn estimate. In the case of PES outturn estimates, any change to one of the constituent GSP outturns would also represent a change to the PES outturn, hence we should use MAX.
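A toy illustration of the MAX argument, with invented GSP names and timestamps (this is not the server-side implementation, which is not shown in this thread): if any constituent GSP estimate changes, the aggregate changes too, so the aggregate's "last changed" time is the latest of the constituent timestamps.

```python
from datetime import datetime, timezone


def parse_utc(timestamp):
    """Parse an ISO 8601 'Z'-suffixed timestamp into an aware UTC datetime."""
    return datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


# Hypothetical updated_gmt values for the GSPs that make up one PES region.
constituent_updates = {
    "gsp_a": "2023-05-03T11:00:25Z",
    "gsp_b": "2023-05-03T11:30:26Z",  # this GSP was revised most recently
    "gsp_c": "2023-05-03T11:00:27Z",
}

parsed = [parse_utc(ts) for ts in constituent_updates.values()]

# MAX: the moment the aggregated value last changed.
pes_updated_gmt = max(parsed)
# MIN: would hide the fact that gsp_b (and hence the sum) changed at 11:30:26.
earliest_update = min(parsed)

print(pes_updated_gmt.isoformat())  # 2023-05-03T11:30:26+00:00
print(earliest_update.isoformat())  # 2023-05-03T11:00:25+00:00
```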

The format looks good now 👍

JulianBriggs commented 1 year ago

OK Have deployed to production API (ssfdb3). kr J

JulianBriggs commented 1 year ago

Correction: will deploy to production after dark this evening. kr J

JamieTaylor-TUOS commented 1 year ago

@JackKelly and @peterdudfield...

As per Julian's message, the updated_gmt field will be available from the PV_Live API via the extra_fields parameter as of tonight.

I don't anticipate needing to make any changes to the PV_Live-API python library's code base in order for this field to be accessed, since the validation of the extra_fields value(s) is carried out on the API server rather than in the code. I'll test this tonight or tomorrow morning to confirm.

It should be accessible using e.g.

from pvlive_api import PVLive

pvl = PVLive()
pvl.latest(
    entity_type="gsp",
    entity_id=0,
    extra_fields="installedcapacity_mwp,updated_gmt", # For full list of optional fields, see PV_Live API docs
    period=30,
    dataframe=True
)

JackKelly commented 1 year ago

Hi @JulianBriggs & @JamieTaylor-TUOS - wow, thank you so much for implementing this so quickly! You're superstars! Thank you!

JulianBriggs commented 1 year ago

Hi Jamie & Jack, I have added an extra_field option: updated_gmt, e.g. https://api.solar.sheffield.ac.uk/pvlive/api/v4/gsp/0?extra_fields=updated_gmt returned:

{"data":[[0,"2023-05-03T19:30:00Z",50.3593,"2023-05-03T19:40:28Z"]],"meta":["gsp_id","datetime_gmt","generation_mw","updated_gmt"]}

kindest regards

Julian

JamieTaylor-TUOS commented 1 year ago

Thanks @JulianBriggs, I confirm the field can be accessed using this Python library with no modification:

>> docker run -it --rm sheffieldsolar/pv_live-api pv_live --start "2023-05-03 12:30:00" --extra_fields "installedcapacity_mwp,updated_gmt"

    gsp_id              datetime_gmt  generation_mw  installedcapacity_mwp           updated_gmt
0        0 2023-05-03 20:30:00+00:00        0.00000              13861.203  2023-05-03T20:40:26Z
1        0 2023-05-03 20:00:00+00:00        0.08155              13861.203  2023-05-03T20:30:27Z
2        0 2023-05-03 19:30:00+00:00       50.35930              13861.203  2023-05-03T20:00:27Z
3        0 2023-05-03 19:00:00+00:00      332.66300              13861.203  2023-05-03T19:30:26Z
4        0 2023-05-03 18:30:00+00:00      807.11400              13861.203  2023-05-03T19:00:27Z
5        0 2023-05-03 18:00:00+00:00     1437.03000              13861.203  2023-05-03T19:00:27Z
6        0 2023-05-03 17:30:00+00:00     2170.94000              13861.203  2023-05-03T20:30:26Z
7        0 2023-05-03 17:00:00+00:00     3071.08000              13861.203  2023-05-03T20:00:25Z
8        0 2023-05-03 16:30:00+00:00     3944.36000              13861.203  2023-05-03T19:30:25Z
9        0 2023-05-03 16:00:00+00:00     4868.50000              13861.203  2023-05-03T19:00:25Z
10       0 2023-05-03 15:30:00+00:00     5670.17000              13861.203  2023-05-03T18:30:26Z
11       0 2023-05-03 15:00:00+00:00     6440.94000              13861.203  2023-05-03T18:00:25Z
12       0 2023-05-03 14:30:00+00:00     7004.60000              13861.203  2023-05-03T17:30:26Z
13       0 2023-05-03 14:00:00+00:00     7455.22000              13861.203  2023-05-03T17:00:25Z
14       0 2023-05-03 13:30:00+00:00     7911.43000              13861.203  2023-05-03T16:30:25Z
15       0 2023-05-03 13:00:00+00:00     8063.78000              13861.203  2023-05-03T16:00:26Z
16       0 2023-05-03 12:30:00+00:00     8157.73000              13861.203  2023-05-03T15:30:26Z
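One way a consumer might use the new column is to look at the gap between datetime_gmt and updated_gmt. The sketch below does this for four rows copied from the table above. The 15-minute cut-off and the "possibly initial" / "revised since" labels are assumptions chosen purely for illustration; they are not a rule published by Sheffield Solar.

```python
from datetime import datetime, timedelta, timezone

# (datetime_gmt, updated_gmt) pairs copied from the output above.
rows = [
    ("2023-05-03 20:30:00", "2023-05-03T20:40:26Z"),
    ("2023-05-03 19:30:00", "2023-05-03T20:00:27Z"),
    ("2023-05-03 17:30:00", "2023-05-03T20:30:26Z"),
    ("2023-05-03 12:30:00", "2023-05-03T15:30:26Z"),
]

# Hypothetical threshold: anything computed within 15 minutes of the end of
# the settlement period is treated as a candidate "initial estimate".
HYPOTHETICAL_INITIAL_CUTOFF = timedelta(minutes=15)


def revision_lag(datetime_gmt, updated_gmt):
    """Time between the end of the settlement period and the last update."""
    period_end = datetime.strptime(datetime_gmt, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    updated = datetime.strptime(updated_gmt, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return updated - period_end


lags = [revision_lag(period_end, updated) for period_end, updated in rows]
for (period_end, _), lag in zip(rows, lags):
    kind = "possibly initial" if lag <= HYPOTHETICAL_INITIAL_CUTOFF else "revised since"
    print(f"{period_end}  lag={lag}  {kind}")
```

For these rows the lags come out at roughly 10 minutes, 30 minutes and 3 hours, which is the kind of signal a downstream model could take as an input instead of guessing which estimate it has been given.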

I did also just release a new version (1.1.0) which adds support for setting the --extra_fields in the CLI and also added some additional error handling when a 400 response is received (e.g. due to bad values set in the extra_fields).

@JackKelly - you're welcome!