Missing details in PSMv3 5min documentation

williamhobbs commented 2 years ago

On the PSMv3 5min documentation website, https://developer.nrel.gov/docs/solar/nsrdb/psm3-5min-download/, https://github.com/NREL/developer.nrel.gov/blob/main/source/docs/solar/nsrdb/psm3-5min-download.html.md.erb, there appear to be a few issues:

The second sentence says "The National Solar Radiation Database (NSRDB) is a serially complete collection of hourly and half-hourly values..." and does not reference 5- and 15-min values.
In the request parameters table, for the interval parameter, the description does not mention 5- and 15-min interval options
I can't find a description of how the 15-, 30-, and 60-min interval values are derived. My recollection is that the Oct 2020 webinar covered this and explained that they are averages of 5-min interval values, but I think more formal and accessible documentation is needed.
Similarly for the standard PSMv3 data, are the 60-min interval values the average of two 30-min values, or just "every-other" 30-min value?
Referencing the area of each grid cell in the 5min PSMv3 documentation (2x2 km) and standard PSMv3 API documentation (4x4 km?) would be helpful (this isn't necessarily an issue, just a suggestion).

Thanks!

reger commented 2 years ago

for you @PjEdwards

PjEdwards commented 2 years ago

Hi @williamhobbs. Thanks for the notes! I'll add these as tickets to get our docs updated.

In all cases where we down-select the time interval when downloading a dataset, all we do is grab every xth slice. So a 15-minute download of the 5-minute data is every 3rd slice, the 60-minute download of the 30-minute data is every other slice, etc.
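As a rough illustration of the down-selection described above (a sketch only, not the API's actual code; the `downselect` helper and the choice to start from the first timestep are assumptions):

```python
def downselect(values, native_minutes, requested_minutes):
    """Keep every Nth value, where N = requested interval / native interval.

    No averaging is done: values between the kept timesteps are dropped.
    """
    if requested_minutes % native_minutes != 0:
        raise ValueError("requested interval must be a multiple of the native interval")
    step = requested_minutes // native_minutes
    return values[::step]


# One hour of 5-min data; the numbers are just timestep indices.
five_min = list(range(12))
fifteen_min = downselect(five_min, 5, 15)    # every 3rd slice
sixty_min = downselect([10, 20, 30, 40], 30, 60)  # every other slice
```

Under this scheme a coarser download carries exactly the same values as the native data at the retained timestamps.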

kandersolar commented 2 years ago

Is that the case for the 30-minute data as well (that it's a slice, not an average, of higher-frequency data)? I had understood the same as @williamhobbs: 30-minute is the average of 5-minute data, for recent years at least:

> the 30 minute data is only averaged for 2018 and 2019. For all prior years 30min data was the native satellite resolution (instantaneous). The 5min to 30min averaging is done by averaging 7 5min timesteps in a window centered on the 30min timestep. Data is also aggregated from the nearest 4 spatial pixels to go from 2km to 4km (squared).

Source: https://nsrdb.nrel.gov/images/NSRDB_Webinar_QA_-_Oct_6_2020.pdf

PjEdwards commented 2 years ago

Well, I think we are talking about two different questions. I'm only speaking about what the download APIs do with the data the NSRDB team hands to me to serve. What you are asking about here sounds like how the NSRDB model generates the 30-min data from its source. I'm afraid I'm not the right person to ask about that. I suggest pinging the mailing list at nsrdb@nrel.gov.

williamhobbs commented 2 years ago

@kanderso-nrel referenced the same source I was looking at. And I agree, I think we are talking about two different things.

So, does the NSRDB team need to provide clarification about how the NSRDB produces the underlying data, and then the API documentation can be updated accordingly?

wholmgren commented 2 years ago

> So, does the NSRDB team need to provide clarification about how the NSRDB produces the underlying data, and then the API documentation can be updated accordingly?

Yes please!

williamhobbs commented 2 years ago

Can I "@" the NSRDB team here, or does that need to go offline via email and then come back here later?

PjEdwards commented 2 years ago

I am not sure if any of those folks are on here. I'd suggest sending a link to this issue in an email to that list and maybe one of them will jump on here and answer inline? I can ask as well and maybe get some traction.

williamhobbs commented 2 years ago

I'll send an email to Manajit linking this issue. Thanks!

grantbuster commented 2 years ago

Here is my GitHub username; feel free to ping me via GitHub in the future. Here is the response I provided via email (for posterity). It sounds like there might be some confusion on the source data vs. the API doing an interval selection when serving the data... I can't really comment on this though. @PjEdwards is still the authority on the API.

In the continental United States we have data at 5-min, 2 km resolution starting in 2018, so if you retrieve the 30-min, 4 km data from 2018 onwards, every datum is an average of the 4 closest 2 km grid cells and the 7 closest timesteps (an odd number, to make a centered window average) at these grid cells. So every 4 km, 30-min datum is an average of 28 2 km, 5-min data points. The irradiance is a straight mean value; other datasets, like cloud type, use more complicated methods such as the mode.
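The aggregation described above (4 cells times 7 timesteps, so 28 values per output datum) could be sketched roughly as follows. This is an illustration under stated assumptions, not the NSRDB implementation: the `aggregate_irradiance` name, the input layout, and the handling of windows that run past the data are all hypothetical.

```python
from statistics import mean


def aggregate_irradiance(cells, center, half_window=3):
    """Average 5-min irradiance from the 4 nearest 2 km cells.

    cells:  list of 4 sequences, one per 2 km cell, each holding 5-min values
    center: index of the 5-min step that coincides with the 30-min timestamp
    The window covers center - 3 .. center + 3, i.e. 7 timesteps.
    """
    lo = center - half_window
    hi = center + half_window + 1
    if lo < 0 or hi > min(len(c) for c in cells):
        # Assumption: refuse to average a truncated window.
        raise ValueError("window runs past the available data")
    samples = [value for cell in cells for value in cell[lo:hi]]
    return mean(samples), len(samples)


# Four cells with constant (made-up) irradiance values, 13 timesteps each.
cells = [[100.0] * 13, [200.0] * 13, [300.0] * 13, [400.0] * 13]
value, n_samples = aggregate_irradiance(cells, center=6)
```

Only the straight mean for irradiance is shown; variables such as cloud type would need a different reducer.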

williamhobbs commented 2 years ago

For the cases of 30-min and 60-min interval data from 2018 forward, does pulling from "PSM v3" vs "PSM v3 5 minute" make a difference in terms of sampling/averaging of either timesteps or grid cells? It seems that the standard "PSM v3" needs to maintain consistency before and after 2018 so that the multi-year dataset, and derivatives like TMY-type files, stay consistent over the full time range.

Additional detail on what I'm referring to: When downloading data via the developer API or the map NSRDB Data Viewer, 30-min and 60-min interval data can come from either dataset. Here https://developer.nrel.gov/docs/solar/nsrdb/psm3-download/ vs here https://developer.nrel.gov/docs/solar/nsrdb/psm3-5min-download/ for the API, see screenshots below for the Data Viewer.
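To make the two request paths concrete, here is a sketch that builds a 30-min request URL for each dataset without sending anything. The endpoint paths and the parameter names (`wkt`, `names`, `interval`, `api_key`, `email`) are assumptions inferred from the documentation pages linked above and should be checked against them; `build_request_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Assumed endpoint paths; verify against the linked documentation pages.
ENDPOINTS = {
    "psm3": "https://developer.nrel.gov/api/nsrdb/v2/solar/psm3-download.csv",
    "psm3-5min": "https://developer.nrel.gov/api/nsrdb/v2/solar/psm3-5min-download.csv",
}


def build_request_url(dataset, lat, lon, year, interval, api_key, email):
    """Return a download URL for one point, one year, one interval."""
    params = {
        "api_key": api_key,
        "email": email,
        "wkt": f"POINT({lon} {lat})",  # assumed order: longitude, then latitude
        "names": year,
        "interval": interval,
    }
    return ENDPOINTS[dataset] + "?" + urlencode(params)


# Same location, year, and interval requested from each dataset.
common = dict(lat=39.74, lon=-105.17, year=2019, interval=30,
              api_key="DEMO_KEY", email="user@example.com")
url_standard = build_request_url("psm3", **common)
url_5min = build_request_url("psm3-5min", **common)
```

The two URLs differ only in the endpoint, which is why it matters whether the values behind them are sampled or averaged.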

grantbuster commented 2 years ago

> For the cases of 30-min and 60-min interval data from 2018 forward, does pulling from "PSM v3" vs "PSM v3 5 minute" make a difference in terms of sampling/averaging of either timesteps or grid cells?

I think that pulling "PSM v3" is the data aggregated from 2km5min->4km30min and "PSM v3 5-minute" is the native 2km5min data, possibly sampled at a coarser temporal resolution. @PjEdwards can you confirm this?

> It seems that the standard "PSM v3" needs to maintain consistency before and after 2018 so that the multi-year dataset, and derivatives like TMY-type files, stay consistent over the full time range.

YES. That was our goal, but obviously we've walked into a whole mess of confusion. In the future we're going to try to be more descriptive with the data versioning and the spatiotemporal extent names.

williamhobbs commented 2 years ago

A related detail that I don't think has been covered here (or in API documentation) is clarifying when the timestamps are center-labeled and values are averages vs when timestamps and values are instantaneous.
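To illustrate the distinction, here is a small sketch of which 5-min timestamps would feed a center-labeled 30-min average, using the 7-step centered window mentioned earlier in this thread. The `window_timestamps` helper is hypothetical, and treating the 5-min values as instantaneous is an assumption.

```python
from datetime import datetime, timedelta


def window_timestamps(label, step_minutes=5, n_steps=7):
    """5-min timestamps in a window of n_steps centered on `label`."""
    half = n_steps // 2
    return [label + timedelta(minutes=step_minutes * k)
            for k in range(-half, half + 1)]


label = datetime(2019, 6, 1, 12, 30)

# Center-labeled average: built from every timestamp in the window.
averaged_from = window_timestamps(label)

# Instantaneous value: built from the labeled timestamp only.
instantaneous_from = [label]

# The next 30-min window, centered on 13:00.
next_window = window_timestamps(label + timedelta(minutes=30))
shared = sorted(set(averaged_from) & set(next_window))
```

With 7 steps at 5-min spacing, the 12:30 window runs from 12:15 to 12:45, so neighboring 30-min windows share one endpoint timestamp.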

NREL / developer.nrel.gov

Missing details in PSMv3 5min documentation #246