Closed williamhobbs closed 2 years ago
for you @PjEdwards
Hi @williamhobbs. Thanks for the notes! I'll add these as tickets to get our docs updated.
In all cases where we down-select the time interval when downloading a dataset, all we do is grab every xth slice. So a 15 minute download of the 5 minute data is every 3rd slice, and the 60 minute download of the 30 minute data is every other slice, etc.
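A minimal sketch of the slicing described above, assuming the served data is simply an ordered list of values at the native interval. The function name `downselect` and the sample values are hypothetical; this is an illustration of the idea, not the API's actual code.

```python
def downselect(values, native_minutes, requested_minutes):
    """Keep every k-th value, where k = requested interval / native interval.

    No averaging is done: the kept values are the original samples.
    """
    if requested_minutes % native_minutes != 0:
        raise ValueError("requested interval must be a multiple of the native interval")
    step = requested_minutes // native_minutes
    return values[::step]


# One hour of hypothetical 5-minute samples (12 values).
five_min = list(range(12))
print(downselect(five_min, 5, 15))  # every 3rd slice

# Two hours of hypothetical 30-minute samples.
thirty_min = [10, 20, 30, 40]
print(downselect(thirty_min, 30, 60))  # every other slice
```

Because the values are picked rather than averaged, a 15-minute download keeps whatever meaning (instantaneous or averaged) the underlying native data already had.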
Is that the case for the 30-minute data as well (that it's a slice, not an average, of lower frequency data)? I had understood the same as @williamhobbs: 30-minute is the average of 5-minute data, for recent years at least:
the 30 minute data is only averaged for 2018 and 2019. For all prior years 30min data was the native satellite resolution (instantaneous). The 5min to 30min averaging is done by averaging 7 5min timesteps in a window centered on the 30min timestep. Data is also aggregated from the nearest 4 spatial pixels to go from 2km to 4km (squared).
Source: https://nsrdb.nrel.gov/images/NSRDB_Webinar_QA_-_Oct_6_2020.pdf
Well, I think we are talking about 2 questions. I'm only speaking about what the download APIs do with the data the NSRDB team hands to me to serve. What you are asking about here sounds like how the NSRDB model generates the 30-minute data from the source. I'm afraid I'm not the right person to ask about that. I'd suggest pinging the mailing list at nsrdb@nrel.gov.
@kanderso-nrel referenced the same source I was looking at. And I agree, I think we are talking about two different things.
So, does the NSRDB team need to provide clarification about how the NSRDB produces the underlying data, and then the API documentation can be updated accordingly?
> So, does the NSRDB team need to provide clarification about how the NSRDB produces the underlying data, and then the API documentation can be updated accordingly?
Yes please!
Can I "@" the NSRDB team here, or does that need to go offline via email and then come back here later?
I am not sure if any of those folks are on here. I'd suggest sending a link to this issue in an email to that list and maybe one of them will jump on here and answer inline? I can ask as well and maybe get some traction.
I'll send an email to Manajit linking this issue. Thanks!
Here is my GitHub username; feel free to ping me via GitHub in the future. Here is the response I provided via email (for posterity). It sounds like there might be some confusion about the source data vs. the API doing an interval selection when serving the data... I can't really comment on this though. @PjEdwards is still the authority on the API.
In the continental United States we have data at 5min 2km starting in 2018, so if you retrieve the 30min 4km data from 2018 onwards, every datum is an average of the 4x closest 2km grid cells and the 7 closest timesteps (odd number to make a centered window average) at these grid cells. So every 1x 4km 30min datum is an average of 28x 2km 5min data. The irradiance is a straight mean value; other datasets like cloud type use more complicated methods like the mode.
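To make the 4 x 7 = 28 arithmetic concrete, here is a rough sketch of the centered-window average for irradiance only. The data layout (a list of four 5-minute series, one per 2km grid cell), the function name `aggregate_datum`, and the sample values are all assumptions for illustration, not the NSRDB production code.

```python
from statistics import mean


def aggregate_datum(cells, center, half_window=3):
    """Average all grid cells over a window centered on index `center`.

    `cells` is a list of equal-length series of 5-minute values, one per
    2km grid cell (hypothetical layout). With half_window=3 the window is
    center-3 .. center+3, i.e. 7 timesteps spanning t-15min to t+15min.
    Returns the mean and the number of samples that went into it.
    """
    lo = center - half_window
    hi = center + half_window + 1
    if lo < 0 or any(hi > len(series) for series in cells):
        raise ValueError("window extends past the available data")
    samples = [value for series in cells for value in series[lo:hi]]
    return mean(samples), len(samples)


# Four hypothetical 2km cells, each with 13 five-minute irradiance values.
cells = [[float(100 + c + t) for t in range(13)] for c in range(4)]
value, count = aggregate_datum(cells, center=6)
print(count)  # 28 samples: 4 cells x 7 timesteps
print(value)
```

This only covers the straight mean used for irradiance; variables such as cloud type would need a different reducer (for example a mode), as noted above.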
For the cases of 30-min and 60-min interval data from 2018 forward, does pulling from "PSM v3" vs "PSM v3 5 minute" make a difference in terms of sampling/averaging of either timesteps or grid cells? It seems that the standard "PSM v3" needs to maintain consistency before and after 2018 so that the multi-year dataset, and derivatives like TMY-type files, stay consistent over the full time range.
Additional detail on what I'm referring to: when downloading data via the developer API or the NSRDB Data Viewer map, 30-min and 60-min interval data can come from either dataset. For the API, compare https://developer.nrel.gov/docs/solar/nsrdb/psm3-download/ with https://developer.nrel.gov/docs/solar/nsrdb/psm3-5min-download/; for the Data Viewer, see the screenshots below.
> For the cases of 30-min and 60-min interval data from 2018 forward, does pulling from "PSM v3" vs "PSM v3 5 minute" make a difference in terms of sampling/averaging of either timesteps or grid cells?
I think that pulling "PSM v3" is the data aggregated from 2km5min->4km30min and "PSM v3 5-minute" is the native 2km5min data, possibly sampled at a coarser temporal resolution. @PjEdwards can you confirm this?
> It seems that the standard "PSM v3" needs to maintain consistency before and after 2018 so that the multi-year dataset, and derivatives like TMY-type files, stay consistent over the full time range.
YES. That was our goal, but obviously we've walked into a whole mess of confusion. In the future we're going to try to be more descriptive with the data versioning and the spatiotemporal extent names.
A related detail that I don't think has been covered here (or in API documentation) is clarifying when the timestamps are center-labeled and values are averages vs when timestamps and values are instantaneous.
On the PSMv3 5min documentation website, https://developer.nrel.gov/docs/solar/nsrdb/psm3-5min-download/, https://github.com/NREL/developer.nrel.gov/blob/main/source/docs/solar/nsrdb/psm3-5min-download.html.md.erb, there appear to be a few issues:
For the interval parameter, the description does not mention 5- and 15-min interval options.
Thanks!