blaylockbk / Herbie

Download numerical weather prediction datasets (HRRR, RAP, GFS, IFS, etc.) from NOMADS, NODD partners (Amazon, Google, Microsoft), ECMWF open data, and the University of Utah Pando Archive System.
https://herbie.readthedocs.io/
MIT License

HRRR as Zarr on AWS #2

Closed: rsignell-usgs closed this issue 3 years ago

rsignell-usgs commented 4 years ago

@blaylockbk, this is probably the wrong place to raise this, but I saw that in your HRRR Archive FAQ you said:

One day, we hope this data will be archived elsewhere that is more accessible to everyone. Perhaps soon it will be hosted by Amazon by their Opendata initiative. I would advocate to keep it in the GRIB2 format (the original format it is output as), but it would also be nice to store the data in a "cloud-friendly" format such as zarr.

To have archived HRRR data in Zarr would be AMAZING. We were trying to figure out how to download 1 year of HRRR surface fields to drive a Delaware Bay hydrodynamics simulation, and thinking how useful it would be to have the data on AWS. We could store it as Zarr but create a GRIB-on-demand service for those who need it. I've been active on the Pangeo project, and we now have some tools that could make the conversion, chunking, and upload to the cloud much easier. And I'd be happy to help out.

@zflamig, you guys would be up for a proposal on this, right?

blaylockbk commented 4 years ago

Hi! Not the wrong place to ask. I'm no longer at the University of Utah working on this project, but yes, that is the plan! And I'm excited to see it happen. As far as I understand, the work is currently underway and a big chunk has already been moved to AWS. @zflamig @johnhorel @mesowx would know more than I do about that effort.

johnhorel commented 4 years ago

Yep, several months in Zarr format are already available in AWS behind the curtain. We are grateful to @zflamig and the AWS Open Data Program for making that possible. We'll let you know as soon as the project goes live. Taylor Gowan is working on example code for people to start from. Here's a taste to show the 96 chunks. Dang, your area won't fall within just one chunk, but stitching a few together is not a big deal. [image]

rsignell-usgs commented 4 years ago

@johnhorel, that's super cool! You mean those are the chunks within the single Zarr dataset, right?

johnhorel commented 4 years ago

Yes, there is a directory structure to drill down through until you reach the specific field/vertical level/YYMMDDHH. Within that one directory there are 96 chunks, and each chunk holds all the forecasts for that model run, which makes data analytics easier. It will be straightforward to grab what you need once you understand how complex NCEP's naming convention is for types of levels, variables, etc.
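A count of 96 chunks is consistent with tiling the 1059 x 1799 HRRR CONUS grid in 150 x 150 blocks (8 rows by 12 columns). That tile size is an assumption used here for illustration and is not confirmed anywhere in this thread. Under that assumption, a minimal sketch of working out which chunks a region of interest touches might look like this (the window indices are hypothetical):

```python
import math

# HRRR CONUS grid shape (y, x). The 150 x 150 tile size is an ASSUMPTION,
# chosen because it yields 96 chunks; it is not stated in this thread.
GRID_SHAPE = (1059, 1799)
CHUNK_SHAPE = (150, 150)


def chunk_grid(grid_shape=GRID_SHAPE, chunk_shape=CHUNK_SHAPE):
    """Return the number of chunks along (y, x)."""
    return tuple(math.ceil(n / c) for n, c in zip(grid_shape, chunk_shape))


def chunks_for_window(y0, y1, x0, x1, chunk_shape=CHUNK_SHAPE):
    """List (chunk_row, chunk_col) ids covering grid indices [y0, y1) x [x0, x1)."""
    cy, cx = chunk_shape
    rows = range(y0 // cy, (y1 - 1) // cy + 1)
    cols = range(x0 // cx, (x1 - 1) // cx + 1)
    return [(r, c) for r in rows for c in cols]


ny, nx = chunk_grid()
print(ny, nx, ny * nx)

# A hypothetical 100 x 100 window that straddles chunk boundaries in both directions
print(chunks_for_window(580, 680, 1600, 1700))
```

A window that crosses a tile edge in both directions needs four chunks, which is the "stitching a few together" case mentioned above.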

rsignell-usgs commented 3 years ago

@johnhorel, I'm trying to create a "best time series" from HRRR for the purpose of providing boundary conditions for coastal ocean models. I downloaded all the F01 hour data from AWS in grib format for 2019 as a test, but found there were 42 missing files:

fmissing = ['noaa-hrrr-bdp-pds/hrrr.20190310/conus/hrrr.t02z.wrfsfcf01.grib2',
 'noaa-hrrr-bdp-pds/hrrr.20190310/conus/hrrr.t03z.wrfsfcf01.grib2',
 'noaa-hrrr-bdp-pds/hrrr.20190310/conus/hrrr.t04z.wrfsfcf01.grib2',
 'noaa-hrrr-bdp-pds/hrrr.20190310/conus/hrrr.t05z.wrfsfcf01.grib2',
 'noaa-hrrr-bdp-pds/hrrr.20190310/conus/hrrr.t06z.wrfsfcf01.grib2',
 'noaa-hrrr-bdp-pds/hrrr.20190310/conus/hrrr.t07z.wrfsfcf01.grib2',
 'noaa-hrrr-bdp-pds/hrrr.20190310/conus/hrrr.t08z.wrfsfcf01.grib2',
 'noaa-hrrr-bdp-pds/hrrr.20190310/conus/hrrr.t09z.wrfsfcf01.grib2',
 'noaa-hrrr-bdp-pds/hrrr.20190310/conus/hrrr.t10z.wrfsfcf01.grib2',
 'noaa-hrrr-bdp-pds/hrrr.20190310/conus/hrrr.t11z.wrfsfcf01.grib2',
 'noaa-hrrr-bdp-pds/hrrr.20190310/conus/hrrr.t12z.wrfsfcf01.grib2',
 'noaa-hrrr-bdp-pds/hrrr.20190310/conus/hrrr.t13z.wrfsfcf01.grib2',
 'noaa-hrrr-bdp-pds/hrrr.20190310/conus/hrrr.t14z.wrfsfcf01.grib2',
 'noaa-hrrr-bdp-pds/hrrr.20190310/conus/hrrr.t15z.wrfsfcf01.grib2',
 'noaa-hrrr-bdp-pds/hrrr.20190310/conus/hrrr.t16z.wrfsfcf01.grib2',
 'noaa-hrrr-bdp-pds/hrrr.20190310/conus/hrrr.t17z.wrfsfcf01.grib2',
 'noaa-hrrr-bdp-pds/hrrr.20190310/conus/hrrr.t18z.wrfsfcf01.grib2',
 'noaa-hrrr-bdp-pds/hrrr.20190310/conus/hrrr.t19z.wrfsfcf01.grib2',
 'noaa-hrrr-bdp-pds/hrrr.20190310/conus/hrrr.t20z.wrfsfcf01.grib2',
 'noaa-hrrr-bdp-pds/hrrr.20190310/conus/hrrr.t21z.wrfsfcf01.grib2',
 'noaa-hrrr-bdp-pds/hrrr.20190310/conus/hrrr.t22z.wrfsfcf01.grib2',
 'noaa-hrrr-bdp-pds/hrrr.20190310/conus/hrrr.t23z.wrfsfcf01.grib2',
 'noaa-hrrr-bdp-pds/hrrr.20190311/conus/hrrr.t13z.wrfsfcf01.grib2',
 'noaa-hrrr-bdp-pds/hrrr.20190311/conus/hrrr.t14z.wrfsfcf01.grib2',
 'noaa-hrrr-bdp-pds/hrrr.20190311/conus/hrrr.t15z.wrfsfcf01.grib2',
 'noaa-hrrr-bdp-pds/hrrr.20190311/conus/hrrr.t16z.wrfsfcf01.grib2',
 'noaa-hrrr-bdp-pds/hrrr.20190503/conus/hrrr.t23z.wrfsfcf01.grib2',
 'noaa-hrrr-bdp-pds/hrrr.20190504/conus/hrrr.t10z.wrfsfcf01.grib2',
 'noaa-hrrr-bdp-pds/hrrr.20190504/conus/hrrr.t11z.wrfsfcf01.grib2',
 'noaa-hrrr-bdp-pds/hrrr.20190504/conus/hrrr.t12z.wrfsfcf01.grib2',
 'noaa-hrrr-bdp-pds/hrrr.20190504/conus/hrrr.t13z.wrfsfcf01.grib2',
 'noaa-hrrr-bdp-pds/hrrr.20190504/conus/hrrr.t14z.wrfsfcf01.grib2',
 'noaa-hrrr-bdp-pds/hrrr.20190504/conus/hrrr.t18z.wrfsfcf01.grib2',
 'noaa-hrrr-bdp-pds/hrrr.20190504/conus/hrrr.t19z.wrfsfcf01.grib2',
 'noaa-hrrr-bdp-pds/hrrr.20190504/conus/hrrr.t20z.wrfsfcf01.grib2',
 'noaa-hrrr-bdp-pds/hrrr.20190504/conus/hrrr.t21z.wrfsfcf01.grib2',
 'noaa-hrrr-bdp-pds/hrrr.20190504/conus/hrrr.t22z.wrfsfcf01.grib2',
 'noaa-hrrr-bdp-pds/hrrr.20190504/conus/hrrr.t23z.wrfsfcf01.grib2',
 'noaa-hrrr-bdp-pds/hrrr.20190906/conus/hrrr.t08z.wrfsfcf01.grib2',
 'noaa-hrrr-bdp-pds/hrrr.20190906/conus/hrrr.t09z.wrfsfcf01.grib2',
 'noaa-hrrr-bdp-pds/hrrr.20191122/conus/hrrr.t17z.wrfsfcf01.grib2',
 'noaa-hrrr-bdp-pds/hrrr.20191122/conus/hrrr.t18z.wrfsfcf01.grib2']

Can you confirm that those are indeed missing? If so, what actions would you recommend to create a best time series without gaps?
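For anyone wanting to run the same kind of check, here is a rough sketch of one way to find such gaps: generate the key expected for every hourly run and compare against a bucket listing. The key pattern is copied from the list above. The function names and the simulated `listing` are hypothetical, and a real listing would have to come from the bucket itself.

```python
from datetime import datetime, timedelta

# Key pattern copied from the file names listed above.
KEY_TEMPLATE = (
    "noaa-hrrr-bdp-pds/hrrr.{init:%Y%m%d}/conus/"
    "hrrr.t{init:%H}z.wrfsfcf{fxx:02d}.grib2"
)


def expected_keys(start, end, fxx=1):
    """Yield one key per hourly model run with init time in [start, end)."""
    init = start
    while init < end:
        yield KEY_TEMPLATE.format(init=init, fxx=fxx)
        init += timedelta(hours=1)


def find_missing(expected, available):
    """Return the expected keys that are absent from `available`, in order."""
    available = set(available)
    return [key for key in expected if key not in available]


keys_2019 = list(expected_keys(datetime(2019, 1, 1), datetime(2020, 1, 1)))
print(len(keys_2019))

# Stand-in for a real bucket listing: two keys are dropped on purpose.
dropped = {
    "noaa-hrrr-bdp-pds/hrrr.20191122/conus/hrrr.t17z.wrfsfcf01.grib2",
    "noaa-hrrr-bdp-pds/hrrr.20191122/conus/hrrr.t18z.wrfsfcf01.grib2",
}
listing = [key for key in keys_2019 if key not in dropped]
print(find_missing(keys_2019, listing))
```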

johnhorel commented 3 years ago

Rich-

Thanks. I'm not too surprised, as there are plenty of ways real-time access can get mucked up. The main thing in your favor is that all the data are now on both Google and AWS. You should be able to fill in from those sources and plan to use them for other years. Their archives are complete.

NOAA page link: https://registry.opendata.aws/noaa-hrrr-pds/

https://console.cloud.google.com/storage/browser/high-resolution-rapid-refresh?project=gcp-public-data-weather

We are working to make the Zarr archive, which will be our focus, public on AWS in the next few weeks. More details on that will follow on the hrrr.chpc.utah.edu page.

Regards

john


Please do not feel obligated to respond to this email outside of your normal working hours


John Horel

Professor, Chair

Department of Atmospheric Sciences

University of Utah

john.horel@utah.edu

cell: (801) 870-9450

office: (801) 581-7091


johnhorel commented 3 years ago

Oh, my bad. I didn't notice that you were already getting it from AWS. Try Google? Otherwise, there are glitches where the model runs just may not be available. We're interested in these outages too, for documenting gaps in the Zarr archive, but we just haven't gotten there yet.

john


johnhorel commented 3 years ago

Oh, my bad. I didn't notice that you were already retrieving the files from AWS. Try Google? We're interested in these outages too, for documenting gaps in the Zarr archive, but we just haven't gotten there yet.

john


rsignell-usgs commented 3 years ago

@johnhorel, I checked Google Cloud, and the same 42 files seem to be missing. 😞

ktyle commented 3 years ago

Wonder if these gaps correspond to instances where the NCEP model production suite had problems that forced runs to be scrubbed.

rsignell-usgs commented 3 years ago

@ktyle, yipes, it didn't occur to me that these might be actual gaps in the forecast -- I assumed they were just files that failed to transfer at some point in the workflow.

@johnhorel, if it's not going to be possible to recover these files, could you please let me know so I can develop a workaround that fills in with other forecast hours from the last good forecasts before the gaps?

rsignell-usgs commented 3 years ago

@johnhorel, I wrote a script to fill the gaps with the best available data from previous long forecasts (from the forecasts at 0, 6, 12, 18 hours).

But if the missing original grib files are actually available, I'll use those instead!
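The script itself is not shown in this thread, so the following is only a sketch of the general idea, not the actual implementation. For a missing valid time it walks back through the 0, 6, 12, and 18Z cycles and picks the shortest-lead forecast that verifies at that time. The 36-hour maximum lead and the `is_available` callback are assumptions made for illustration.

```python
from datetime import datetime, timedelta

LONG_CYCLES = (0, 6, 12, 18)  # the cycles named in the comment above
MAX_FXX = 36                  # ASSUMED longest lead time for those cycles


def fallback_candidates(valid_time, max_fxx=MAX_FXX):
    """Yield (init_time, fxx) pairs from long cycles valid at `valid_time`,
    ordered from shortest to longest lead time."""
    for fxx in range(1, max_fxx + 1):
        init = valid_time - timedelta(hours=fxx)
        if init.hour in LONG_CYCLES:
            yield init, fxx


def pick_replacement(valid_time, is_available):
    """Return the first candidate accepted by `is_available`, or None."""
    for init, fxx in fallback_candidates(valid_time):
        if is_available(init, fxx):
            return init, fxx
    return None


# The missing 02Z run on 2019-03-10 would have supplied the F01 field valid at 03Z.
valid = datetime(2019, 3, 10, 3)
print(list(fallback_candidates(valid))[:2])

# Hypothetical availability check: pretend every run on 2019-03-10 is unusable.
print(pick_replacement(valid, lambda init, fxx: init.date() != valid.date()))
```

Preferring the shortest lead keeps the substituted field as close as possible to the one-hour forecast it replaces.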

johnhorel commented 3 years ago

Sorry for the hassle Rich but sounds like a good plan.

regards

john


rustyconover commented 3 years ago

Is there a way to track the availability in Zarr format? Or see some examples of the conversion?

I'm interested in applying Uber's H3 library to downsample the forecasts and allow accessibility for wider geographic areas without having to download and interpolate the full resolution forecast products.

johnhorel commented 3 years ago

Rusty-

We're in the process of getting the docs in place with examples. Thanks for your interest. It might be worth a quick call if you'd like as we'd like to build up some use cases.

Regards

john


rustyconover commented 3 years ago

@johnhorel Just sent you a personal email.

blaylockbk commented 3 years ago

AWS announced HRRR as a new public dataset on January 14, 2021: https://aws.amazon.com/about-aws/whats-new/2021/01/new-aws-public-datasets-available/

GRIB2 bucket explorer: https://noaa-hrrr-bdp-pds.s3.amazonaws.com/index.html

Zarr bucket explorer: https://hrrrzarr.s3.amazonaws.com/index.html

johnhorel commented 3 years ago

Yep we still have bunches to do. Hoping most people will ignore our zarr archive for a couple more weeks. John

rsignell-usgs commented 3 years ago

@johnhorel, I finished my processing of the HRRR 2019 data. I created a "best time series" with my variables of interest as a single Zarr dataset. I chunked the (time=8760, y=1059, x=1799) arrays into (time=144, y=300, x=300) chunks (50MB).

This allows for reasonable access times whether acquiring the entire US at a single time step, or obtaining the entire archive time series at a specified location.
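The trade-off can be checked with a little arithmetic. The sketch below uses the array and chunk shapes quoted above and assumes 4-byte (float32) values, which is not stated in the comment. It reports the uncompressed chunk size and how many chunks each access pattern reads.

```python
import math

SHAPE = {"time": 8760, "y": 1059, "x": 1799}
CHUNKS = {"time": 144, "y": 300, "x": 300}
BYTES_PER_VALUE = 4  # ASSUMES float32 storage, which is not stated above


def chunk_nbytes(chunks=CHUNKS, itemsize=BYTES_PER_VALUE):
    """Uncompressed size of one chunk in bytes."""
    return math.prod(chunks.values()) * itemsize


def chunks_touched(full_dims, shape=SHAPE, chunks=CHUNKS):
    """Count chunks read when the dims in `full_dims` are taken in full
    and every other dim is fixed at a single index."""
    count = 1
    for dim in shape:
        if dim in full_dims:
            count *= math.ceil(shape[dim] / chunks[dim])
    return count


print(chunk_nbytes() / 1e6, "MB per chunk")
print(chunks_touched({"y", "x"}), "chunks for the whole domain at one time step")
print(chunks_touched({"time"}), "chunks for the full time series at one point")
```

Neither pattern has to read more than a few dozen chunks, which is why this chunking serves both maps and point time series reasonably well.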

Here are a few screengrabs from this example analysis/visualization notebook:

[Screenshots: 2021-01-20_10-29-30, 2021-01-20_10-30-23, 2021-01-20_10-30-51]

To create this single cloud-optimized Zarr dataset from the initial GRIB2 files, I used basically three steps, captured in these notebooks:

  1. Download
  2. Fill Gaps
  3. Rechunk

My documentation is a bit light but I'd be happy to discuss further here if there is interest.

Ping @ktyle, @abarciauskas-bgse, @zflamig, @ocefpaf

rsignell-usgs commented 3 years ago

@taylorgowan, I'm hoping this example of the whole domain as Zarr is useful. Shout if you would like something different!

johnhorel commented 3 years ago

Rich-

Yes, I was discussing with the group that we're missing some aspects of what has been and is being done. Your comments yesterday were very pertinent.

Regards

john

