blaylockbk opened this issue 3 years ago
I don't think depending on wgrib2 is necessarily a bad thing. What happens if you want to open a subset of a file that doesn't have an index file associated with it? I would think generating one with wgrib2 would be the easiest way.
Yep, I have thought about building in the ability to create a local index file if one is not found on the network and if wgrib2 is installed. This is a good idea.
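Something along these lines would cover it (a minimal sketch only, assuming wgrib2 is on the PATH; the local file path is hypothetical and this is not how Herbie currently does it):

import subprocess
from pathlib import Path

# Hypothetical local GRIB2 file with no .idx available on the remote.
grib = Path("/tmp/rap_130_20170322_0600_000.grb2")
idx = grib.parent / (grib.name + ".idx")

# "wgrib2 -s" prints the short inventory (one "message:byte-offset:date:var:level:..."
# line per GRIB message), the same style of index Herbie reads from remote .idx files.
result = subprocess.run(
    ["wgrib2", "-s", str(grib)],
    capture_output=True, text=True, check=True,
)
idx.write_text(result.stdout)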
It would be very useful to be able to create the local index file using wgrib2. Currently, I'm trying to download RAP historical data from one of those weird years where the index files aren't present. The download (of a subset, which is what I need) is failing due to the lack of an index file. Example of a download that's failing:
from herbie.archive import Herbie
from datetime import datetime
D = datetime(2017,3,22,6)
H = Herbie(D, model='rap_historical', product='analysis', save_dir='/tmp/')
datasets = H.xarray("((HGT|UGRD|VGRD):(500|300) mb)|((HGT|UGRD|VGRD|RH|VVEL):(850|700) mb)|((UGRD|VGRD|TMP):(925) mb)|(CAPE:surface)|((MSLMA):mean sea level)")
AssertionError Traceback (most recent call last)
/tmp/ipykernel_246537/3588827135.py in <module>
----> 1 datasets = H.xarray("((HGT|UGRD|VGRD):(500|300) mb)|((HGT|UGRD|VGRD|RH|VVEL):(850|700) mb)|((UGRD|VGRD|TMP):(925) mb)|(CAPE:surface)|((MSLMA):mean sea level)")
2 ds = xr.merge(datasets)
3 ds
~/.conda/envs/radar/lib/python3.9/site-packages/herbie/archive.py in xarray(self, searchString, backend_kwargs, remove_grib, **download_kwargs)
890
891 # Download file if local file does not exists
--> 892 local_file = self.get_localFilePath(searchString=searchString)
893
894 # Only remove grib if it didn't exists before we download it
~/.conda/envs/radar/lib/python3.9/site-packages/herbie/archive.py in get_localFilePath(self, searchString)
522 if searchString is not None:
523 # Reassign the index DataFrame with the requested searchString
--> 524 self.idx_df = self.read_idx(searchString)
525
526 # Get a list of all GRIB message numbers. We will use this
~/.conda/envs/radar/lib/python3.9/site-packages/herbie/archive.py in read_idx(self, searchString)
558 A Pandas DataFrame of the index file.
559 """
--> 560 assert self.idx is not None, f"No index file found for {self.grib}."
561
562 if self.IDX_STYLE == "wgrib2":
AssertionError: No index file found for https://www.ncei.noaa.gov/data/rapid-refresh/access/historical/analysis/201703/20170322/rap_130_20170322_0600_000.grb2.
Hi @karlwx, I'll have to find some time to look into this.
One thing to remember: wgrib2 can only make an index of a local file. If there isn't an index file on the remote, we need a local copy of the file to make the index.
In other words, this doesn't work:
wgrib2 https://nomads.ncep.noaa.gov/pub/data/nccf/com/hrrr/prod/hrrr.20220711/conus/hrrr.t00z.wrfnatf00.grib2
*** FATAL ERROR: missing input file https://nomads.ncep.noaa.gov/pub/data/nccf/com/hrrr/prod/hrrr.20220711/conus/hrrr.t00z.wrfnatf00.grib2 ***
The current main branch of Herbie does make index files "on the fly" if there is a local copy of the full file. In your case, you would need to download the full file first...
from herbie.archive import Herbie
from datetime import datetime
D = datetime(2017,3,22,6)
H = Herbie(D, model='rap_historical', product='analysis', save_dir='/tmp/')
H.download() # Since an index file doesn't exist, we need to download the full file.
# When calling the xarray method, an index file will be made on the fly and used to search the full file (but not saved).
datasets = H.xarray("((HGT|UGRD|VGRD):(500|300) mb)|((HGT|UGRD|VGRD|RH|VVEL):(850|700) mb)|((UGRD|VGRD|TMP):(925) mb)|(CAPE:surface)|((MSLMA):mean sea level)")
It is inconvenient that we need to download the full grib file when there isn't an index file, but I haven't figured out a way to get around this.
Thanks @blaylockbk, I appreciate it! In the meantime, I'll add some logic to simply download the full file if the index file isn't present.
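For what it's worth, that fallback only takes a few lines (a rough sketch; it assumes the H.idx attribute checked in the traceback above is None whenever no remote index file was found):

from datetime import datetime
from herbie.archive import Herbie

D = datetime(2017, 3, 22, 6)
H = Herbie(D, model='rap_historical', product='analysis', save_dir='/tmp/')

if H.idx is None:
    # No remote .idx file: download the whole GRIB2 so the index can be
    # built on the fly, then subset from the local copy.
    H.download()

datasets = H.xarray("(HGT|UGRD|VGRD):(500|300) mb")  # shortened searchString for brevity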
FYI, the hour 0 grib files from this time period seem to be corrupted, in addition to having missing index files. When I download the whole file and try to run wgrib2 on it, I get this error:
198:10245639:d=2017032206:TMP:surface:anl:
pdt_len: bad stat_proc ranges = 0 set to to 1
** ERROR bad grib message: Statistical Processing bad n=0 **
199:10346709:d=2017032206:ASNOW:surface::
*** FATAL ERROR (delayed): forecast time for /tmp/rap_historical/20170322/rap_130_20170322_0600_000.grb2
The hour 1 files are fine (with idx present on NCEI), so it could be an enhancement to look for the hour 1 file valid at the same time if the hour 0 file is bad.
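On the user side, the fallback could look something like this (a sketch only; it assumes the hour-1 file is reachable through the same product with Herbie's fxx argument, and that whatever check flags the hour-0 file as bad has already run):

from datetime import datetime, timedelta
from herbie.archive import Herbie

valid = datetime(2017, 3, 22, 6)

# The hour-0 analysis for this valid time is corrupted, so request the 1-hour
# forecast from the cycle initialized one hour earlier; it is valid at the same
# time and (per the note above) has an .idx file on NCEI.
H = Herbie(valid - timedelta(hours=1), model='rap_historical',
           product='analysis', fxx=1, save_dir='/tmp/')
datasets = H.xarray("(HGT|UGRD|VGRD):(500|300) mb")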
This is interesting: https://forum.mmm.ucar.edu/threads/why-do-i-see-fatal-error-statistical-processing-bad-n-0-when-using-the-wgrib2-utility-on-my-upp-output.9214/
This error message is displayed when using more recent versions of the wgrib2 utility on files for forecast hour zero that contain accumulated or time-averaged fields. This is due to the newer versions of wgrib2 no longer allowing the n parameter to be zero or empty.
Users can also continue to use an older version of wgrib2; the latest version known not to produce this error is v2.0.4.
@dplarson (issue brought up by @danieldjewell)
If you have downloaded the full GRIB2 and then run the .xarray('HGT') method to open a subset, it appears that Herbie won't subset the file that was already downloaded, but will instead redownload the requested subset. In other words, instead of opening the full GRIB2 that was already downloaded and grabbing all of the "HGT" records, it redownloads the "HGT" records from the remote source URL. See the print message: the file is being redownloaded from the remote!
This may be harder to implement cleanly than I thought: extracting a subset from the already-downloaded file would write a second copy of data that is already on disk, leaving redundant copies on your local machine. I wonder how I could read the subset from the full local file, based on the searchString argument, without depending on wgrib2.
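One possible direction, sketched below: since Herbie's xarray method appears to go through cfgrib (it takes backend_kwargs), cfgrib's filter_by_keys option can pull matching messages straight out of the local file without wgrib2 and without writing a second copy. It selects on GRIB keys rather than the searchString regex, so it is not a drop-in replacement, just an illustration (the local path is the one from the wgrib2 error above):

import xarray as xr

# Full GRIB2 file already downloaded by Herbie (path from the example above).
local_file = "/tmp/rap_historical/20170322/rap_130_20170322_0600_000.grb2"

# Read only 500 mb geopotential height, roughly the "HGT:500 mb" piece of the
# searchString ("gh" is the ecCodes shortName for geopotential height).
ds = xr.open_dataset(
    local_file,
    engine="cfgrib",
    backend_kwargs={
        "filter_by_keys": {
            "shortName": "gh",
            "typeOfLevel": "isobaricInhPa",
            "level": 500,
        }
    },
)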