Investigate: Learn about Kerchunk and figure out if Herbie could benefit

Kerchunk can "scan" a GRIB2 file and determine the byte range of each message without downloading the file.

https://github.com/fsspec/kerchunk/blob/f272c66c9ea23dd8aaef885f9b6fb1c1312d3813/kerchunk/grib2.py#L91
https://nbviewer.org/gist/peterm790/92eb1df3d58ba41d3411f8a840be2452
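The reason this is possible at all is that every GRIB2 message starts with a fixed 16-byte indicator section: `GRIB` in octets 1-4, the edition number (2) in octet 8, and the total message length as a big-endian 8-byte integer in octets 9-16. The sketch below is a minimal illustration of that idea, not Kerchunk's code. It assumes messages are stored back to back with no padding, and the function name `grib2_byte_ranges` and the synthetic `fake_message` helper are hypothetical. It only recovers byte ranges; getting variable names still requires decoding each message (which Kerchunk does with eccodes).

```python
import io
import struct
from typing import BinaryIO, List, Tuple


def grib2_byte_ranges(f: BinaryIO) -> List[Tuple[int, int]]:
    """Return (start_byte, length) for each GRIB2 message in a seekable file.

    Only the 16-byte Section 0 header of each message is read.
    Assumes messages are contiguous, with no padding between them.
    """
    ranges = []
    start = 0
    while True:
        f.seek(start)
        header = f.read(16)
        if len(header) < 16:
            break  # end of file (or trailing bytes too short to be a header)
        if header[:4] != b"GRIB" or header[7] != 2:
            raise ValueError(f"no GRIB2 header at byte {start}")
        # Octets 9-16: total message length, big-endian unsigned 64-bit
        (length,) = struct.unpack(">Q", header[8:16])
        if length < 16:
            raise ValueError(f"implausible message length {length} at byte {start}")
        ranges.append((start, length))
        start += length  # jump straight to the next message's header
    return ranges


def fake_message(payload: bytes) -> bytes:
    """Hypothetical helper: wrap a payload in a minimal GRIB2-like envelope."""
    total = 16 + len(payload) + 4  # header + payload + "7777" end marker
    return b"GRIB" + b"\x00\x00" + b"\x00" + b"\x02" + struct.pack(">Q", total) + payload + b"7777"


data = fake_message(b"abc") + fake_message(b"hello")
print(grib2_byte_ranges(io.BytesIO(data)))  # [(0, 23), (23, 25)]
```

For a remote file, the same function could presumably be pointed at `fsspec.open(url, "rb", anon=True)`; how many bytes are actually transferred then depends on fsspec's read-ahead caching.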

I was able to get the byte ranges for an HRRR GRIB2 file on AWS without downloading it:

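```python
import pandas as pd
from kerchunk.grib2 import scan_grib

# Optional filter: only 2 m and 10 m above-ground messages
afilter = {"typeOfLevel": "heightAboveGround", "level": [2, 10]}
so = {"anon": True}  # public NOAA bucket, no credentials needed

# scan_grib reads the file through fsspec (s3:// URLs need s3fs installed)
# and returns one reference set per GRIB message
idx = scan_grib(
    "s3://noaa-hrrr-bdp-pds/hrrr.20230630/conus/hrrr.t00z.wrfsfcf01.grib2",
    storage_options=so,
    # filter=afilter,  # uncomment to scan only the messages matching afilter
)

# Each "latitude/0.0" reference is [url, offset, length]; keep offset and length
df = pd.DataFrame(
    [i["refs"]["latitude/0.0"][1:] for i in idx], columns=["startByte", "bytes"]
)
# Inclusive end byte of each message, as used in an HTTP Range header
# (a cumsum() of the lengths breaks as soon as a filter drops messages)
df["endByte"] = df["startByte"] + df["bytes"] - 1
# Fragile: assumes the data variable's name is in the 4th reference key
df["varName"] = [list(i["refs"].keys())[3].split("/")[0] for i in idx]
df
```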
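Herbie already downloads subsets of GRIB2 files with byte-range requests, so a table like `df` could presumably feed the same kind of download. The sketch below is a hypothetical helper (`range_headers` is not part of Herbie or Kerchunk) that selects messages by a regex on `varName` and turns their `startByte`/`bytes` columns into HTTP `Range` header values, merging adjacent messages so fewer requests are needed. The demo DataFrame holds made-up values, not real HRRR offsets.

```python
import pandas as pd


def range_headers(df: pd.DataFrame, pattern: str) -> list[str]:
    """Hypothetical helper: HTTP Range header values for messages matching pattern.

    Expects ``startByte``, ``bytes`` and ``varName`` columns like the table above.
    """
    sel = df[df["varName"].str.contains(pattern, regex=True)].sort_values("startByte")
    merged: list[list[int]] = []
    for start, length in zip(sel["startByte"], sel["bytes"]):
        start, end = int(start), int(start) + int(length)  # end is exclusive here
        if merged and merged[-1][1] == start:
            merged[-1][1] = end  # contiguous with the previous range: extend it
        else:
            merged.append([start, end])
    # HTTP Range end offsets are inclusive, hence the - 1
    return [f"bytes={s}-{e - 1}" for s, e in merged]


# Made-up example table
demo = pd.DataFrame(
    {
        "startByte": [0, 100, 250, 400],
        "bytes": [100, 150, 150, 50],
        "varName": ["t2m", "d2m", "u10", "t2m"],
    }
)
print(range_headers(demo, "t2m|d2m"))  # ['bytes=0-249', 'bytes=400-449']
```

Merging contiguous ranges is a design choice, not a requirement; issuing one request per message would also work, just with more round trips.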

However, scanning the full file took about 4 minutes! Passing the `filter` argument (e.g., `afilter` above) cut this down to about 20 seconds.
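To make the filter dictionary concrete: `afilter` is meant to keep only messages whose `typeOfLevel` is `heightAboveGround` and whose `level` is 2 or 10. The sketch below shows one plausible reading of those semantics (a list value means "any of these", a scalar means equality). This is an assumption for illustration, not Kerchunk's implementation, and `matches_filter` and the sample messages are hypothetical.

```python
from typing import Any, Mapping


def matches_filter(keys: Mapping[str, Any], flt: Mapping[str, Any]) -> bool:
    """Return True if a message's GRIB keys satisfy every filter entry.

    Assumed semantics: list/tuple/set values mean "any of these",
    scalars mean equality, and a missing key never matches.
    """
    for name, wanted in flt.items():
        if name not in keys:
            return False
        value = keys[name]
        if isinstance(wanted, (list, tuple, set)):
            if value not in wanted:
                return False
        elif value != wanted:
            return False
    return True


afilter = {"typeOfLevel": "heightAboveGround", "level": [2, 10]}

# Hypothetical message metadata, as eccodes might report it
messages = [
    {"shortName": "2t", "typeOfLevel": "heightAboveGround", "level": 2},
    {"shortName": "10u", "typeOfLevel": "heightAboveGround", "level": 10},
    {"shortName": "t", "typeOfLevel": "isobaricInhPa", "level": 500},
    {"shortName": "gust", "typeOfLevel": "surface", "level": 0},
]
print([m["shortName"] for m in messages if matches_filter(m, afilter)])  # ['2t', '10u']
```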

As far as I can tell, it uses eccodes, so the variable naming follows eccodes conventions rather than the wgrib2 inventory style (e.g., `TMP:2 m above ground`) that Herbie users are familiar with.
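If Herbie used Kerchunk's index, it would likely need a bridge from eccodes keys back to wgrib2-style search strings. The sketch below is a hypothetical version of such a bridge: it maps the eccodes keys `discipline`, `parameterCategory`, `parameterNumber`, `typeOfLevel`, and `level` to a `VAR:level` string. The two lookup tables cover only a tiny subset of NCEP GRIB2 Table 4.2 and of the level types, and `wgrib2_style_name` is an illustrative name, not an existing API.

```python
from typing import Any, Mapping

# Hypothetical, tiny subset of NCEP GRIB2 Table 4.2:
# (discipline, parameterCategory, parameterNumber) -> wgrib2 abbreviation
PARAM_NAMES = {
    (0, 0, 0): "TMP",
    (0, 0, 6): "DPT",
    (0, 1, 1): "RH",
    (0, 2, 2): "UGRD",
    (0, 2, 3): "VGRD",
}

# Hypothetical wgrib2-style level text for a few eccodes typeOfLevel values
LEVEL_TEMPLATES = {
    "heightAboveGround": "{level} m above ground",
    "isobaricInhPa": "{level} mb",
    "surface": "surface",
}


def wgrib2_style_name(keys: Mapping[str, Any]) -> str:
    """Build a wgrib2-like "VAR:level" string from eccodes keys."""
    code = (keys["discipline"], keys["parameterCategory"], keys["parameterNumber"])
    var = PARAM_NAMES.get(code, "unknown{}".format(code))
    # Unknown level types fall back to the raw eccodes typeOfLevel
    template = LEVEL_TEMPLATES.get(keys["typeOfLevel"], keys["typeOfLevel"])
    return f"{var}:{template.format(level=keys['level'])}"


print(wgrib2_style_name({
    "discipline": 0, "parameterCategory": 0, "parameterNumber": 0,
    "typeOfLevel": "heightAboveGround", "level": 2,
}))  # TMP:2 m above ground
```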

blaylockbk/Herbie #147