blaylockbk / Herbie

Download numerical weather prediction datasets (HRRR, RAP, GFS, IFS, etc.) from NOMADS, NODD partners (Amazon, Google, Microsoft), ECMWF open data, and the University of Utah Pando Archive System.
MIT License

Investigate: Learn about Kerchunk and figure out if Herbie could benefit #147

Open: blaylockbk opened this issue 1 year ago

blaylockbk commented 1 year ago

I keep hearing about Kerchunk. I'm sure people smarter than me are working on it, and it sounds like it could provide efficient data access. Perhaps Herbie (and GOES-2-go) could benefit from using it.

blaylockbk commented 1 year ago

Kerchunk can "scan" a remote file and determine the byte range of each GRIB message without first downloading the file.

Here is how I got the byte ranges for a GRIB2 file without downloading it:

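```python
from kerchunk.grib2 import scan_grib
import pandas as pd

url = "s3://..."  # placeholder: path to the remote GRIB2 file

# Only scan the 2 m and 10 m "heightAboveGround" messages (optional)
afilter = {"typeOfLevel": "heightAboveGround", "level": [2, 10]}
so = {"anon": True}  # anonymous access to the public bucket

idx = scan_grib(
    url,
    storage_options=so,
    # filter=afilter,  # uncomment to scan only the matching messages
)

# One reference set per GRIB message; "latitude/0.0" is [url, offset, length]
df = pd.DataFrame(
    [i["refs"]["latitude/0.0"][1:] for i in idx], columns=["startByte", "bytes"]
)
# Inclusive end byte, e.g., for an HTTP "Range: bytes=start-end" request
df["endByte"] = df["startByte"] + df["bytes"] - 1
# Variable name from the 4th reference key (relies on key order)
df["varName"] = [list(i["refs"].keys())[3].split("/")[0] for i in idx]
```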
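Once the offsets and lengths are known, only the wanted messages need to be fetched. The sketch below uses two hypothetical helpers (`merge_ranges` and `range_headers`, not part of Kerchunk or Herbie) to turn `[url, offset, length]` entries like the `latitude/0.0` values above into HTTP `Range` header values. It assumes all entries come from the same file, and it merges messages that sit back to back so they can be requested together. The `s3://bucket/file.grib2` path is made up for the example.

```python
def merge_ranges(refs):
    """Turn [url, offset, length] entries into sorted, inclusive
    (start, end) byte ranges, merging contiguous or overlapping ones.

    Hypothetical helper; assumes every entry points at the same file.
    """
    spans = sorted((offset, offset + length - 1) for _, offset, length in refs)
    merged = []
    for start, end in spans:
        if merged and start <= merged[-1][1] + 1:
            # Touches or overlaps the previous range: extend that range
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def range_headers(refs):
    """Build one HTTP Range header value per merged byte range."""
    return [f"bytes={start}-{end}" for start, end in merge_ranges(refs)]


refs = [
    ["s3://bucket/file.grib2", 0, 100],
    ["s3://bucket/file.grib2", 100, 50],
    ["s3://bucket/file.grib2", 500, 25],
]
print(range_headers(refs))
# → ['bytes=0-149', 'bytes=500-524']
```

Issuing one request per merged range is the conservative choice, since some object stores accept only a single range per GET.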


However, scanning the full file took ~4 minutes! Using the filter cuts this down to about 20 seconds.
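Because the scan is slow, one option is to run it once and keep the result. Kerchunk references are normally stored as JSON, so the sketch below assumes the scan output is JSON-serializable. `cached_scan` is a hypothetical helper, and `scan` stands in for any zero-argument callable, for example `lambda: scan_grib(url, storage_options=so)`, where `url` is the path to the GRIB2 file.

```python
import json
import os


def cached_scan(cache_path, scan):
    """Return the scan result stored at cache_path if it exists;
    otherwise call scan(), save its result as JSON, and return it.

    Hypothetical helper; assumes scan() returns JSON-serializable data.
    """
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            return json.load(f)
    result = scan()
    with open(cache_path, "w") as f:
        json.dump(result, f)
    return result
```

The cache file name should encode both the URL and any filter, because a filtered scan and a full scan of the same file return different message lists.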

As far as I can tell, Kerchunk uses eccodes, so its variable names follow the eccodes convention rather than the wgrib2 convention that Herbie users are familiar with.
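To illustrate the mismatch: a wgrib2 inventory describes 2 m temperature as `TMP:2 m above ground`, while eccodes calls it `2t` (shortName) or `t2m` (cfVarName). Bridging the two would need a lookup like the hypothetical, deliberately incomplete table below. It assumes the `varName` column holds cfVarName-style names and maps them to wgrib2-style strings of the kind Herbie's `searchString` matches.

```python
# Hypothetical, incomplete lookup: eccodes cfVarName -> wgrib2-style
# inventory fragment. Only a few well-known near-surface fields are listed.
CF_TO_WGRIB2 = {
    "t2m": ":TMP:2 m above ground",
    "d2m": ":DPT:2 m above ground",
    "u10": ":UGRD:10 m above ground",
    "v10": ":VGRD:10 m above ground",
}


def to_wgrib2(var_names):
    """Translate names via CF_TO_WGRIB2; unknown names map to None."""
    return [CF_TO_WGRIB2.get(name) for name in var_names]


print(to_wgrib2(["t2m", "u10", "gust"]))
# → [':TMP:2 m above ground', ':UGRD:10 m above ground', None]
```

A name alone is not always enough: fields such as temperature on many pressure levels share one cfVarName, so a real translation would also need `typeOfLevel` and `level`.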