TomNicholas opened this issue 1 year ago.
I would first point out that a little consistency is already provided by classes that call the underlying functions. For example, kerchunk.grib2.GribToZarr is a class designed to feel similar to kerchunk.hdf.SingleHdf5ToZarr.
A general file dispatch system seems reasonable. It possibly belongs in Intake 2, which already tries to guess file types by URL pattern matching or by reading magic bytes. We probably don't want to duplicate what pangeo-forge already does, though?
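Below is a minimal sketch of such a dispatch step. It assumes a simple policy: check the magic bytes first, then fall back to the file extension in the URL. The function name `guess_format`, the format labels and the extension table are hypothetical; they are not Intake's or kerchunk's API. The signatures themselves are the standard ones for HDF5 (`\x89HDF\r\n\x1a\n`), GRIB (`GRIB`) and classic netCDF (`CDF\x01` / `CDF\x02`). In practice the header would be the first few bytes read from the file, e.g. with fsspec.

```python
import posixpath
from typing import Optional
from urllib.parse import urlparse

# Leading bytes identifying some formats kerchunk has backends for.
# netCDF4 files are HDF5 underneath, so they match the HDF5 signature.
MAGIC_SIGNATURES = [
    (b"\x89HDF\r\n\x1a\n", "hdf5"),
    (b"GRIB", "grib"),
    (b"CDF\x01", "netcdf3"),  # classic format
    (b"CDF\x02", "netcdf3"),  # 64-bit offset format
]

# Fallback table (hypothetical). ".nc" is deliberately absent: it can be
# netCDF3 or netCDF4/HDF5, and only the magic bytes can tell them apart.
EXTENSIONS = {
    ".h5": "hdf5",
    ".hdf5": "hdf5",
    ".grib": "grib",
    ".grib2": "grib",
    ".grb": "grib",
    ".grb2": "grib",
}


def guess_format(url: str, header: bytes) -> Optional[str]:
    """Return a format label for a file, or None if it cannot be identified."""
    # 1. Magic bytes are the most reliable signal.
    for signature, fmt in MAGIC_SIGNATURES:
        if header.startswith(signature):
            return fmt
    # 2. Otherwise use the extension of the URL path (query string ignored).
    ext = posixpath.splitext(urlparse(url).path)[1].lower()
    return EXTENSIONS.get(ext)


print(guess_format("s3://bucket/file.nc", b"\x89HDF\r\n\x1a\n\x00"))  # hdf5
print(guess_format("https://example.com/forecast.grib2?token=abc", b""))  # grib
print(guess_format("s3://bucket/old.nc", b"CDF\x01\x00"))  # netcdf3
print(guess_format("s3://bucket/unknown.bin", b"\x00\x01"))  # None
```

Checking bytes before the extension means a misnamed file is still routed to the right backend. The extension only matters when no header is available or the signature is unknown.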
On "Should there be some arguments that are valid for every backend (e.g. inline_threshold), and others that are specific to particular backends?": there are definitely operations that will be the same for all backends, like inlining.
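The sketch below shows one way to split shared and backend-specific arguments. `inline_threshold` is handled once, by a shared post-processing step, while any other keyword argument is forwarded untouched to the backend. All names here are hypothetical: `open_virtual`, `inline_small_chunks`, the toy `fake_backend` and the in-memory `FILES`. The only part taken from kerchunk is its reference layout, where a value is either an inline string (binary data prefixed with `base64:`) or a `[url, offset, length]` list.

```python
import base64
from typing import Callable, Dict, List, Union

# A kerchunk-style reference: inline data (str) or [url, offset, length].
Ref = Union[str, List]
ReadRange = Callable[[str, int, int], bytes]


def inline_small_chunks(refs: Dict[str, Ref], inline_threshold: int,
                        read_range: ReadRange) -> Dict[str, Ref]:
    """Replace byte-range references shorter than inline_threshold by their bytes."""
    out: Dict[str, Ref] = {}
    for key, ref in refs.items():
        if isinstance(ref, list) and len(ref) == 3 and ref[2] < inline_threshold:
            url, offset, length = ref
            data = read_range(url, offset, length)
            # Always base64-encode for simplicity; text could be stored as-is.
            out[key] = "base64:" + base64.b64encode(data).decode("ascii")
        else:
            out[key] = ref  # already inline, or too large to inline
    return out


def open_virtual(url: str, backend: Callable[..., Dict[str, Ref]], *,
                 inline_threshold: int = 100, read_range: ReadRange,
                 **backend_kwargs) -> Dict[str, Ref]:
    """Hypothetical common entry point (the default threshold is arbitrary).

    Backend-specific keyword arguments are forwarded to the backend untouched;
    common ones (here only inline_threshold) are applied the same way afterwards.
    """
    refs = backend(url, **backend_kwargs)
    return inline_small_chunks(refs, inline_threshold, read_range)


# --- toy demo with an in-memory "file system" ---
FILES = {"memory://a.bin": bytes(range(256)) * 4}


def read_from_memory(url: str, offset: int, length: int) -> bytes:
    return FILES[url][offset:offset + length]


def fake_backend(url: str, chunk_size: int = 16) -> Dict[str, Ref]:
    """Pretend scanner with one backend-specific option: one small, one large chunk."""
    return {
        ".zgroup": '{"zarr_format": 2}',
        "x/0": [url, 0, chunk_size],
        "x/1": [url, chunk_size, 512],
    }


refs = open_virtual("memory://a.bin", fake_backend, inline_threshold=100,
                    read_range=read_from_memory, chunk_size=8)
print(refs["x/0"])  # base64:AAECAwQFBgc=
print(refs["x/1"])  # ['memory://a.bin', 8, 512]
```

Because inlining only looks at the references, it does not need to know which format produced them. That is what makes it a natural candidate for a shared argument.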
On virtual zarrs, this sounds like something between https://github.com/nsidc/earthaccess/pull/278 and a special xarray engine="scan-kerchunk". The trouble, as with everything kerchunk, is that there are many options (such as what to do with GRIBs...) and it becomes hard to specify them all in a reasonable way. Not all of kerchunk will be xarray-friendly (and maybe not even zarr-friendly).
Problem
The API for Kerchunk's file format backend openers doesn't follow a consistent pattern.
Suggestion
Change the openers to each be a function returning a VirtualZarrStore (see #375), with standardized keyword arguments.
Advantages
Implementation ideas
Should there be some arguments that are valid for every backend (e.g. inline_threshold), and others that are specific to particular backends?
Questions
How should GRIB files be handled? Combine them before returning? Return them as a hierarchy of multiple groups within a single store (like when opening with datatree)? Or return a list of VirtualZarrStores?
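The sketch below makes the hierarchy-of-groups option concrete by nesting several reference sets as child groups of one store. It also shows the list alternative for comparison. `VirtualZarrStore` here is a hypothetical stand-in (a thin wrapper around a kerchunk-style references dict), not the class proposed in #375. The group names are assumed to be supplied by the caller. The only Zarr detail relied on is that a Zarr v2 group is marked by a `.zgroup` key containing `{"zarr_format": 2}`.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List, Union

Ref = Union[str, list]  # inline data or [url, offset, length]
ZGROUP = json.dumps({"zarr_format": 2})


@dataclass
class VirtualZarrStore:
    """Hypothetical stand-in: Zarr keys mapped to kerchunk-style references."""
    refs: Dict[str, Ref] = field(default_factory=dict)


def as_hierarchy(parts: Dict[str, VirtualZarrStore]) -> VirtualZarrStore:
    """Hierarchy option: one store, each part placed in its own child group."""
    combined: Dict[str, Ref] = {".zgroup": ZGROUP}  # root group marker
    for name, part in parts.items():
        if not name or "/" in name:
            raise ValueError(f"invalid group name: {name!r}")
        for key, ref in part.refs.items():
            combined[f"{name}/{key}"] = ref  # prefix every key with the group
        combined.setdefault(f"{name}/.zgroup", ZGROUP)  # make sure it is a group
    return VirtualZarrStore(combined)


def as_list(parts: Dict[str, VirtualZarrStore]) -> List[VirtualZarrStore]:
    """List option: leave the parts separate and let the caller decide."""
    return list(parts.values())


surface = VirtualZarrStore({".zgroup": ZGROUP,
                            "t2m/0.0": ["s3://bucket/f.grib2", 0, 1000]})
upper = VirtualZarrStore({"t/0.0.0": ["s3://bucket/f.grib2", 1000, 5000]})
tree = as_hierarchy({"surface": surface, "isobaric": upper})
print(sorted(tree.refs))
# ['.zgroup', 'isobaric/.zgroup', 'isobaric/t/0.0.0', 'surface/.zgroup', 'surface/t2m/0.0']
```

In the hierarchy version, each child group could then be opened as one node of a tree, as with datatree. The list version defers that choice to the caller but gives up the single-store return type.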