TomAugspurger opened 1 year ago
This is certainly something that kerchunk could do, effectively with our own codec that expands whatever representation is stored into an array at read time. That would be simple for linear coordinates, but GRIB allows for many complex coordinate definitions. I suppose it's possible to extract the parameters of whatever coordinate system is in use, but we probably don't want to implement the coordinate-generation algorithms ourselves; we'd rather call the appropriate functions in eccodes itself, if we can.
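As a rough illustration of the "own codec" idea for the simple linear case, here is a minimal sketch. It assumes the coordinate is evenly spaced and that storing start, step, and num as a small JSON payload is acceptable. LinearCoordinateCodec and its codec_id are hypothetical; the encode/decode method names only mimic the numcodecs codec interface, and nothing here is registered with numcodecs or wired into kerchunk. Non-linear GRIB grids (e.g. Gaussian grids) are out of scope and would still need eccodes.

```python
import json

import numpy as np


class LinearCoordinateCodec:
    """Hypothetical codec that stores only (start, step, num) for an evenly
    spaced 1-D coordinate and rebuilds the full array on decode.

    The encode/decode names mirror the numcodecs Codec interface, but this
    class is not registered with numcodecs or kerchunk.
    """

    codec_id = "linear_coordinate"  # hypothetical identifier

    def __init__(self, dtype="float64"):
        self.dtype = np.dtype(dtype)

    def encode(self, arr):
        arr = np.asarray(arr, dtype=self.dtype)
        if arr.ndim != 1 or arr.size < 2:
            raise ValueError("expected a 1-D array with at least two values")
        step = (arr[-1] - arr[0]) / (arr.size - 1)
        # Refuse to encode anything that is not (nearly) evenly spaced.
        rebuilt = arr[0] + step * np.arange(arr.size)
        if not np.allclose(rebuilt, arr):
            raise ValueError("array is not evenly spaced")
        params = {"start": float(arr[0]), "step": float(step), "num": int(arr.size)}
        return json.dumps(params).encode("utf-8")

    def decode(self, buf, out=None):
        # Regenerate the coordinate from its three parameters.
        params = json.loads(bytes(buf).decode("utf-8"))
        positions = np.arange(params["num"], dtype=self.dtype)
        values = (params["start"] + params["step"] * positions).astype(self.dtype)
        if out is not None:
            out[...] = values
            return out
        return values
```

For a 451-point latitude axis such as 90.0 - 0.4 * np.arange(451), encode produces a few dozen bytes of JSON instead of 3,608 bytes of float64 data, and decode rebuilds the array to within floating-point round-off.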
This all connects to the possibility of analytical coordinates in xarray. Perhaps we shouldn't be materializing arrays even at read time, but building xarray indexes instead.
There's a CF convention for that!
We could totally interpret those as a "functional xarray index" too.
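A minimal sketch of what a "functional" index could look like, assuming a purely linear coordinate value(i) = start + i * step. RangeIndexSketch is a hypothetical name and is not an implementation of xarray's Index API; it only shows that both materializing values and label-to-position lookup can be done with arithmetic, so no coordinate array has to be stored or read.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class RangeIndexSketch:
    """Hypothetical 'functional' 1-D index where value(i) = start + i * step.

    Nothing is materialized until values are requested, and label lookups
    are answered by inverting the formula instead of searching an array.
    This is not xarray's Index API, only an illustration of the idea.
    """

    start: float
    step: float
    size: int

    def values(self, positions=None):
        # Materialize only the requested positions (or all of them).
        if positions is None:
            positions = np.arange(self.size)
        return self.start + self.step * np.asarray(positions)

    def get_loc(self, label, tolerance=1e-6):
        # Invert value(i) = start + i * step and round to the nearest position.
        pos = int(round((label - self.start) / self.step))
        if not 0 <= pos < self.size:
            raise KeyError(label)
        # Reject labels that fall between grid points.
        if abs(self.start + pos * self.step - label) > tolerance:
            raise KeyError(label)
        return pos
```

For example, RangeIndexSketch(90.0, -0.4, 451).get_loc(45.2) returns 112 without ever building the 451-element latitude array.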
There's a CF convention for that
(Plus the FITS WCS ways of defining the same thing; you won't get these from geo-datasets, but I think they may be more general.)
People on this thread might be interested in the intake-stac sprint https://github.com/intake/intake-stac/issues/159
Thanks @dcherian. IIUC, the coordinate subsampling you linked to is essentially the same as range(0, 10, 1)? We just have two "tie points" (the first and last point) and then linearly interpolate between them?
Do you know if this decoding is implemented in cf-xarray or xarray.conventions.decode_cf_variable? I didn't see it at https://cf-xarray.readthedocs.io/en/latest/coding.html or in a glance at decode_cf_variable.
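For the two-tie-point case, decoding amounts to linear interpolation between the tie points. The sketch below is a simplification, assuming a single interpolation subarea and only the linear method; it ignores the tie-point index variables, interpolation parameters, and other methods that the CF coordinate subsampling convention defines. expand_tie_points is a hypothetical helper, not part of cf-xarray or xarray.

```python
import numpy as np


def expand_tie_points(tie_indices, tie_values, size):
    """Rebuild a 1-D coordinate of length `size` from tie points by linear
    interpolation (a simplified take on CF 'linear' coordinate subsampling).

    `tie_indices` are positions in the full array and must be increasing.
    """
    positions = np.arange(size)
    return np.interp(positions, tie_indices, tie_values)


# Two tie points (first and last) are enough for an evenly spaced axis.
coord = expand_tie_points([0, 9], [0.0, 9.0], size=10)
```

With tie points at indices 0 and 9 holding 0.0 and 9.0, the result matches range(0, 10, 1) element by element.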
It has not been implemented.
We just have two "tie points" (the first and last point) and then linearly interpolate between them?
Yes, I think so; that's why it clicked in my head. I don't know what you would do for all the other GRIB coordinate systems, though.
We just have two "tie points"
This is also essentially the case in standard TIFF, but of course more complex geometries are possible in practice, and GRIB has many models.
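To make the TIFF comparison concrete: in a north-up GeoTIFF, one ModelTiepointTag (I, J, K, X, Y, Z) plus a ModelPixelScaleTag (ScaleX, ScaleY, ScaleZ) is enough to generate both axes. The sketch assumes no rotation (no ModelTransformationTag) and evaluates pixel centers, which is appropriate when the tie point refers to pixel corners (PixelIsArea). geotiff_axes is a hypothetical helper, not a function from any GeoTIFF library.

```python
import numpy as np


def geotiff_axes(tiepoint, pixel_scale, width, height, pixel_offset=0.5):
    """Compute 1-D x and y coordinates for a north-up GeoTIFF from one
    ModelTiepointTag (I, J, K, X, Y, Z) and a ModelPixelScaleTag (Sx, Sy, Sz).

    pixel_offset=0.5 evaluates pixel centers, assuming the tie point refers
    to pixel corners (PixelIsArea); use 0.0 for PixelIsPoint.
    """
    i0, j0, _k, x0, y0, _z = tiepoint
    sx, sy, _sz = pixel_scale
    cols = np.arange(width) + pixel_offset
    rows = np.arange(height) + pixel_offset
    x = x0 + (cols - i0) * sx
    y = y0 - (rows - j0) * sy  # model Y decreases as the row index increases
    return x, y
```

For a global 0.5-degree grid tied at (-180, 90), geotiff_axes((0, 0, 0, -180.0, 90.0, 0), (0.5, 0.5, 0), 720, 360) gives x from -179.75 to 179.75 and y from 89.75 to -89.75: two numbers per axis instead of hundreds.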
I'm working with a GRIB2 file, and am interested in minimizing the size of the references file. Currently, the largest values in the references come from the base64-encoded coordinates that were inlined in the references:
This specific variable (and longitude, step, and perhaps time) can be represented "symbolically" (maybe not the right name), with something like arange(90, -90.1, -0.4).
My questions:
Somewhat annoyingly, there are floating-point inaccuracies between what I get from np.arange and what's coming out of cfgrib. But hopefully those can be solved.
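To put numbers on both points in the post, here is a sketch using the latitude axis from arange(90, -90.1, -0.4). It assumes the inlined values are float64 (the actual dtype in the GRIB file may differ), and the JSON parameter schema is hypothetical, not a kerchunk format. It also generates the same axis three ways with NumPy. These can disagree in the last few bits: np.arange effectively uses (start + step) - start as its increment, which is not exactly -0.4, while np.linspace pins both endpoints exactly. Which variant, if any, matches cfgrib bit-for-bit is not established here.

```python
import base64
import json

import numpy as np

# Latitude axis from the post: 90 down to -90 in steps of -0.4 (451 points).
start, step, num = 90.0, -0.4, 451

# 1. What gets inlined today: raw float64 bytes, base64-encoded.
from_arange = np.arange(90, -90.1, -0.4)
inlined = base64.b64encode(from_arange.astype("<f8").tobytes()).decode("ascii")

# 2. A symbolic alternative: just the parameters (hypothetical schema).
symbolic = json.dumps({"start": start, "step": step, "num": num})
print(len(inlined), len(symbolic))

# 3. Other ways of generating the "same" axis.
from_multiply = start + step * np.arange(num)  # start + i * step
from_linspace = np.linspace(90, -90, num)  # endpoints are exact

# Differences are round-off sized, so compare with a tolerance.
print(np.max(np.abs(from_arange - from_linspace)))
print(np.max(np.abs(from_multiply - from_linspace)))
```

The practical takeaway is to compare against cfgrib's output with a tolerance (for example np.allclose or numpy.testing.assert_allclose) rather than with exact equality, or to round to the precision the GRIB metadata actually encodes.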