Open maresb opened 2 months ago
@Anu-Ra-g , you already started to look into this, some details are in the linked thread.
Here is the test diff,
That's consistent with what I was seeing.
`extract_datatree_chunk_index` filters out the Series data that is empty. After passing the same GRIB file through the function using eccodes v2.36.0 and v2.38.0, I observed that the former produced 87 rows and the latter 85 rows.
Below I've compared the two dataframes from the two eccodes versions. The first column was produced using 2.36.0 and the second using 2.38.0. I guess eccodes v2.38.0 is not able to read a couple of variables in the GRIB file.
Note: I've padded the shorter dataframe at index 85 and 86 (the `ignore_values` rows) so that the two dataframes can be compared.
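| index | varname (self) | varname (other) |
|---|---|---|
| 12 | cprat | cpr |
| 13 | cprat | cpr |
| 68 | tmax | tp |
| 69 | tmin | u |
| 70 | tp | u |
| 72 | u | u10 |
| 73 | u | v |
| 74 | u10 | v |
| 77 | v | v10 |
| 78 | v | w |
| 79 | v10 | w |
| 81 | w | watr |
| 82 | w | wz |
| 83 | watr | wz |
| 85 | wz | ignore_values |
| 86 | wz | ignore_values |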
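A comparison of this shape can be produced with pandas' `DataFrame.compare`, which requires both frames to carry identical labels, hence the padding. The following is only a minimal sketch with made-up data: the names `old`, `new` and `padded` and the toy values are assumptions, not the real 87- and 85-row outputs.

```python
import pandas as pd

# Toy stand-ins for the "varname" column under each eccodes version
# (illustrative values only, not the real index output).
old = pd.DataFrame({"varname": ["cprat", "tmax", "tmin", "tp", "u"]})
new = pd.DataFrame({"varname": ["cprat", "tp", "u"]})

# DataFrame.compare needs identically labelled frames, so pad the shorter
# one up to the longer index with a placeholder value.
padded = new.reindex(index=old.index, fill_value="ignore_values")

# Rows where both frames agree are dropped; each remaining row shows the
# value from `old` under "self" and the value from `padded` under "other".
diff = old.compare(padded)
print(diff)
```

Because the comparison is positional, a single dropped row shifts every later row, which is why so many rows differ even though only a few variables are missing.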
I don't know what eccodes might be doing, but for the sake of the test, I would compare absolute offsets for the messages we know should be there, rather than by index value.
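One way to read this suggestion is to key the comparison on each message's byte offset instead of its row position, so that a dropped row shows up as a missing key rather than shifting every later row. A minimal sketch, assuming each index row can be reduced to an `(offset, varname)` pair; the function name `missing_messages` and the sample offsets are hypothetical.

```python
def missing_messages(expected, actual):
    """Return expected (offset, varname) pairs that are absent from actual.

    Both arguments are iterables of (offset, varname) tuples. Matching is
    by value, so row order and row position do not matter.
    """
    actual_set = set(actual)
    return sorted(pair for pair in set(expected) if pair not in actual_set)


# Hypothetical offsets; real values would come from the index dataframe.
expected = [(0, "cprat"), (1200, "tmax"), (2400, "tmin"), (3600, "tp")]
actual = [(0, "cprat"), (3600, "tp")]

print(missing_messages(expected, actual))
```

A test written this way would assert that the list of missing messages is empty for the messages known to be in the file, and would not break if extra rows appear or disappear elsewhere.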
One alternative would be to add an empty row to the output when reading failed.
(or we could pin the version of eccodes, of course, but that's unfortunate)
I'm not so familiar with the intricacies of this GRIB, and I don't know if eccodes is using semver, but it seems to me like there's a breaking change in v2.38.0, perhaps a bug, especially since it's failing to read some of the variables.
Unless we're doing something sketchy in this test so that the variables aren't well-defined, perhaps we should temporarily pin `eccodes<2.38` and report this as an issue to them?
I agree, if we can show a reproducer locally with different outputs for the new version of eccodes compared to the previous one, that would be a good issue to post.
Reproduce using the `extract_datatree_chunk_index` function?
No, with eccodes/cfgrib directly, not touching any of our code at all.
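A reproducer along these lines might look roughly like the sketch below, which uses only the eccodes Python bindings. It is an illustration under assumptions, not a tested reproducer: the helper names `list_messages` and `diff_inventories` are hypothetical, and reading the message position through the `offset` key should be checked against the installed eccodes.

```python
def list_messages(path):
    """Return (eccodes version, [(offset, shortName), ...]) for a GRIB file."""
    import eccodes  # imported lazily so diff_inventories works without it

    found = []
    with open(path, "rb") as f:
        while True:
            gid = eccodes.codes_grib_new_from_file(f)
            if gid is None:  # no more messages in the file
                break
            try:
                # Assumption: "offset" is the byte position of the message.
                offset = eccodes.codes_get(gid, "offset")
                name = eccodes.codes_get(gid, "shortName")
                found.append((offset, name))
            finally:
                eccodes.codes_release(gid)
    return eccodes.codes_get_api_version(), found


def diff_inventories(old, new):
    """Return (only_in_old, only_in_new) for two (offset, shortName) lists."""
    old_set, new_set = set(old), set(new)
    return sorted(old_set - new_set), sorted(new_set - old_set)
```

The idea would be to run `list_messages` on the same file in two environments, one with each eccodes version, save the results, and pass the two lists to `diff_inventories` to get a compact summary for the upstream report.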
I looked into pinning eccodes, and the situation is pretty messy. It's not a direct dependency of kerchunk, but rather transitive through `cfgrib`, and that itself is only an optional dependency in the `kerchunk[grib]` group.
There's an easy conda solution to this problem: we just need to add a `run_constrained` with `eccodes <2.38`, which does exactly what we want: "if `eccodes` is installed then constrain it to be `<2.38`." But as far as I know there's no such mechanism for pip. So the best we could do is add `eccodes <2.38` under `kerchunk[grib]`.
We could also add a warning when the user runs a grib function with v2.38.
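Such a warning could be sketched as follows. This is only an illustration: the function name `warn_if_eccodes_affected` is hypothetical, the version string is passed in rather than looked up (it could presumably come from `eccodes.codes_get_api_version()`), and treating every release from 2.38 onward as affected is an assumption, since only v2.38.0 has been observed to fail here.

```python
import warnings


def warn_if_eccodes_affected(version):
    """Warn when the given eccodes version string is 2.38 or newer.

    Returns True if a warning was issued, False otherwise.
    """
    # Compare major.minor as integers so that e.g. "2.100" sorts after "2.38".
    major, minor = (int(part) for part in version.split(".")[:2])
    if (major, minor) >= (2, 38):  # assumed cut-off, see the note above
        warnings.warn(
            f"eccodes {version} may fail to read some GRIB variables; "
            "consider installing eccodes<2.38",
            UserWarning,
            stacklevel=2,
        )
        return True
    return False
```

Calling this at the top of each grib entry point would keep the check out of import time, so users who never touch GRIB files would not see it.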
> But as far as I know there's no such mechanism for pip.
Correct, pip runs through the list of things to install in order. So if you include "eccodes<2.38" in your command after kerchunk (or as a separate command, or later in a pipenv file) all will be well.
> We could also add a warning when the user runs a grib function with v2.38.
Let's make that issue and see if there's any immediate follow-up from them. Honestly, eccodes changes slightly every release, and things break all the time. I can only assume that typical use via xarray/cfgrib doesn't touch those sharp edges.
As first noticed in https://github.com/fsspec/kerchunk/pull/506, a test is failing when eccodes v2.38.0 is installed.
Observed in CI:
Troubleshooting by @maresb:
From @Anu-Ra-g with eccodes v2.36.0: