fsspec / kerchunk

Cloud-friendly access to archival data
https://fsspec.github.io/kerchunk/
MIT License
312 stars 80 forks source link

test_extract_datatree_chunk_index fails with eccodes v2.38.0 #508

Open maresb opened 2 months ago

maresb commented 2 months ago

As first noticed in https://github.com/fsspec/kerchunk/pull/506, a test is failing when eccodes v2.38.0 is installed.

Observed in CI:

>       assert (
            idx_df["typeOfLevel"][[42, 46, 49, 50]]
            == ["heightAboveGround", "isobaricInhPa", "surface", "heightAboveGround"]
        ).all()
E       AssertionError: assert False
E        +  where False = <bound method NDFrame._add_numeric_operations.<locals>.all of 42     True\n46     True\n49     True\n50    False\nName: typeOfLevel, dtype: bool>()
E        +    where <bound method NDFrame._add_numeric_operations.<locals>.all of 42     True\n46     True\n49     True\n50    False\nName: typeOfLevel, dtype: bool> = 42    heightA... dtype: object == ['heightAbove...tAboveGround']
E             Full diff:
E             - [
E             -  'heightAboveGround',
E             ? ^^                 --
E             + 42    heightAboveGround
E             ? ^^^^^^
E             -  'isobaricInhPa',...
E
E             ...Full output truncated (8 lines hidden), use '-vv' to show.all

Troubleshooting by @maresb:

idx_df.iloc[83]
varname                                                             wz
typeOfLevel                                              isobaricInhPa
stepType                                                       instant
name                                       Geometric vertical velocity
isobaricInhPa                                                    975.0
step                                                   0 days 06:00:00
time                                               2023-09-28 00:00:00
valid_time                                         2023-09-28 06:00:00
uri                            ~/repos/kerchunk/kerchunk/tests/gfs....
offset                                                        21234675
length                                                         1035696
inline_value                                                      None
surface                                                            NaN
heightAboveGround                                                  NaN
meanSea                                                            NaN
Name: 83, dtype: object

From @Anu-Ra-g with eccodes v2.36.0:

varname                                                           watr
typeOfLevel                                                    surface
stepType                                                         accum
name                                                      Water runoff
isobaricInhPa                                                      NaN
step                                                   0 days 06:00:00
time                                               2023-09-28 00:00:00
valid_time                                         2023-09-28 06:00:00
uri                  ./kerchunk/tests/gfs.t00z.pgrb2.0p25.f006.test...
offset                                                        59215098
length                                                          348767
inline_value                                                      None
surface                                                            0.0
heightAboveGround                                                  NaN
meanSea                                                            NaN
Name: 83, dtype: object

Even though the error is occuring, the remaining steps work fine. image

martindurant commented 2 months ago

@Anu-Ra-g , you already started to look into this, some details are in the linked thread.

Anu-Ra-g commented 2 months ago

Here is the test diff, image

maresb commented 2 months ago

That's consistent with what I was seeing.

Anu-Ra-g commented 2 months ago

extract_datatree_chunk_index filters out the Series data which is empty. So what I observed after passing the same grib file through the function using eccodes v2.36.0 and v2.38.0 is that the former had 87 rows and the latter had 85 rows. Below I've compared the two dataframes from the eccodes versions. The first columns is produced using 2.36.0 and second using 2.38.0. I guess, eccodes v2.38.0 is not able to read couple of variables in the grib file.

Note: I've used index 85 and 86 to compare the dataframes


varname
self    other
12  cprat   cpr
13  cprat   cpr
68  tmax    tp
69  tmin    u
70  tp  u
72  u   u10
73  u   v
74  u10 v
77  v   v10
78  v   w
79  v10 w
81  w   watr
82  w   wz
83  watr    wz
85  wz  ignore_values
86  wz  ignore_values
martindurant commented 2 months ago

I don't know what eccodes might be doing, but for the sake of the test, I would compare absolute offsets for the messages we know should be there, rather by index value.

One alternative would be to add an empty row to the output when reading failed.

martindurant commented 2 months ago

(or we could pin the version of eccodes, of course, but that's unfortunate)

maresb commented 2 months ago

I'm not so familiar with the intricacies of this GRIB, and I don't know if eccodes is using semver, but it seems to me like there's a breaking change in v2.38.0, perhaps a bug, especially since it's failing to read some of the variables.

Unless we're doing something sketchy in this test so that the variables aren't well-defined, perhaps we should temporarily pin eccodes<2.38 and report this as an issue to them?

martindurant commented 2 months ago

I agree, if we can show a reproducer locally with different outputs for the new version of eccodes compared to the previous one, that would be a good issue to post.

Anu-Ra-g commented 2 months ago

Reproduce using the extract_datatree_chunk_index function?

martindurant commented 2 months ago

No, with eccodes/cfgrib directly, not touching any of our code at all.

maresb commented 2 months ago

I looked into pinning eccodes, and the situation is pretty messy. It's not a direct dependency of kerchunk, but rather transitive through cfgrib, and that itself is only an optional dependency in the kerchunk[grib] group.

There's an easy conda solution to this problem: we just need to add a run_constrained with eccodes <2.38 which does exactly what we want: "if eccodes is installed then constrain it to be <2.38." But as far as I know there's no such mechanism for pip. So the best we could do is add eccodes <2.38 under kerchunk[grib].

We could also add a warning when the user runs a grib function with v2.38.

martindurant commented 2 months ago

But as far as I know there's no such mechanism for pip.

Correct, pip runs through the list of things to install in order. So if you include "eccodes<2.38" in your command after kerchunk (or as a separate command, or later in a pipenv file) all will be well.

We could also add a warning when the user runs a grib function with v2.38.

Let's make that issue and see if there's any immediate follow-up from them. Honestly, eccodes, changes slightly every release, and things break all the time. I can only assume that typical use via xarray/cfgrib doesn't touch those sharp edges.