fsspec / kerchunk

Cloud-friendly access to archival data
https://fsspec.github.io/kerchunk/
MIT License
305 stars 78 forks source link

Invalid try/except block in `scan_grib`? #446

Closed kmpaul closed 6 months ago

kmpaul commented 6 months ago

In the kerchunk.grib2.scan_grib() function, there is a loop when processing the coordinate variables for a message, and there is a try/except block in this loop:

https://github.com/fsspec/kerchunk/blob/a0c4f3b828d37f6d07995925b324595af68c4a19/kerchunk/grib2.py#L258-L267

that appears to be intended to skip coordinate variables that produce errors when retrieving the coordinate information with cfgrib (the m object is a cfgrib.cfmessage.CfMessage object). However, exceptions produced in this try block never get caught because cfgrib only ever raises ValueError or AssertionError exceptions (at least with the recent versions of cfgrib).

Should this try block be modified to catch ValueError exceptions instead?

martindurant commented 6 months ago

@emfdavid ?

emfdavid commented 6 months ago

We added this in https://github.com/fsspec/kerchunk/pull/364 I see these errors when reading HRRR SUB Hourly grib2 files. Are you seeing other CFGRIB errors that are somehow recoverable/fixable in Kerchunk? This is somewhat unqiue in that we can repair it with care by taking the difference between the runtime and the valid time to get the step.

kmpaul commented 6 months ago

(Sorry for the delay. I am currently in GTC+2.)

@emfdavid: Thanks for the reply. Let me understand you completely. When you say, "I see these errors when reading HRRR SUB Hourly grib2 files," are you saying you see ValueErrors or eccodes.WrongStepUnitErrors?

I am also running with subhourly GRIB2 files, and I see ValueErrors arising from cfgrib due to the fact that the step attribute of the CfMessage has a str data type, which you expect with subhourly data. In the m.get("step") call, this produces a ValueError when cfgrib tries to compute int(message[step_key]) in cfgrib.cfmessage.from_grib_step(). Full traceback below:

Traceback ``` In [15]: m.get("step") --------------------------------------------------------------------------- ValueError Traceback (most recent call last) Cell In[15], line 1 ----> 1 m.get("step") File ~/Software/miniforge3/envs/myenv/lib/python3.10/_collections_abc.py:824, in Mapping.get(self, key, default) 822 'D.get(k[,d]) -> D[k] if k in D, else d. d defaults to None.' 823 try: --> 824 return self[key] 825 except KeyError: 826 return default File ~/Software/miniforge3/envs/myenv/lib/python3.10/site-packages/cfgrib/messages.py:213, in ComputedKeysMessage.__getitem__(self, item) 211 if item in self.computed_keys: 212 getter, _ = self.computed_keys[item] --> 213 return getter(self) 214 else: 215 return super(ComputedKeysMessage, self).__getitem__(item) File ~/Software/miniforge3/envs/myenv/lib/python3.10/site-packages/cfgrib/cfmessage.py:97, in from_grib_step(message, step_key, step_unit_key) 95 raise ValueError("unsupported stepUnit %r" % step_unit) 96 assert isinstance(to_seconds, int) # mypy misses this ---> 97 return int(message[step_key]) * to_seconds / 3600.0 ```

(Note, this is with Python 3.10, cfgrib 0.9.11.0, and eccodes 1.7.0.)

I never see eccodes.WrongStepUnitErrors.

(Incidentally, I know that there is a PR currently in to cfgrib to fix this: https://github.com/ecmwf/cfgrib/pull/371, but I don't know when it will be merged and released. Note that this is not a bug in eccodes, but a feature of the newest version of eccodes that is not fully accepted in cfgrib yet.)

kmpaul commented 6 months ago

Are you seeing other CFGRIB errors that are somehow recoverable/fixable in Kerchunk?

I only ever see the ValueErrors, but they can be avoided in 2 ways:

  1. hack cfgrib.cfmessage.from_grib_step() to change the default values of the step_key parameter from "endStep" to "endStep:int" (and for completeness, do the same for the step_unit_key parameter and make the same changes to cfgrib.cfmessage.to_grib_step()).

  2. modify scan_grib to deal with the special edge case of coord2 == "step" and change it to coord2 = "step:int".

The first option pushes the changes to cfgrib (which is coming in ecmwf/cfgrib#371, but who knows when), and the second option hacks kerchuck to deal with the issue temporarily while the upstream issues are resolved.

If you like, I can put together a PR that essentially deals with this edge case, something like:

 coord2 = {"latitude": "latitudes", "longitude": "longitudes", "step": "step:int"}.get(coord, coord)
 try: 
     x = m.get(coord2) 
 except eccodes.WrongStepUnitError as e: 
     logger.warning( 
         "Ignoring coordinate '%s' for varname '%s', raises: eccodes.WrongStepUnitError(%s)", 
         coord2, 
         varName, 
         e, 
     ) 
     continue 
emfdavid commented 6 months ago

That is cool - I poked around for a while and gave up. This looks much better.

If I understand correctly, we can put your patch in place in kerchunk now, and it should be forward compatible with cfgrib after the problem is corrected there. Then we can leave your patch in kerchunk in place until most users are up to date with cfgrib and eccodes updates.

martindurant commented 6 months ago

Sounds like a plan!

kmpaul commented 6 months ago

I just create PR #450.

kmpaul commented 6 months ago

See comments in #450 for updates.

TL;DR: This bug only comes up because of changes in the most recent versions of eccodes (2.34+).