ecmwf / cfgrib

A Python interface to map GRIB files to the NetCDF Common Data Model following the CF Convention using ecCodes
Apache License 2.0
407 stars 77 forks source link

Add new open APIs for generic GRIB Fieldsets #265

Closed alexamici closed 2 years ago

alexamici commented 3 years ago

Add entry points the generic implementations in #243 that do not assume the storage is a GRIB file.

Closes #243

codecov-commenter commented 3 years ago

Codecov Report

Merging #265 (2ccd958) into master (18f930e) will decrease coverage by 0.68%. The diff coverage is 92.60%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #265      +/-   ##
==========================================
- Coverage   96.43%   95.75%   -0.69%     
==========================================
  Files          26       26              
  Lines        1825     1932     +107     
  Branches      211      221      +10     
==========================================
+ Hits         1760     1850      +90     
- Misses         39       54      +15     
- Partials       26       28       +2     
Impacted Files Coverage Δ
cfgrib/cfmessage.py 95.45% <ø> (ø)
cfgrib/abc.py 81.48% <80.76%> (-18.52%) :arrow_down:
cfgrib/messages.py 89.85% <89.42%> (-3.13%) :arrow_down:
cfgrib/dataset.py 99.01% <96.96%> (+0.05%) :arrow_up:
cfgrib/__init__.py 100.00% <100.00%> (ø)
cfgrib/xarray_plugin.py 87.87% <100.00%> (+0.57%) :arrow_up:
cfgrib/xarray_store.py 93.84% <100.00%> (+0.29%) :arrow_up:
tests/test_20_messages.py 100.00% <100.00%> (+0.72%) :arrow_up:
tests/test_30_dataset.py 100.00% <100.00%> (ø)
tests/test_50_xarray_plugin.py 100.00% <100.00%> (ø)
... and 2 more

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 18f930e...2ccd958. Read the comment docs.

alexamici commented 3 years ago

@iainrussell the code for the PR should be ready, I need to add the documentation to the new entry points open_container and open_from_index and then you should be able to test it.

I'm not sure wether to add a section to the README since this is a developer API, not a user one.

alexamici commented 3 years ago

@iainrussell and @b8raoult after a huge cleanup and rename of the interface to Field and Fieldset, I added a short documentation in ADVANCED_USAGE.rst.

I think the PR is complete and ready to be tested.

alexamici commented 3 years ago

Final bit: we tested the performance of the GRIB to netCDF translation with the new code with no regression.

iainrussell commented 3 years ago

This looks great, @alexamici - I'm on leave this week and next, so will probably have a proper look at it in the week of 15th November. Cheers, Iain

iainrussell commented 3 years ago

Hi @alexamici,

I started testing this on Friday, and managed to create a new proxy class in Metview that could be passed to this in place of a Fieldset (with more time, I'm sure we can pass an actual Fieldset, I just didn't want to risk changing behaviour at the moment). And it worked, I got a nice dataset from it, which had (most of) the right dimensions and the plot looked reasonable :) So well done!

I noticed one thing that I think is not working properly: date&time. The time is always set to 1970. I noticed that the cfgrib function from_grib_date_time() is not being called at all. Perhaps this is a point of discussion - the current implementation on the Metview end will return all the normal GRIB keys, e.g. dataDate, dataTime, etc, so the actual time can be computed in the same way as it would be with a normal GRIB file. The alternative is that cfgrib asks for 'time' (as it seems to be), and we return a datetime object - I tried that, but it failed - if this is the preferred way to go, do you have an example of how to specify the date/time in the dict example?

I also had an issue, but it might have been a problem in my environment - most of the cfgrib tests did not initially work if a GRIB file had an index file, but the tests worked once I deleted the index files. It actually seems to be working now, so it may have been VSCode being funny. I got this message: "AttributeError: 'FileIndex' object has no attribute 'field_ids_index'". If that seems impossible, then ignore this, because I do not get that error today! But otherwise it might be good to check.

But this is a great achievement, a major refactoring of the code, allowing developers to 'impersonate' GRIB files with other classes!

Cheers, Iain

alexamici commented 2 years ago

@b8raoult & @iainrussell

I added around an internal use of typing.Protocol around https://github.com/ecmwf/cfgrib/pull/265/commits/62e3a786cb3e9dc420c9a8d7e192a7d67e8bf949 but that is not supported in Python 3.7 and there's no backward compatibility module.

Since this is an internal definition and only useful to clean up the type hints inside the cose, not at API level, I'll try to find another way to get types right, which will probably look a bit uglier.

Feature-wise the current implementation supports mapping and sequence fieldsets with lazy loading.

alexamici commented 2 years ago

@iainrussell & @b8raoult the PR is now ready for review and testing (and merge).

The new API is supports the sequence-like Fieldset and the MappingFieldset, both supports lazy-loading and I added basic documentation of the new API.

iainrussell commented 2 years ago

@alexamici that's great! I think then the only outstanding thing is the generation of computed keys, related to time in particular.

alexamici commented 2 years ago

@iainrussell I added the compute_index_keys to the public API and moved the computed keys machinery to use an adapter instead of a class in all read-only code paths.

That should close the remaining requests AND make the PR a real monster.

On the plus side all tests still work after I sync'ed to the new interfaces and I couldn't measure any performance regression.

It is OK to merge for me.