blaylockbk / Herbie

Download numerical weather prediction datasets (HRRR, RAP, GFS, IFS, etc.) from NOMADS, NODD partners (Amazon, Google, Microsoft), ECMWF open data, and the University of Utah Pando Archive System.
https://herbie.readthedocs.io/
MIT License
424 stars 70 forks source link

Optional dependencies #319

Closed rafa-guedes closed 1 month ago

rafa-guedes commented 1 month ago

@blaylockbk this is an attempt to make those libraries used by the accessors extra dependencies. I created a little function to conditionally import modules or functions and throw a message indicating [extras] need to be installed if the import fails, to avoid repeating this over and over. The downside of this approach is that linting fails to recognise libraries imported that way so happy to change this is you don't think it is a good idea.

A few points to notice:

rafa-guedes commented 1 month ago

Closes @https://github.com/blaylockbk/Herbie/issues/318

rafa-guedes commented 1 month ago

Also worth mentioning there are tests failing but they are also failing for me when installing the version in main.

blaylockbk commented 1 month ago

Great work! It will be nice to handle optional features in a better way. This is a big-ish change to the codebase, so I want to get this right.

Small note added to the installation section in the docs. I also removed the note mentioning cfgrib needing to be manually installed since it is prescribed in pyproject.toml as a required dependency.

Let's keep this notice. Cfgrib requires eccodes, which is not installed when you do a pip install cfgrib, and I want to make sure users understand that.


The downside of this approach is that linting fails to recognise libraries imported that way so happy to change this is you don't think it is a good idea.

I see what you mean. Yeah, I would really like to figure out how to keep the linting to recognize the imports, mainly for myself because I pay attention to the linting errors.

I've been looking at how other projects deal with optional dependencies. I don't know if these fix the linting issue

rafa-guedes commented 1 month ago

Yeah, I would really like to figure out how to keep the linting to recognize the imports

Yep fair enough, it is not ideal indeed to have those misleading errors highlighted in the code. A few comments on the solutions you pointed above:

Polars devised a way to "lazy load" optional packages. This seems more complex than I want to do.

Agree this looks too complex for the problem we are trying to address here.

MetPy burries importing cartopy in a function with try/except

I believe this should work given the import is done explicitly inside that import_cartopy function. It may be a good approach if cartopy is imported in many different places but here we have a handful of libraries that are imported in only one or two places so we would have to define a few of these functions - it may be simpler to just try/except each import individually.

I have removed the dynamic importing function in https://github.com/blaylockbk/Herbie/pull/319/commits/9c43e09129d0d61d3af4be39e270461fa7419f23, the optional deps are being imported now inside individual try-except blocks - let me know if you think this would be a better approach.

blaylockbk commented 1 month ago

Yeah. I'm liking the simple try/except much better.

One comment, when I only install the required dependencies and then do from herbie import Herbie, then I get all the warnings printed. This could be annoying for people who only want to use Herbie's basic downloading feature. I think what we want is to only show the warnings about extra imports when the user tries using an accessor that requires a dependency not installed. I'm thinking that means pushing those imports into the accessor functions themselves, then instead of warning we can raise an error saying, "the X function requires Y package; install Herbie's extra dependencies for full functionality". Do you think this is possible and makes sense?

rafa-guedes commented 1 month ago

I'm thinking that means pushing those imports into the accessor functions themselves,

I had initially done it this way, the imports were inside the __init__ method in the accessor class - but for some reason BallTree was not being picked up in the tests causing some error in the automated tests in github (it was working on my local install though). I'll try to define it that way again, perhaps move the BallTree import inside the actual method where it is used - and raise an ImportError instead of the warning as you suggested.

rafa-guedes commented 1 month ago

but for some reason BallTree was not being picked up in the tests

I think it was something to do with the fact BallTree was being used inside a nested function

I have made the changes in fecf307a623b2ccb6a1e5806ce86878989761e18. I replaced all warnings by ModuleNotFoundErrors, and put the imports inside the classes / functions. The cartopy / shapely / metpy are inside init.py (given they are used in more than one method and seem to be essential for the accessor to work) meaning the class will fail to instantiate if those libraries aren't available. The BallTree one is inside the plant_tree nested function so the accessor will only fail when executing the pick_points method.

rafa-guedes commented 1 month ago

The latest tests failed because Polygon could not be found in the accessor. I'm not sure why, I assumed it would be enough to have the imports defined in the init() method. Perhaps this is due to the cached_property behaviour, or due to something specific to the accessor logics. I have fixed this by importing each optional library inside the methods that require them - but I left the try-except imports in init.

blaylockbk commented 1 month ago

In 4117fae I moved the optional dependency imports into the accessor methods that use them. Do you see any potential issues moving them there?

I tested a few examples, and I think this is the behavior we want. I even doubled checked import time to make sure that using an accessor a second time doesn't actually have to re-import the module completely.

When the tests failed, I simply reran them and got the green checkmark (for reasons I don't understand, maybe some caching that needed to be reset??)

I'm ready to merge this if you agree. I'll make a new release sometime in June when I have time and we can get feedback from people if this implementation caused any problems.

rafa-guedes commented 1 month ago

Happy with those changes and with merging!