DocOtak / gsw-xarray

Wrapper for gsw that will add CF attributes to xarray.DataArray outputs
https://gsw-xarray.readthedocs.io/
BSD 3-Clause "New" or "Revised" License
23 stars 3 forks source link

Emulate/wrap upstream gsw.* submodules #6

Closed DocOtak closed 2 years ago

DocOtak commented 2 years ago

I've attempted "automagically" wrap all the upstream submodules in gsw.

Right now the one failing test was still failing on me with a wrong standard name attribute assertion:


    def test_func_standard_module():
        """gsw can be used with modules, e.g. gsw.density"""
        ds = _create_ds()
        sigma0 = gsw.density.sigma0(SA=ds.SA, CT=ds.CT)
>       assert sigma0.attrs['standard_name'] == 'sea_water_sigma_t'
E       AssertionError: assert 'sea_water_absolute_salinity' == 'sea_water_sigma_t'
E         - sea_water_sigma_t
E         + sea_water_absolute_salinity

tests/test_gsw_xarray.py:29: AssertionError

But when I attempt to call the function manually, it has the correct attribute. So some more digging is needed before merging to understand this.

DocOtak commented 2 years ago

So the above error (and apparent heisenbug like nature of it), was caused by the different ways of accessing the submodules.

If you did a direct import such as:

from gsw_xarray.density import sigma0

The expected wrapped function would be returned to you. However, if you did the following:

import gsw_xarray
gsw_xarray.density.sigma0()

The gsw_xarray.density part is actually be an attribute lookup, which "fails" then falls back to the upstream gsw implementation. As far as I can tell, "density" needs to both exist as density.py for the import machinery, and as a name the __init__.py is aware of (so it does the right thing).

I could not figure out a way that reliably tests this with all the import caching going on.

rcaneill commented 2 years ago

I never reviewed a PR, I guess I should clone your branch locally (I need to find how to do this), compute the tests, have a look at the code, and verify that everything works fine?

rcaneill commented 2 years ago

I added some extra tests to test all possibilities of import, unfortunately using:

from gsw.density import sigma0
sigma0(...)

or

from gsw import density
density.sigma0

both fall on the gsw backend implementation. Maybe it has something to do with the caching of imports, or something more complicated.

Loading every function from gsw.density does not import anything actually...

from gsw.density import * # Not good to do, but checks if functions are loaded
sigma1

For the import caching you mentioned, if we implement the submodule tests in different test files, should reset the cache no? Or were you speaking about the '.pyc' caches?

I don't see any solution to be able to import in whatever way exists yet...

DocOtak commented 2 years ago

Good review! :grin:

Some things going on:

At the top of the tests, we have a:

import gsw_xarray as gsw

This allows you to directly access the functions like:

gsw.sigma0 # is is calling the renamed gsw_xarray from above

When you do the following:

gsw.density.sigma0

Since that gsw is a renamed gsw_xarray, the .density becomes an attribute lookup, the __getattr__ is called to try to find the name. Initially we only had the actual functions as valid name, so the lookup failed, and then upstream gsw library is searched for the name (which succeeds).

However, when you do:

from gsw_xarray.density import sigma0

The module loader machinery is doing that lookup, so the __getattr__ method is ignored and whatever loader (this means something specific in the importlib) that was returned is doing its thing (I don't know details).

This also means the following:

import gsw_xarray as gsw
gsw.density.sigma0 # will be our wrapped version

from gsw.density import sigma0 # uses the import machinery, will always find the upstream gsw module

Fixing the from gsw_xarray.density import * can be done by adding the correct __all__ value to each file.

This PR needs some more work...

rcaneill commented 2 years ago

I re-wrote the module tests in a new file. I also split them into different functions to test 1 (or 2) feature at a time. I had some time to understand all the machinery you did Andrew, and now I understand how these __dir__ and __getattr__ functions work more or less.

If I summarize, I think that gsw_xarray should allow the following imports, with a preference for the 1st two methods:

  1. import gsw_xarray as gsw and then use directly the functions, e.g. gsw.sigma0
  2. from gsw_xarray import sigma0
  3. from gsw_xarray.density import sigma0
  4. from gsw_xarray import density, and use density.sigma0
  5. [can be discussed] from gsw_xarray import *
  6. [can be discussed] from gsw_xarray.density import *

The 4 first methods are now working and have their own tests in test_imports.py. The 5th adn 6th methods would need to use __all__ = __dir__() I guess(?), or something similar.

Now I also understand that the tests I wrote previously were falling on the gsw backend because I were importing the functions from gsw directly, as you answered me...

I added some extra tests to test all possibilities of import, unfortunately using:

from gsw.density import sigma0
sigma0(...)

or

from gsw import density
density.sigma0

both fall on the gsw backend implementation. Maybe it has something to do with the caching of imports, or something more complicated.

Tell me if I missed something.

DocOtak commented 2 years ago

Was away for a few days, will check in on this. I have some ideas about testing the API surface against the upstream gsw that I'd like to try out before merging. That said, this feels like it is holding up other tasks.

Thanks for getting milestones started!

DocOtak commented 2 years ago

One more look would be awesome.

Some explanation of changes to the tests:

On the todo, is to make the wildcard imports work: e.g. from gsw_xarray.density import *, but at this point I'd want to handle it in a new PR.