TEOS-10 / GSW-Python

Python implementation of TEOS-10 GSW based on ufunc wrappers of GSW-C
https://teos-10.github.io/GSW-Python
Other
147 stars 31 forks source link

Error when using arg (instead of kwargs) for optional args in geo_strf_dyn_height #97

Open rcaneill opened 2 years ago

rcaneill commented 2 years ago

Due to the transformations of arguments to arrays, if optional arguments are passed as arguments in geo_strf_dyn_height, an error is raised. I guess that a check should be implemented in https://github.com/TEOS-10/GSW-Python/blob/d75dfe5454fd3050bb36ab92c08ba77fcf087f8f/gsw/_utilities.py#L51-L57 before transforming into arrays.

exemple 1

gsw.geo_strf_dyn_height([34, 34.1, 34.2], [0, 1, 2], [0, 10, 20], 0, 0, 1.0, 'pchip')

raises

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-88-9be1c2b56a01> in <module>
----> 1 gsw.geo_strf_dyn_height([34, 34.1, 34.2], [0, 1, 2], [0, 10, 20], 0, 0, 1.0, 'pchip')

~/.cache/pypoetry/virtualenvs/gsw-xarray-NsrEXKiZ-py3.8/lib/python3.8/site-packages/gsw/_utilities.py in wrapper(*args, **kw)
     55                 newargs.append(arg)
     56             else:
---> 57                 newargs.append(np.asarray(arg, dtype=float))
     58 
     59         if p is not None:

ValueError: could not convert string to float: 'pchip'

exemple 2

gsw.geo_strf_dyn_height([34, 34.1, 34.2], [0, 1, 2], [0, 10, 20], 0, 0, )

raises

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-14-ced139a8dd2a> in <module>
----> 1 gsw.geo_strf_dyn_height([34, 34.1, 34.2], [0, 1, 2], [0, 10, 20], 0, 0, )

~/.cache/pypoetry/virtualenvs/gsw-xarray-NsrEXKiZ-py3.8/lib/python3.8/site-packages/gsw/_utilities.py in wrapper(*args, **kw)
     60             kw['p'] = newargs.pop()
     61 
---> 62         ret = f(*newargs, **kw)
     63 
     64         if isinstance(ret, tuple):

~/.cache/pypoetry/virtualenvs/gsw-xarray-NsrEXKiZ-py3.8/lib/python3.8/site-packages/gsw/geostrophy.py in geo_strf_dyn_height(SA, CT, p, p_ref, axis, max_dp, interp_method)
     67     with np.errstate(invalid='ignore'):
     68         # The need for this context seems to be a bug in np.ma.any.
---> 69         if np.ma.any(np.ma.diff(np.ma.masked_invalid(p), axis=axis) <= 0):
     70             raise ValueError('p must be increasing along the specified axis')
     71     p = np.broadcast_to(p, SA.shape)

~/.cache/pypoetry/virtualenvs/gsw-xarray-NsrEXKiZ-py3.8/lib/python3.8/site-packages/numpy/ma/core.py in __call__(self, *args, **params)
   8200             _extras[p] = params.pop(p)
   8201         # Get the result
-> 8202         result = self._func.__call__(*args, **params).view(MaskedArray)
   8203         if "fill_value" in common_params:
   8204             result.fill_value = _extras.get("fill_value", None)

~/.cache/pypoetry/virtualenvs/gsw-xarray-NsrEXKiZ-py3.8/lib/python3.8/site-packages/numpy/core/overrides.py in diff(*args, **kwargs)

~/.cache/pypoetry/virtualenvs/gsw-xarray-NsrEXKiZ-py3.8/lib/python3.8/site-packages/numpy/lib/function_base.py in diff(a, n, axis, prepend, append)
   1412     if nd == 0:
   1413         raise ValueError("diff requires input that is at least one dimensional")
-> 1414     axis = normalize_axis_index(axis, nd)
   1415 
   1416     combined = []

TypeError: only integer scalar arrays can be converted to a scalar index
ocefpaf commented 2 years ago

I have to confess that my preference would be to make them keyword-only arguments as in https://peps.python.org/pep-3102/ instead of changing the match_args_return. What do you think @efiring?

efiring commented 2 years ago

Agreed, I think that making optional arguments keyword-only makes good sense, in terms of general API design. I haven't looked at how this would be applied here, but I hope it would turn out to be painless.

DocOtak commented 2 years ago

How about the other way around, where we call the positional arguments using keywords?

How this came up is we are attempting to add some pint.Quantity validation to the inputs over in the gsw-xarray project. The GSW-Python library is very consistent with argument variable names and the expected input units (yay!). We found it super convenient to just convert everything to a kwargs dict and do the unit lookups (based on key) and other transformations (based on value) we need to get things compatible with GSW-Python.

As written, the match_args_return decorator ignores everything in kwargs that isn't p

rcaneill commented 2 years ago

I opened an issue in gsw-xarray, but it is relevant to duplicate it here (https://github.com/DocOtak/gsw-xarray/issues/49)

In gsw, iterables are transformed into numpy arrays when used in args, but not when used in kwargs https://github.com/TEOS-10/GSW-Python/blob/d75dfe5454fd3050bb36ab92c08ba77fcf087f8f/gsw/_utilities.py#L18-L57

Minimal example

import gsw

# works
gsw.geo_strf_dyn_height([34, 34.1, 34.2], [0, 1, 2], [0, 10, 20])
# Does not work
gsw.geo_strf_dyn_height(SA=[34, 34.1, 34.2], CT=[0, 1, 2], p=[0, 10, 20])

Error raised

~/.cache/pypoetry/virtualenvs/gsw-xarray-NsrEXKiZ-py3.8/lib/python3.8/site-packages/gsw/geostrophy.py in geo_strf_dyn_height(SA, CT, p, p_ref, axis, max_dp, interp_method)
     53         raise ValueError('interp_method must be one of %s'
     54                          % (interp_methods.keys(),))
---> 55     if SA.shape != CT.shape:
     56         raise ValueError('Shapes of SA and CT must match; found %s and %s'
     57                          % (SA.shape, CT.shape))

AttributeError: 'list' object has no attribute 'shape'
ocefpaf commented 2 years ago

How about the other way around, where we call the positional arguments using keywords?

I'm on the fence b/c S, T, p are pretty much "default" inputs and making they kw would force users some extra typing that seems counter intuitive.

rcaneill commented 2 years ago

How I understood Andrew's question was that the way it is done today does not allow the user to use keyword argument for T, S and p (i.e. no check is done to transform them into arrays)

DocOtak commented 2 years ago

Because the ufuncs are wrapped in python funcs, python allows you to call the functions with kwargs for the positional arguments (in whatever order you want), this behavior is as defined/outlines in PEP-3102. match_args_return does not account for this particular way of calling the functions making the kwargs effectively pass though when called this way. I think there are basically two options: restrict the argument to positional only using the / function signature syntax, or modify match_args_return to accommodate this usage.

What I think should happen is the following:

efiring commented 2 years ago

I see the attraction of your proposal, but I'm initially inclined to prefer the simpler alternative (on the GSW-Python side) of using / in the signature. For GSW-Python direct users, I don't see any major advantage in letting those positional arguments be keyword arguments. Rather than make match_args_return more complicated, why not have gsw-xarray assume the responsibility of supplying the positional arguments as positional arguments?

DocOtak commented 2 years ago

@efiring can you describe the intent behind match_args_return? That is, I'd like to understand what it is meant to be doing and not assume from the implementation.

efiring commented 2 years ago

It derives from my liking for masked arrays, and I think I have a couple of versions: the original takes masked or nan-style array inputs, normalizes to masked, passes them to the inner function, and then converts the returned array to match the inputs. So if an input was masked, then the return will be a masked array. The second, currently in GSW-Python, is the same except that it normalizes to nan-style as input to the inner function. In addition, it normalizes inputs to double precision, and returns a scalar if the inputs are all scalar. Hence the name: it tries to make the output match the inputs, while providing normalized inputs to whatever it is decorating.

DocOtak commented 2 years ago

Isn't almost all of the desired behavior done by the ufunc api already? Specifically:

The only thing I can see that is missing is setting a new mask when newly invalid data is the result. Passing in masked arrays with different mask shapes and values appears to result in a new mask that is the logical OR of the broadcasted masks.

efiring commented 2 years ago

I don't see how masked arrays can be handled properly, maintaining the greater speed of doing all the calculations with ordinary arrays, without the conversion I am doing, or additional logic to handle the where kwarg. Perhaps a decorator based on that logic could be written that would be superior to what we have. I doubt it would be better for both ndarrays and masked arrays. Also, match_args_return is used for functions other than ufuncs; of course separate decorators could be used for ufuncs and ordinary functions.

Do you want to propose an alternative to match_args_return and try it out?

DocOtak commented 2 years ago

Here is an initial crack at it, I'm not sure the best way I can share a jupyter notebook here so I've attached a zip with the ipynb and html output versions. Let me know if there is a better way.

For now, this is only for the gsw._warpped_ufunc.* functions, I haven't looked at the more "python" funcs.

I took a look at how masked_arrays implement their own ufuncs, the two argument ufuncs are wrapped by a _MaskedBinaryOperation class, with the interesting bits being in the call method. Within a numpy error context, it calls the ufunc on the entire underlying _data array of the masked array then does some boolean ORs on the mask to figure out the new mask. For large arrays or expensive functions, this can be slow. It seems that the where argument is very speedy so I just tried inverting the mask to be compatible with where.

ufunc_proposal.zip

efiring commented 2 years ago

That looks very promising. My machine doesn't have enough memory to do the big data tests in a reasonable amount of time, but otherwise the results are similar.