Open rcaneill opened 2 years ago
I have to confess that my preference would be to make them keyword-only arguments as in https://peps.python.org/pep-3102/ instead of changing the match_args_return
. What do you think @efiring?
Agreed, I think that making optional arguments keyword-only makes good sense, in terms of general API design. I haven't looked at how this would be applied here, but I hope it would turn out to be painless.
How about the other way around, where we call the positional arguments using keywords?
How this came up is we are attempting to add some pint.Quantity validation to the inputs over in the gsw-xarray project. The GSW-Python library is very consistent with argument variable names and the expected input units (yay!). We found it super convenient to just convert everything to a kwargs dict and do the unit lookups (based on key) and other transformations (based on value) we need to get things compatible with GSW-Python.
As written, the match_args_return decorator ignores everything in kwargs that isn't p
I opened an issue in gsw-xarray, but it is relevant to duplicate it here (https://github.com/DocOtak/gsw-xarray/issues/49)
In gsw, iterables are transformed into numpy arrays when used in args, but not when used in kwargs https://github.com/TEOS-10/GSW-Python/blob/d75dfe5454fd3050bb36ab92c08ba77fcf087f8f/gsw/_utilities.py#L18-L57
import gsw
# works
gsw.geo_strf_dyn_height([34, 34.1, 34.2], [0, 1, 2], [0, 10, 20])
# Does not work
gsw.geo_strf_dyn_height(SA=[34, 34.1, 34.2], CT=[0, 1, 2], p=[0, 10, 20])
Error raised
~/.cache/pypoetry/virtualenvs/gsw-xarray-NsrEXKiZ-py3.8/lib/python3.8/site-packages/gsw/geostrophy.py in geo_strf_dyn_height(SA, CT, p, p_ref, axis, max_dp, interp_method)
53 raise ValueError('interp_method must be one of %s'
54 % (interp_methods.keys(),))
---> 55 if SA.shape != CT.shape:
56 raise ValueError('Shapes of SA and CT must match; found %s and %s'
57 % (SA.shape, CT.shape))
AttributeError: 'list' object has no attribute 'shape'
How about the other way around, where we call the positional arguments using keywords?
I'm on the fence b/c S, T, p are pretty much "default" inputs and making they kw would force users some extra typing that seems counter intuitive.
How I understood Andrew's question was that the way it is done today does not allow the user to use keyword argument for T, S and p (i.e. no check is done to transform them into arrays)
Because the ufuncs are wrapped in python funcs, python allows you to call the functions with kwargs for the positional arguments (in whatever order you want), this behavior is as defined/outlines in PEP-3102. match_args_return
does not account for this particular way of calling the functions making the kwargs effectively pass though when called this way. I think there are basically two options: restrict the argument to positional only using the /
function signature syntax, or modify match_args_return to accommodate this usage.
What I think should happen is the following:
*
function signature syntax)I see the attraction of your proposal, but I'm initially inclined to prefer the simpler alternative (on the GSW-Python side) of using /
in the signature. For GSW-Python direct users, I don't see any major advantage in letting those positional arguments be keyword arguments. Rather than make match_args_return
more complicated, why not have gsw-xarray assume the responsibility of supplying the positional arguments as positional arguments?
@efiring can you describe the intent behind match_args_return
? That is, I'd like to understand what it is meant to be doing and not assume from the implementation.
It derives from my liking for masked arrays, and I think I have a couple of versions: the original takes masked or nan-style array inputs, normalizes to masked, passes them to the inner function, and then converts the returned array to match the inputs. So if an input was masked, then the return will be a masked array. The second, currently in GSW-Python, is the same except that it normalizes to nan-style as input to the inner function. In addition, it normalizes inputs to double precision, and returns a scalar if the inputs are all scalar. Hence the name: it tries to make the output match the inputs, while providing normalized inputs to whatever it is decorating.
Isn't almost all of the desired behavior done by the ufunc api already? Specifically:
__array_*
that ndarray subclasses implement (including masked arrays), even scalars work..astype
where
argument and will skip over those positions entirely when doing calculationsThe only thing I can see that is missing is setting a new mask when newly invalid data is the result. Passing in masked arrays with different mask shapes and values appears to result in a new mask that is the logical OR of the broadcasted masks.
I don't see how masked arrays can be handled properly, maintaining the greater speed of doing all the calculations with ordinary arrays, without the conversion I am doing, or additional logic to handle the where
kwarg. Perhaps a decorator based on that logic could be written that would be superior to what we have. I doubt it would be better for both ndarrays and masked arrays.
Also, match_args_return
is used for functions other than ufuncs; of course separate decorators could be used for ufuncs and ordinary functions.
Do you want to propose an alternative to match_args_return
and try it out?
Here is an initial crack at it, I'm not sure the best way I can share a jupyter notebook here so I've attached a zip with the ipynb and html output versions. Let me know if there is a better way.
For now, this is only for the gsw._warpped_ufunc.*
functions, I haven't looked at the more "python" funcs.
I took a look at how masked_arrays implement their own ufuncs, the two argument ufuncs are wrapped by a _MaskedBinaryOperation class, with the interesting bits being in the call method. Within a numpy error context, it calls the ufunc on the entire underlying _data array of the masked array then does some boolean ORs on the mask to figure out the new mask. For large arrays or expensive functions, this can be slow. It seems that the where argument is very speedy so I just tried inverting the mask to be compatible with where.
That looks very promising. My machine doesn't have enough memory to do the big data tests in a reasonable amount of time, but otherwise the results are similar.
Due to the transformations of arguments to arrays, if optional arguments are passed as arguments in geo_strf_dyn_height, an error is raised. I guess that a check should be implemented in https://github.com/TEOS-10/GSW-Python/blob/d75dfe5454fd3050bb36ab92c08ba77fcf087f8f/gsw/_utilities.py#L51-L57 before transforming into arrays.
exemple 1
raises
exemple 2
raises