Closed lucascolley closed 1 month ago
This is ready to merge from my side.
We should probably try to get feedback from other people who would be using this. My feeling is that people might prefer to call atleast_nd(x) rather than atleast_nd(x, xp).
I suppose one option would be to make xp optional, like
def atleast_nd(x, xp=None):
if xp is None:
from array_api_compat import array_namespace
xp = array_namespace(x)
That way you can get the best of both worlds.
We would probably want to move that logic into a common decorator. And also do some indirection for array_api_compat so that people can easily swap it out for a vendored version. (although OTOH the more indirection and decorators we use, the harder it is for people to just copy-paste a single function).
@betatim @ogrisel @thomasjpfan hello đź‘‹ this is the start of an attempt to share array-agnostic implementations of functions that build on top of the standard across consumer libraries. So instead of storing private implementations in each consumer library, they'll be stored here and one can do:
from ... import array_api_extra as xpx
...
xpx.atleast_nd(x, ndim=2, xp=xp)
I'd appreciate if any of you could weigh in from a scikit-learn perspective on the question of whether these functions should require xp
as an argument? (EDIT: where xp
is a standard-compatible namespace for x
)
We could make xp
optional and call array_namespace
on the input arrays when it isn't passed, which would make using these functions ever so slightly more flexible and keep the APIs "clean" in a sense. But from my perspective in SciPy, we will always be using these functions after xp = array_namespace(x)
, so xp
will always be available to us to pass. Furthermore, passing xp
around will avoid redundant array_namespace
calls.
If we force developers to pass xp
into these functions, we can remove any dependency on array-api-compat here which will make it easier to enable simultaneous vendoring of the two libraries. I think that sounds like a win with no serious downside - what do you think?
I should point out that not using array-api-compat will make writing this library much harder as you would have to avoid using APIs that differ between the array libraries (or else effectively re-implement array-api-compat in helpers). I personally don't think it's a worthwhile goal.
you would have to avoid using APIs that differ between the array libraries
Which APIs are you thinking of? I forgot that we're probably at least going to need array-api-compat's size
helper. Maybe we can get away with just copying some functions from array-api-compat?
To be clear, xp
should be a namespace that comes from array-api-compat.
Literally any API you might use. For example, in this function you're using expand_dims
, but expand_dims
does not exist in torch
, only in array_api_compat.torch
.
To be clear, xp should be a namespace that comes from array-api-compat.
I suppose that does sidestep the issue. I still think it would be annoying as a user to have to pass in xp to every function. I'm a little surprised you're doing that in scipy, but I can't imagine every library wants to do that.
I suppose that does sidestep the issue. I still think it would be annoying as a user to have to pass in xp to every function. I'm a little surprised you're doing that in scipy, but I can't imagine every library wants to do that.
To clarify, the user-base for this library is consumer-library authors who are already using array-api-compat namespaces everywhere. In other words, we are just targeting the spec, not existing libraries.
I'd appreciate if any of you could weigh in from a scikit-learn perspective on the question of whether these functions should require xp as an argument?
I prefer always passing it in as an argument.
hello @kelsolaar too! You had previously expressed interest in functions like atleast_nd
from scipy._lib._array_api
being available in a public namespace. Do you have an opinion on the question in https://github.com/data-apis/array-api-extra/pull/3#issuecomment-2364346148 above?
Here's a PR showing that vendoring this package and using xpx.atleast_nd
works in SciPy: https://github.com/scipy/scipy/pull/21600
pixi run docs
:
@vnmabus perhaps you have an opinion on the question in https://github.com/data-apis/array-api-extra/pull/3#issuecomment-2364346148 too?
@vnmabus perhaps you have an opinion on the question in #3 (comment) too?
I am not sure of why you request my particular feedback. Personally, I think that it is probably better to prevent writing code that do unnecessary calls to array_namespace
, but having to pass xp
to each function seems also very redundant, and it is not the same style used by the array API functions.
I wonder if it wouldn't be cleaner to have a function that receives the xp
namespace and then returns an "extended" namespace with the extra functions, and with xp
already "bound". That way the usage would be more similar to the array API pattern (i.e. obtain a namespace for particular arrays and then just use it).
I am not sure of why you request my particular feedback.
Mainly because I don't know many other array-api-compat users!
I wonder if it wouldn't be cleaner to have a function that receives the xp namespace and then returns an "extended" namespace with the extra functions, and with xp already "bound". That way the usage would be more similar to the array API pattern (i.e. obtain a namespace for particular arrays and then just use it).
Cool idea! I guess something like the following would do it?
extra_funcs = {'atleast_nd': atleast_nd}
def get_extra_namespace(xp: ModuleType) -> ModuleType:
class ExtraNamespace:
def __getattr__(self, name: str):
if name in extra_funcs:
func = functools.partial(extra_funcs[name], xp=xp)
return func
else:
return getattr(xp, name)
# or `return AttributeError` ?
return ExtraNamespace(xp)
I'm not sure whether we'd want to use that yet in SciPy, since in most functions we would only be calling an xpx
function once, so the extra line to acquire the extra namespace isn't really worth it. But I do think the following pattern looks quite clean:
xp = array_namespace(x)
xpx = get_extra_namespace(xp)
...
x = xpx.atleast_nd(x, ndim=5)
y = xp.sum(x)
It would probably be better to keep uses of the two patterns separate, to avoid xpx
sometimes needing an xp
kwarg and sometimes not.
Maybe it would be better for get_extra_namespace
to return a namespace with just the extra functions, rather than an extended namespace, though? I think it would be nice to keep calls to standard functions as calls to xp
rather than xpx
.
Thanks for the thoughts!
In scikit-learn we use the foo(..., xp=None)
pattern. That way it is optional, but you can pass it in.
I don't have a strong opinion either way. It feels weird to pass it in, but it saves some repetitive code, but then making it optional you need repetitive code again (or decorator). I am not massively convinced regarding the performance benefit of not repeatedly calling array_namespace
. So overall, it is a definitive maybe from me :D
I opened gh-6 to discuss potential alternatives to the xp=xp
interface to explore in the future. Right now, I'm happy to keep it as the pattern is (optionally) used across SciPy and scikit-learn already.
I'm going to merge this as the function is passing NumPy-quality tests, works in SciPy, and is well documented. Thanks all for the thoughts!
OK, that's fine. I think we can easily change things around in the future without too many backwards compatibility concerns.
Did you want to make a release on PyPI? If we set up something like https://github.com/data-apis/array-api-compat/blob/main/.github/workflows/publish-package.yml and set up things on PyPI.
Sure, if you could do whatever needs Owner permissions I can do the rest - I would like to learn the process.
If you want you can try to set it up yourself, and add me as an owner on PyPI. I'll give you maintain role on this repo, which should hopefully be enough to add the right info to the repo settings.
I would start with the publish file from array-api-compat, as well as the instructions at https://github.com/data-apis/array-api-compat/pull/51 (I don't know if there are better docs for this somewhere). The best thing to do is to get everything working on TestPyPI first, and then once you have that working, enable publishing to real PyPI. I use a workflow where I push a tag up to the repo, but if you'd prefer to do a release via the web interface I think you might need to change things a little bit. Just remember that once you push a release to PyPI, you cannot undo it. If there's a problem, the only way to fix it is to bump the version number.
I would also recommend adding https://github.com/data-apis/array-api-compat/blob/main/.github/dependabot.yml and https://github.com/data-apis/array-api-compat/blob/main/.github/workflows/dependabot-auto-merge.yml to this repo. That will keep the GitHub Actions versions up-to-date automatically. Actually the newest version of publish-package is supposed to have something that makes setting up PyPI easier https://github.com/pypa/gh-action-pypi-publish/releases
Thanks Aaron, sounds good to me.
I don't know if there are better docs for this somewhere
I suppose there is Henry’s https://learn.scientific-python.org/development/guides/gha-pure/#job-setup, but who knows which is “better” :)
The repo template (scientific-python/cookie) already included dependabot. I’ll add the auto-merge workflow.
@asmeurer I've requested access for the all contributors bot, would you be able to grant that? And could you also import the project on readthedocs?
@asmeurer I've requested access for the all contributors bot, would you be able to grant that?
I think this might be something we used to have but removed, so I'll need to check if that's the case and why it was removed if so.
And could you also import the project on readthedocs?
Let's use GitHub pages (or netlify if you prefer) so the docs appear under data-apis.org like the rest of the projects.
@asmeurer are you sure we want to allow not passing
xp
and callingarray_namespace
ourselves?If we didn't allow that, I think we could make this package have no dependency on array-api-compat, which would make it a lot easier to vendor. From SciPy's perspective, we will always have access to
xp
when using functions from this package, so it isn't necessary.