data-apis / array-api-extra

Extra array functions built on top of the array API standard.
http://data-apis.org/array-api-extra/
MIT License

ENH: add `atleast_nd` #3

Closed lucascolley closed 1 month ago

lucascolley commented 1 month ago

@asmeurer are you sure we want to allow not passing xp and calling array_namespace ourselves?

If we didn't allow that, I think we could make this package have no dependency on array-api-compat, which would make it a lot easier to vendor. From SciPy's perspective, we will always have access to xp when using functions from this package, so it isn't necessary.

lucascolley commented 1 month ago

This is ready to merge from my side.

asmeurer commented 1 month ago

We should probably try to get feedback from other people who would be using this. My feeling is that people might prefer to call atleast_nd(x) rather than atleast_nd(x, xp).

I suppose one option would be to make xp optional, like

def atleast_nd(x, xp=None):
    if xp is None:
        # fall back to inferring the namespace from the input array
        from array_api_compat import array_namespace
        xp = array_namespace(x)
    ...  # rest of the implementation uses xp as usual

That way you can get the best of both worlds.

We would probably want to move that logic into a common decorator. And also do some indirection for array_api_compat so that people can easily swap it out for a vendored version. (although OTOH the more indirection and decorators we use, the harder it is for people to just copy-paste a single function).
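For what it's worth, a minimal sketch of what that decorator might look like (the name default_xp is made up here, and it assumes array-api-compat is importable):

import functools

def default_xp(func):
    """Fill in xp from the input array when the caller doesn't pass it."""
    @functools.wraps(func)
    def wrapper(x, *args, xp=None, **kwargs):
        if xp is None:
            from array_api_compat import array_namespace
            xp = array_namespace(x)
        return func(x, *args, xp=xp, **kwargs)
    return wrapper

@default_xp
def atleast_nd(x, *, ndim, xp=None):
    ...  # implementation can assume xp is set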

lucascolley commented 1 month ago

@betatim @ogrisel @thomasjpfan hello đź‘‹ this is the start of an attempt to share array-agnostic implementations of functions that build on top of the standard across consumer libraries. So instead of storing private implementations in each consumer library, they'll be stored here and one can do:

from ... import array_api_extra as xpx
...
xpx.atleast_nd(x, ndim=2, xp=xp)

I'd appreciate it if any of you could weigh in from a scikit-learn perspective on the question of whether these functions should require xp as an argument. (EDIT: where xp is a standard-compatible namespace for x)

We could make xp optional and call array_namespace on the input arrays when it isn't passed, which would make using these functions ever so slightly more flexible and keep the APIs "clean" in a sense. But from my perspective in SciPy, we will always be using these functions after xp = array_namespace(x), so xp will always be available to us to pass. Furthermore, passing xp around will avoid redundant array_namespace calls.

If we force developers to pass xp into these functions, we can remove any dependency on array-api-compat here, which will make it easier to enable simultaneous vendoring of the two libraries. I think that sounds like a win with no serious downside - what do you think?
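To make the trade-off concrete, here is a rough sketch of the consumer-library flow I have in mind (the function name is made up for illustration):

from array_api_compat import array_namespace
import array_api_extra as xpx  # or a vendored copy

def some_consumer_function(x):
    xp = array_namespace(x)               # already done once at the top
    x = xpx.atleast_nd(x, ndim=2, xp=xp)  # reuse xp, no second array_namespace call
    return xp.sum(x)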

asmeurer commented 1 month ago

I should point out that not using array-api-compat will make writing this library much harder as you would have to avoid using APIs that differ between the array libraries (or else effectively re-implement array-api-compat in helpers). I personally don't think it's a worthwhile goal.

lucascolley commented 1 month ago

you would have to avoid using APIs that differ between the array libraries

Which APIs are you thinking of? I forgot that we're probably at least going to need array-api-compat's size helper. Maybe we can get away with just copying some functions from array-api-compat?

To be clear, xp should be a namespace that comes from array-api-compat.

asmeurer commented 1 month ago

Literally any API you might use. For example, in this function you're using expand_dims, but expand_dims does not exist in torch, only in array_api_compat.torch.
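As a rough illustration of the mismatch (torch has unsqueeze but, at the time of writing, no top-level expand_dims):

import torch
import array_api_compat.torch as xp

x = torch.ones((3,))
# the compat-wrapped namespace provides the standard name
y = xp.expand_dims(x, axis=0)  # shape (1, 3)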

To be clear, xp should be a namespace that comes from array-api-compat.

I suppose that does sidestep the issue. I still think it would be annoying as a user to have to pass in xp to every function. I'm a little surprised you're doing that in scipy, but I can't imagine every library wants to do that.

lucascolley commented 1 month ago

I suppose that does sidestep the issue. I still think it would be annoying as a user to have to pass in xp to every function. I'm a little surprised you're doing that in scipy, but I can't imagine every library wants to do that.

To clarify, the user-base for this library is consumer-library authors who are already using array-api-compat namespaces everywhere. In other words, we are just targeting the spec, not existing libraries.

thomasjpfan commented 1 month ago

I'd appreciate it if any of you could weigh in from a scikit-learn perspective on the question of whether these functions should require xp as an argument.

I prefer always passing it in as an argument.

lucascolley commented 1 month ago

hello @kelsolaar too! You had previously expressed interest in functions like atleast_nd from scipy._lib._array_api being available in a public namespace. Do you have an opinion on the question in https://github.com/data-apis/array-api-extra/pull/3#issuecomment-2364346148 above?

lucascolley commented 1 month ago

Here's a PR showing that vendoring this package and using xpx.atleast_nd works in SciPy: https://github.com/scipy/scipy/pull/21600

lucascolley commented 1 month ago

pixi run docs:

(screenshot of the built documentation)

lucascolley commented 1 month ago

@vnmabus perhaps you have an opinion on the question in https://github.com/data-apis/array-api-extra/pull/3#issuecomment-2364346148 too?

vnmabus commented 1 month ago

@vnmabus perhaps you have an opinion on the question in #3 (comment) too?

I am not sure of why you request my particular feedback. Personally, I think it is probably better to prevent writing code that makes unnecessary calls to array_namespace, but having to pass xp to each function also seems very redundant, and it is not the same style used by the array API functions.

I wonder if it wouldn't be cleaner to have a function that receives the xp namespace and then returns an "extended" namespace with the extra functions, and with xp already "bound". That way the usage would be more similar to the array API pattern (i.e. obtain a namespace for particular arrays and then just use it).

lucascolley commented 1 month ago

I am not sure of why you request my particular feedback.

Mainly because I don't know many other array-api-compat users!

I wonder if it wouldn't be cleaner to have a function that receives the xp namespace and then returns an "extended" namespace with the extra functions, and with xp already "bound". That way the usage would be more similar to the array API pattern (i.e. obtain a namespace for particular arrays and then just use it).

Cool idea! I guess something like the following would do it?

import functools
from types import ModuleType

extra_funcs = {'atleast_nd': atleast_nd}

def get_extra_namespace(xp: ModuleType):
    class ExtraNamespace:
        def __getattr__(self, name: str):
            if name in extra_funcs:
                # bind xp so callers don't have to pass it
                return functools.partial(extra_funcs[name], xp=xp)
            return getattr(xp, name)
            # or raise AttributeError here instead?

    return ExtraNamespace()

I'm not sure whether we'd want to use that yet in SciPy, since in most functions we would only be calling an xpx function once, so the extra line to acquire the extra namespace isn't really worth it. But I do think the following pattern looks quite clean:

xp = array_namespace(x)
xpx = get_extra_namespace(xp)
...
x = xpx.atleast_nd(x, ndim=5)
y = xp.sum(x)

It would probably be better to keep uses of the two patterns separate, to avoid xpx sometimes needing an xp kwarg and sometimes not.

Maybe it would be better for get_extra_namespace to return a namespace with just the extra functions, rather than an extended namespace, though? I think it would be nice to keep calls to standard functions as calls to xp rather than xpx.
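A rough sketch of that variant, exposing only the extra functions with xp pre-bound:

import functools
from types import ModuleType, SimpleNamespace

def get_extra_namespace(xp: ModuleType) -> SimpleNamespace:
    # expose only the extra functions, each with xp already bound
    return SimpleNamespace(
        atleast_nd=functools.partial(atleast_nd, xp=xp),
    )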

Thanks for the thoughts!

betatim commented 1 month ago

In scikit-learn we use the foo(..., xp=None) pattern. That way it is optional, but you can pass it in.

I don't have a strong opinion either way. It feels weird to pass it in, but it saves some repetitive code; then again, making it optional brings the repetitive code back (or requires a decorator). I am not massively convinced by the performance benefit of not repeatedly calling array_namespace. So overall, it is a definitive maybe from me :D

lucascolley commented 1 month ago

I opened gh-6 to discuss potential alternatives to the xp=xp interface to explore in the future. Right now, I'm happy to keep it, as the pattern is already (optionally) used across SciPy and scikit-learn.

I'm going to merge this as the function is passing NumPy-quality tests, works in SciPy, and is well documented. Thanks all for the thoughts!

asmeurer commented 1 month ago

OK, that's fine. I think we can easily change things around in the future without too many backwards compatibility concerns.

asmeurer commented 1 month ago

Did you want to make a release on PyPI? If so, we can set up something like https://github.com/data-apis/array-api-compat/blob/main/.github/workflows/publish-package.yml and set things up on PyPI.

lucascolley commented 1 month ago

Sure, if you could do whatever needs Owner permissions I can do the rest - I would like to learn the process.

asmeurer commented 1 month ago

If you want, you can try to set it up yourself and add me as an owner on PyPI. I'll give you the Maintain role on this repo, which should hopefully be enough to add the right info to the repo settings.

I would start with the publish file from array-api-compat, as well as the instructions at https://github.com/data-apis/array-api-compat/pull/51 (I don't know if there are better docs for this somewhere). The best thing to do is to get everything working on TestPyPI first, and then once you have that working, enable publishing to real PyPI. I use a workflow where I push a tag up to the repo, but if you'd prefer to do a release via the web interface I think you might need to change things a little bit. Just remember that once you push a release to PyPI, you cannot undo it. If there's a problem, the only way to fix it is to bump the version number.

I would also recommend adding https://github.com/data-apis/array-api-compat/blob/main/.github/dependabot.yml and https://github.com/data-apis/array-api-compat/blob/main/.github/workflows/dependabot-auto-merge.yml to this repo. That will keep the GitHub Actions versions up-to-date automatically. Actually, the newest version of publish-package is supposed to have something that makes setting up PyPI easier: https://github.com/pypa/gh-action-pypi-publish/releases

lucascolley commented 1 month ago

Thanks Aaron, sounds good to me.

I don't know if there are better docs for this somewhere

I suppose there is Henry’s https://learn.scientific-python.org/development/guides/gha-pure/#job-setup, but who knows which is “better” :)

The repo template (scientific-python/cookie) already included dependabot. I’ll add the auto-merge workflow.

lucascolley commented 1 month ago

@asmeurer I've requested access for the all contributors bot, would you be able to grant that? And could you also import the project on readthedocs?

asmeurer commented 1 month ago

@asmeurer I've requested access for the all contributors bot, would you be able to grant that?

I think this might be something we used to have but removed, so I'll need to check if that's the case and why it was removed if so.

And could you also import the project on readthedocs?

Let's use GitHub pages (or netlify if you prefer) so the docs appear under data-apis.org like the rest of the projects.