Add ruff to ci setup - Githubissues

adonath commented 9 months ago

As proposed in #73 I'm adding a CI check that runs ruff as a linter. It currently runs as an "allowed failure". I can adapt the configuration as needed and optionally add ruff as a pre-commit hook. But I think so far array-api-compat does not pre-commit hooks, so I opted for the CI.

rgommers commented 9 months ago

Thanks for working on this @adonath!

I did a quick browse of the results, and it looks like a few error codes probably should simply be ignored

E402 Module level import not at top of file: that can't really be avoided probably, when all dependencies / wrapped libraries are optional
F401numpy.absimported but unused that's par for the course in __init__.py files when you're curating a public API to expose.

With those two ignored, hopefully it'll be a useful setup to catch other issues that may otherwise slip in.

But I think so far army-api-compat does not pre-commit hooks, so I opted for the CI.

Yes, that sounds like the right choice to me. CI is nice, and a lot less invasive than pre-commit.

vnmabus commented 9 months ago

* `F401 `numpy.abs` imported but unused` that's par for the course in `__init__.py` files when you're curating a public API to expose.

I think the convention in that case is to do explicit re-exports:

from numpy import abs as abs

rgommers commented 9 months ago

Oh, good point - if that makes the checker happy, then sure seems good to fix up that way.

adonath commented 9 months ago

Thanks @rgommers and @vnmabus for the comments! @rgommers do you have any preference whether to add the fixes to this PR or an independent one?

rgommers commented 9 months ago

Probably this PR, since it's needed to get CI green here? It's still not that much code, so fine to review in one go.

asmeurer commented 9 months ago

I don't think I've ever seen the import abs as abs pattern before. If I saw that I would just think there was a typo or someone made a mistake. Let's not do that.

The correct way to be explicit about exports is to define __all__. I don't remember if there was a good reason I didn't define __all__ in __init__.py. I probably just didn't because it would be a pain and unnecessary given the code that is there. But it should be doable. We already do this in _aliases.py and linalg.py, and you'll note ruff doesn't give any such warnings for those files.

asmeurer commented 9 months ago

And yes, make whatever improvements you need to make in this same PR. The point of this is also to make sure ruff isn't warning about any false positives, and there's no way to tell that unless we see the changes.

asmeurer commented 9 months ago

The import * error codes also need to be ignored.

vnmabus commented 9 months ago

I don't think I've ever seen the import abs as abs pattern before. If I saw that I would just think there was a typo or someone made a mistake. Let's not do that.

The correct way to be explicit about exports is to define __all__. I don't remember if there was a good reason I didn't define __all__ in __init__.py. I probably just didn't because it would be a pain and unnecessary given the code that is there. But it should be doable. We already do this in _aliases.py and linalg.py, and you'll note ruff doesn't give any such warnings for those files.

Some references: https://mypy.readthedocs.io/en/stable/command_line.html#cmdoption-mypy-no-implicit-reexport https://github.com/astral-sh/ruff/issues/717

Also in the typing PEP, considering stub files in this case: https://peps.python.org/pep-0484/#stub-files

asmeurer commented 9 months ago

import x as x doesn't actually do anything. It's just a weird convention someone came up with to silence linters, and in my opinion a quite bad one as it looks distinctly like a mistake. On the other hand, __all__ actually does something, so we should be using it anyways. And like I said, we are already doing this in other files (notice how you don't get any of these errors in _aliases.py).

This is actually a good example of an issue I have with a lot of linters, especially "code quality" linters. In a sane world, a linter would flag import x as x as a probable typo. But instead, they are encouraging it.

lucascolley commented 9 months ago

In a sane world, a linter would flag import x as x as a probable typo

We can do that here if you want, ruff has it. https://docs.astral.sh/ruff/rules/useless-import-alias/#useless-import-alias-plc0414

vnmabus commented 9 months ago

import x as x doesn't actually do anything. It's just a weird convention someone came up with to silence linters, and in my opinion a quite bad one as it looks distinctly like a mistake. On the other hand, __all__ actually does something, so we should be using it anyways. And like I said, we are already doing this in other files (notice how you don't get any of these errors in _aliases.py).

The "problem" with __all__ is that it is a separate piece of information that we need to update. It adds repetition in a different part of the file and risks that they get desynchronized. Thus, some library authors prefer not to use it. The import foo as foo pattern does not have this problem and, although the first time you see it looks admittedly a bit weird, you quickly learn to recognize it.

That said, it __all__ or implicit re-exports (disabling the error code) work for you, then lets go with that.

asmeurer commented 9 months ago

Re-exports are the whole point of this package. Anyway, the duplication of __all__ isn't a big deal especially if we do have a linter to help us keep things consistent.

adonath commented 9 months ago

Thanks everyone for the comments. I ended up doing more work than initially anticipated, but it's all related. Let me quickly summarize the related changes:

For now I decided to use the __all__ pattern in the public name spaces, such as numpy/__init__.py , numpy/inalg.py etc.
I thus removed __all__ definitions in non-public namespaces such as _aliases.py and _linalg.py. They were effectively un-used and conceptional it is probably weak to define public members in a non-public module.
I decided to introduce a _get_all_public_members function to get all public members of an imported module in a uniform way. It is either taken from __all__ if present or created from dir(module) with excludes.
I decided to move the definitions of the linalg aliases to _aliases.py as well. This has the advantage of just defining the public name space in linalg.py. No need for any del statements or similar. And it works the same as in __init__.py, which also import from .aliases.py
The imports and definition of __all__ in the public modules now follows the same scheme everywhere:
- * import of the main namespace. Such as form torch import * (there we can't avoid the * import)
- Get all public member of the module as string using _get_all_public_members
- Explicit imports from ..common._helpers.py
- Explicit imports from ._aliases.py
- Explicit imports from .linalg.py
- The __all__ is build after the same import groups, such that maintenance is a bit easier.

I don't really have a preference of __all__ vs the rename during import. I think any reasonable editor (at least vscode does) highlights inconsistencies between __all__ and the imports. Names missing in __all__ will be shown as unused import, while names declared in __all__ which are not defined in the module are also highlighted.

asmeurer commented 9 months ago

I thus removed all definitions in non-public namespaces such as _aliases.py and _linalg.py. They were effectively un-used and conceptional it is probably weak to define public members in a non-public module.

For me it's a little easier to remember to update __all__ when it's in the same file. But I guess more ideally would be if we could somehow check this automatically with a linter. I don't know if that's possible.

Explicit imports from ..common._helpers.py

As I noted in a line comment, it would be great if a linter can help us keep them consistent.

I think any reasonable editor (at least vscode does) highlights inconsistencies between all and the imports.

These editors are using a linter under the hood. If ruff doesn't do this, we should use a linter that does.

lucascolley commented 9 months ago

Names missing in all will be shown as unused import, while names declared in all which are not defined in the module are also highlighted. ... If ruff doesn't do this, we should use a linter that does.

ruff does this, indeed the initial CI log caught some of these, as discussed at the start of this thread. (unless you were referring to something else?)

adonath commented 9 months ago

For me it's a little easier to remember to update all when it's in the same file.

Yes, but you have to change the import into the public namespace anyway. And if you forget to update __all__ in this file, ruff errors with an unused import. I hope this makes sense.

These editors are using a linter under the hood. If ruff doesn't do this, we should use a linter that does.

In vscode I use pylance, which shows if undefined names are listed in __all__. Ruff by default does not seem to do this, but I can check whether there is a config option.

adonath commented 9 months ago

Concerning the type annotations and checking I think there are still two open issues:

The Array and Device annotations are undefined in common/_helpers.py. Here it is not 100% clear what to do. Either defined a union of valid arrays, or re-define the type annotations in the get_xp() for the specific array library.
There is the np._CopyMode annotation in common/_aliases.py which requires numpy for type checking. But I guess this might be fine...

lucascolley commented 9 months ago

Ruff by default does not seem to do this, but I can check whether there is a config option.

https://docs.astral.sh/ruff/rules/undefined-export/

adonath commented 9 months ago

Thanks @lucascolley! I enabled the corresponding option in ruff. For now I only added it as a command line option in the CI, long term it might be better to have ruff config file.

lucascolley commented 9 months ago

Above it sounded like Aaron wanted PLC0414 too.

adonath commented 9 months ago

There is a test fail in the CI, but it seems unrelated to this PR https://github.com/data-apis/array-api-compat/actions/runs/7702062263/job/20989506932?pr=82#step:6:5798

asmeurer commented 9 months ago

So just to be clear, is ruff now checking for __all__ consistency across files?

I think probably my number 1 development mistake when working on this library is forgetting to add a new wrapper function to __all__ (it's especially confusing when the function is already exported from the base library). Maybe we should just write some code that automatically exports everything in the namespace that doesn't start with an underscore, but I do like the explicitness that avoids accidentally exporting something you don't want.

I don't know if there's some way to tell a linter, "warn me if I have a name in this file that isn't exported". Most linters do that for __init__.py but I don't know if there's a way to force it to do that for other files as well (and even better if it can read __all__ from __init__.py but check the names from some other file). Maybe we can just throw together our own custom linter for this. It probably isn't hard, especially if we just import the package and compare dir(submodule) and __all__. I'll admit the use-case of this library is a little niche. Most packages aren't in the business of re-exporting a namespace.

asmeurer commented 9 months ago

You can ignore the test failures. Looks like the numpy 1.21 failure needs to be added to this list https://github.com/data-apis/array-api-compat/blob/916a84bafd2ae97c09d012616a13ee03354a1480/numpy-1-21-xfails.txt#L99. I don't know where the torch failure is coming from, but since it's in an operator, it should be xfailed too (maybe we should just be skiping all operator tests since we can't do anything about them anyway)

adonath commented 9 months ago

I think probably my number 1 development mistake when working on this library is forgetting to add a new wrapper function to all (it's especially confusing when the function is already exported from the base library). Maybe we should just write some code that automatically exports everything in the namespace that doesn't start with an underscore, but I do like the explicitness that avoids accidentally exporting something you don't want.

I see you concern here. The "source of truths" is now the __all__ list and not what is implement in for example _aliases.py. However as a first time contributor to this library I find the explicit declaration easier to grasp.

asmeurer commented 9 months ago

I agree the explicit declaration is a lot better. Actually, I would like to use this in the spec itself as well https://github.com/data-apis/array-api/blob/main/src/array_api_stubs/_draft/__init__.py. But some good tooling would really help out here. If anyone knows of a tool that can do it, let me know. Otherwise we should build one, or get the ruff team to write one.

asmeurer commented 9 months ago

But to be clear, I'm OK with doing it this way for now even without this linter, since I don't think it exists yet.

adonath commented 9 months ago

I now sorted all __all__ alphabetically and added ruf check for this as well. This should simplify bookkeeping of those lists.

adonath commented 9 months ago

I don't know if there's some way to tell a linter, "warn me if I have a name in this file that isn't exported". Most linters do that for init.py but I don't know if there's a way to force it to do that for other files as well (and even better if it can read all from init.py but check the names from some other file). Maybe we can just throw together our own custom linter for this. It probably isn't hard, especially if we just import the package and compare dir(submodule) and all. I'll admit the use-case of this library is a little niche. Most packages aren't in the business of re-exporting a namespace.

On a side note: I think this is only partly an issue. Assuming the Array API changes or is extended, this should be reflected in the array api tests. So a missed export should finally get caught while testing. But I agree it would be good to catch it earlier. However I don't seen how to avoid false positives, because e.g. the _aliases.py namespaces is not "clean". It contains a few members and helper functions, which are not meant to be public. Instead of a custom linter one could potentially implement just a unit test as well.

adonath commented 9 months ago

This PR is ready for final review from my side @asmeurer.

asmeurer commented 9 months ago

It looks like I just added some merge conflicts here by merging #76

asmeurer commented 9 months ago

However I don't seen how to avoid false positives, because e.g. the _aliases.py namespaces is not "clean". It contains a few members and helper functions, which are not meant to be public. Instead of a custom linter one could potentially implement just a unit test as well.

The idea is that _aliases.py would also have __all__ defined, just as it is in main. Basically what I want is a linter that makes sure that any __init__.py with __all__ always includes everything in that __all__ that is also in the __all__ of each package it imports from. The programmatic way to do this is to only define __all__ at the bottom levels and use __all__ += submodule.__all__ in the __init__.py files, but the downside of that is that you can't easily see what names are exported just by looking at __init__.py (which is already sort of true anyway in this package because of the re-exports, so maybe it isn't a big deal).

asmeurer commented 9 months ago

I'm also afraid I'm going to break some of this at https://github.com/data-apis/array-api-compat/pull/84. However, if we merge this first I can fix whatever needs to be fixed over there.

adonath commented 8 months ago

The idea is that _aliases.py would also have all defined, just as it is in main. Basically what I want is a linter that makes sure that any init.py with all always includes everything in that all that is also in the all of each package it imports from. The programmatic way to do this is to only define all at the bottom levels and use all += submodule.all in the init.py files, but the downside of that is that you can't easily see what names are exported just by looking at init.py (which is already sort of true anyway in this package because of the re-exports, so maybe it isn't a big deal).

I'm not aware of any linter that would do this. Playing a bit around I think this is roughly what you aim for:

import importlib
from array_api_compat._internal import _get_all_public_members

EXCLUDE = {
    "numpy": {"get_xp", "annotations", "partial", "np"},
    "torch": {"get_xp", "wraps", "annotations", "builtin_all", "builtin_any", "vecdot_linalg"},
}

def check_aliases_consistency(library):
    """Check that the public API of the library is consistent with the aliases."""

    submodule = importlib.import_module(f"array_api_compat.{library}")

    all_public = set(submodule.__all__)

    if hasattr(submodule, "linalg"):
        all_public_linalg = set(submodule.linalg.__all__)
    else:
        all_public_linalg = set()

    aliases = set(_get_all_public_members(submodule._aliases))

    print(aliases.difference(all_public | all_public_linalg | EXCLUDE[library]))

Which shows for example when calling check_aliases_consistency("torch"):

{'UniqueCountsResult', 'UniqueInverseResult', 'UniqueAllResult'}

So these are forgotten exports. However those should definitely show up in the tests later.

asmeurer commented 8 months ago

That looks like it would be a useful script. That already doesn't show up in the tests because the UniqueAllResult name is not part of the standard (the standard only requires the function to return a namedtuple, but the type of that namedtuple is not in the spec namespace). So indeed the UniqueAllResult should be either not exported at all or exported in both files.

In fact, I would be cautious in general about relying on the test suite to catch these mistakes. There's a dozen reasons the test suite might not catch something. Maybe a function is wrapped to fix some behavior that is missed by the test suite. Or (more likely), maybe the test for a function is xfailed for some other reason.

adonath commented 8 months ago

Thanks @asmeurer, @vnmabus , @rgommers and @lucascolley for the comments and review!

data-apis / array-api-compat

Add ruff to ci setup #82