Closed adonath closed 8 months ago
Thanks for working on this @adonath!
I did a quick browse of the results, and it looks like a few error codes probably should simply be ignored
E402 Module level import not at top of file
: that can't really be avoided probably, when all dependencies / wrapped libraries are optionalF401
numpy.absimported but unused
that's par for the course in __init__.py
files when you're curating a public API to expose.With those two ignored, hopefully it'll be a useful setup to catch other issues that may otherwise slip in.
But I think so far
army-api-compat
does not pre-commit hooks, so I opted for the CI.
Yes, that sounds like the right choice to me. CI is nice, and a lot less invasive than pre-commit.
* `F401 `numpy.abs` imported but unused` that's par for the course in `__init__.py` files when you're curating a public API to expose.
I think the convention in that case is to do explicit re-exports:
from numpy import abs as abs
Oh, good point - if that makes the checker happy, then sure seems good to fix up that way.
Thanks @rgommers and @vnmabus for the comments! @rgommers do you have any preference whether to add the fixes to this PR or an independent one?
Probably this PR, since it's needed to get CI green here? It's still not that much code, so fine to review in one go.
I don't think I've ever seen the import abs as abs
pattern before. If I saw that I would just think there was a typo or someone made a mistake. Let's not do that.
The correct way to be explicit about exports is to define __all__
. I don't remember if there was a good reason I didn't define __all__
in __init__.py
. I probably just didn't because it would be a pain and unnecessary given the code that is there. But it should be doable. We already do this in _aliases.py and linalg.py, and you'll note ruff doesn't give any such warnings for those files.
And yes, make whatever improvements you need to make in this same PR. The point of this is also to make sure ruff isn't warning about any false positives, and there's no way to tell that unless we see the changes.
The import * error codes also need to be ignored.
I don't think I've ever seen the
import abs as abs
pattern before. If I saw that I would just think there was a typo or someone made a mistake. Let's not do that.The correct way to be explicit about exports is to define
__all__
. I don't remember if there was a good reason I didn't define__all__
in__init__.py
. I probably just didn't because it would be a pain and unnecessary given the code that is there. But it should be doable. We already do this in _aliases.py and linalg.py, and you'll note ruff doesn't give any such warnings for those files.
Some references: https://mypy.readthedocs.io/en/stable/command_line.html#cmdoption-mypy-no-implicit-reexport https://github.com/astral-sh/ruff/issues/717
Also in the typing PEP, considering stub files in this case: https://peps.python.org/pep-0484/#stub-files
import x as x
doesn't actually do anything. It's just a weird convention someone came up with to silence linters, and in my opinion a quite bad one as it looks distinctly like a mistake. On the other hand, __all__
actually does something, so we should be using it anyways. And like I said, we are already doing this in other files (notice how you don't get any of these errors in _aliases.py
).
This is actually a good example of an issue I have with a lot of linters, especially "code quality" linters. In a sane world, a linter would flag import x as x
as a probable typo. But instead, they are encouraging it.
In a sane world, a linter would flag import x as x as a probable typo
We can do that here if you want, ruff has it. https://docs.astral.sh/ruff/rules/useless-import-alias/#useless-import-alias-plc0414
import x as x
doesn't actually do anything. It's just a weird convention someone came up with to silence linters, and in my opinion a quite bad one as it looks distinctly like a mistake. On the other hand,__all__
actually does something, so we should be using it anyways. And like I said, we are already doing this in other files (notice how you don't get any of these errors in_aliases.py
).
The "problem" with __all__
is that it is a separate piece of information that we need to update. It adds repetition in a different part of the file and risks that they get desynchronized. Thus, some library authors prefer not to use it. The import foo as foo
pattern does not have this problem and, although the first time you see it looks admittedly a bit weird, you quickly learn to recognize it.
That said, it __all__
or implicit re-exports (disabling the error code) work for you, then lets go with that.
Re-exports are the whole point of this package. Anyway, the duplication of __all__
isn't a big deal especially if we do have a linter to help us keep things consistent.
Thanks everyone for the comments. I ended up doing more work than initially anticipated, but it's all related. Let me quickly summarize the related changes:
__all__
pattern in the public name spaces, such as numpy/__init__.py
, numpy/inalg.py
etc.__all__
definitions in non-public namespaces such as _aliases.py
and _linalg.py
. They were effectively un-used and conceptional it is probably weak to define public members in a non-public module._get_all_public_members
function to get all public members of an imported module in a uniform way. It is either taken from __all__
if present or created from dir(module)
with excludes. _aliases.py
as well. This has the advantage of just defining the public name space in linalg.py
. No need for any del
statements or similar. And it works the same as in __init__.py
, which also import from .aliases.py
__all__
in the public modules now follows the same scheme everywhere:
*
import of the main namespace. Such as form torch import *
(there we can't avoid the *
import)_get_all_public_members
..common._helpers.py
._aliases.py
.linalg.py
__all__
is build after the same import groups, such that maintenance is a bit easier.I don't really have a preference of __all__
vs the rename during import. I think any reasonable editor (at least vscode does) highlights inconsistencies between __all__
and the imports. Names missing in __all__
will be shown as unused import, while names declared in __all__
which are not defined in the module are also highlighted.
I thus removed all definitions in non-public namespaces such as _aliases.py and _linalg.py. They were effectively un-used and conceptional it is probably weak to define public members in a non-public module.
For me it's a little easier to remember to update __all__
when it's in the same file. But I guess more ideally would be if we could somehow check this automatically with a linter.
I don't know if that's possible.
Explicit imports from ..common._helpers.py
As I noted in a line comment, it would be great if a linter can help us keep them consistent.
I think any reasonable editor (at least vscode does) highlights inconsistencies between all and the imports.
These editors are using a linter under the hood. If ruff doesn't do this, we should use a linter that does.
Names missing in all will be shown as unused import, while names declared in all which are not defined in the module are also highlighted. ... If ruff doesn't do this, we should use a linter that does.
ruff
does this, indeed the initial CI log caught some of these, as discussed at the start of this thread. (unless you were referring to something else?)
For me it's a little easier to remember to update all when it's in the same file.
Yes, but you have to change the import into the public namespace anyway. And if you forget to update __all__
in this file, ruff errors with an unused import. I hope this makes sense.
These editors are using a linter under the hood. If ruff doesn't do this, we should use a linter that does.
In vscode I use pylance
, which shows if undefined names are listed in __all__
. Ruff by default does not seem to do this, but I can check whether there is a config option.
Concerning the type annotations and checking I think there are still two open issues:
Array
and Device
annotations are undefined in common/_helpers.py
. Here it is not 100% clear what to do. Either defined a union of valid arrays, or re-define the type annotations in the get_xp()
for the specific array library.np._CopyMode
annotation in common/_aliases.py
which requires numpy for type checking. But I guess this might be fine...Ruff by default does not seem to do this, but I can check whether there is a config option.
Thanks @lucascolley! I enabled the corresponding option in ruff. For now I only added it as a command line option in the CI, long term it might be better to have ruff config file.
Above it sounded like Aaron wanted PLC0414
too.
There is a test fail in the CI, but it seems unrelated to this PR https://github.com/data-apis/array-api-compat/actions/runs/7702062263/job/20989506932?pr=82#step:6:5798
So just to be clear, is ruff now checking for __all__
consistency across files?
I think probably my number 1 development mistake when working on this library is forgetting to add a new wrapper function to __all__
(it's especially confusing when the function is already exported from the base library). Maybe we should just write some code that automatically exports everything in the namespace that doesn't start with an underscore, but I do like the explicitness that avoids accidentally exporting something you don't want.
I don't know if there's some way to tell a linter, "warn me if I have a name in this file that isn't exported". Most linters do that for __init__.py
but I don't know if there's a way to force it to do that for other files as well (and even better if it can read __all__
from __init__.py
but check the names from some other file). Maybe we can just throw together our own custom linter for this. It probably isn't hard, especially if we just import the package and compare dir(submodule)
and __all__
. I'll admit the use-case of this library is a little niche. Most packages aren't in the business of re-exporting a namespace.
You can ignore the test failures. Looks like the numpy 1.21 failure needs to be added to this list https://github.com/data-apis/array-api-compat/blob/916a84bafd2ae97c09d012616a13ee03354a1480/numpy-1-21-xfails.txt#L99. I don't know where the torch failure is coming from, but since it's in an operator, it should be xfailed too (maybe we should just be skiping all operator tests since we can't do anything about them anyway)
I think probably my number 1 development mistake when working on this library is forgetting to add a new wrapper function to all (it's especially confusing when the function is already exported from the base library). Maybe we should just write some code that automatically exports everything in the namespace that doesn't start with an underscore, but I do like the explicitness that avoids accidentally exporting something you don't want.
I see you concern here. The "source of truths" is now the __all__
list and not what is implement in for example _aliases.py
. However as a first time contributor to this library I find the explicit declaration easier to grasp.
I agree the explicit declaration is a lot better. Actually, I would like to use this in the spec itself as well https://github.com/data-apis/array-api/blob/main/src/array_api_stubs/_draft/__init__.py. But some good tooling would really help out here. If anyone knows of a tool that can do it, let me know. Otherwise we should build one, or get the ruff team to write one.
But to be clear, I'm OK with doing it this way for now even without this linter, since I don't think it exists yet.
I now sorted all __all__
alphabetically and added ruf check for this as well. This should simplify bookkeeping of those lists.
I don't know if there's some way to tell a linter, "warn me if I have a name in this file that isn't exported". Most linters do that for init.py but I don't know if there's a way to force it to do that for other files as well (and even better if it can read all from init.py but check the names from some other file). Maybe we can just throw together our own custom linter for this. It probably isn't hard, especially if we just import the package and compare dir(submodule) and all. I'll admit the use-case of this library is a little niche. Most packages aren't in the business of re-exporting a namespace.
On a side note: I think this is only partly an issue. Assuming the Array API changes or is extended, this should be reflected in the array api tests. So a missed export should finally get caught while testing. But I agree it would be good to catch it earlier. However I don't seen how to avoid false positives, because e.g. the _aliases.py
namespaces is not "clean". It contains a few members and helper functions, which are not meant to be public. Instead of a custom linter one could potentially implement just a unit test as well.
This PR is ready for final review from my side @asmeurer.
It looks like I just added some merge conflicts here by merging #76
However I don't seen how to avoid false positives, because e.g. the _aliases.py namespaces is not "clean". It contains a few members and helper functions, which are not meant to be public. Instead of a custom linter one could potentially implement just a unit test as well.
The idea is that _aliases.py
would also have __all__
defined, just as it is in main
. Basically what I want is a linter that makes sure that any __init__.py
with __all__
always includes everything in that __all__
that is also in the __all__
of each package it imports from. The programmatic way to do this is to only define __all__
at the bottom levels and use __all__ += submodule.__all__
in the __init__.py
files, but the downside of that is that you can't easily see what names are exported just by looking at __init__.py
(which is already sort of true anyway in this package because of the re-exports, so maybe it isn't a big deal).
I'm also afraid I'm going to break some of this at https://github.com/data-apis/array-api-compat/pull/84. However, if we merge this first I can fix whatever needs to be fixed over there.
The idea is that _aliases.py would also have all defined, just as it is in main. Basically what I want is a linter that makes sure that any init.py with all always includes everything in that all that is also in the all of each package it imports from. The programmatic way to do this is to only define all at the bottom levels and use all += submodule.all in the init.py files, but the downside of that is that you can't easily see what names are exported just by looking at init.py (which is already sort of true anyway in this package because of the re-exports, so maybe it isn't a big deal).
I'm not aware of any linter that would do this. Playing a bit around I think this is roughly what you aim for:
import importlib
from array_api_compat._internal import _get_all_public_members
EXCLUDE = {
"numpy": {"get_xp", "annotations", "partial", "np"},
"torch": {"get_xp", "wraps", "annotations", "builtin_all", "builtin_any", "vecdot_linalg"},
}
def check_aliases_consistency(library):
"""Check that the public API of the library is consistent with the aliases."""
submodule = importlib.import_module(f"array_api_compat.{library}")
all_public = set(submodule.__all__)
if hasattr(submodule, "linalg"):
all_public_linalg = set(submodule.linalg.__all__)
else:
all_public_linalg = set()
aliases = set(_get_all_public_members(submodule._aliases))
print(aliases.difference(all_public | all_public_linalg | EXCLUDE[library]))
Which shows for example when calling check_aliases_consistency("torch")
:
{'UniqueCountsResult', 'UniqueInverseResult', 'UniqueAllResult'}
So these are forgotten exports. However those should definitely show up in the tests later.
That looks like it would be a useful script. That already doesn't show up in the tests because the UniqueAllResult name is not part of the standard (the standard only requires the function to return a namedtuple, but the type of that namedtuple is not in the spec namespace). So indeed the UniqueAllResult should be either not exported at all or exported in both files.
In fact, I would be cautious in general about relying on the test suite to catch these mistakes. There's a dozen reasons the test suite might not catch something. Maybe a function is wrapped to fix some behavior that is missed by the test suite. Or (more likely), maybe the test for a function is xfailed for some other reason.
Thanks @asmeurer, @vnmabus , @rgommers and @lucascolley for the comments and review!
As proposed in #73 I'm adding a CI check that runs ruff as a linter. It currently runs as an "allowed failure". I can adapt the configuration as needed and optionally add ruff as a pre-commit hook. But I think so far
array-api-compat
does not pre-commit hooks, so I opted for the CI.