data-apis / array-api-tests

Test suite for the PyData Array APIs standard
https://data-apis.org/array-api-tests/
MIT License

Specify dependency of each test case #51

Open honno opened 2 years ago

honno commented 2 years ago

The test suite is mostly made up of test methods for each function (or array object method) in the spec. The majority of these tests require other functions to do everything we want, which is problematic when an underlying function does not work: you'll get a lot of spammy errors that actually stem from a single fundamental problem... or even false positives.

So I think it would be a good idea if we declared the dependencies of each test method, i.e. the functions we use and assume have correct behaviour. We could use these declarations to build a dependency graph, which could be hooked into pytest to prioritise zero/low-dependency tests first and, by default, skip tests that use functions we've deemed incorrect (a rough sketch of what this could look like follows the lists below). This would benefit:

  1. Array API adopters, who would much more easily see what they should prioritise developing
  2. Us, who'd be able to see areas where we could cut down on the functions we depend on

Ideas for declaring dependencies per test method:

General challenges would be:

The dependency graph + auto skipping would also allow us to:
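A minimal sketch of what marker-based declarations plus auto-skipping could look like, assuming a hypothetical depends_on marker and a hand-maintained set of known-broken functions (neither exists in the suite today):

    # conftest.py (hypothetical)
    import pytest

    # Functions we've already deemed broken on the target library, e.g.
    # populated from a command-line option or a previous run.
    KNOWN_BROKEN = {"asarray"}

    def pytest_configure(config):
        config.addinivalue_line(
            "markers",
            "depends_on(*funcs): functions this test assumes behave correctly",
        )

    def pytest_collection_modifyitems(config, items):
        # Skip any test whose declared dependencies include a known-broken function.
        for item in items:
            for marker in item.iter_markers(name="depends_on"):
                broken = KNOWN_BROKEN.intersection(marker.args)
                if broken:
                    item.add_marker(
                        pytest.mark.skip(reason=f"depends on broken {sorted(broken)}")
                    )

    # test_elementwise_functions.py (hypothetical usage)
    @pytest.mark.depends_on("asarray", "equal")
    def test_add():
        ...

Sorting items by their number of declared dependencies in the same hook would also get us the zero/low-dependency-first ordering.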

honno commented 1 year ago

I still think this would be nice, but the test suite could do with some more refinement generally for this to be feasible, and even then it doesn't seem to be as big a deal as I made it out to be. Every time I've seen the test suite used, devs usually do end up identifying that buggy fundamental components are preventing other things from being tested... it's just that it probably takes them way longer than ideal heh.

Zac-HD commented 1 year ago

The 80/20 of this would be to group tests into core, standard, and then extension-xxx groups - I think this would be fairly easy with a module-level pytestmark = pytest.mark.core (etc). Then if you run into a problem, try pytest -m core and then pytest -m "core or standard" to check whether it's something foundational.
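For illustration, a sketch of how that grouping might look (the marker names here are assumed and not currently registered in the suite):

    # pytest.ini (hypothetical marker registration)
    # [pytest]
    # markers =
    #     core: foundational functions the rest of the suite leans on
    #     standard: everything else in the main spec
    #     extension_linalg: the linalg extension

    # test_creation_functions.py
    import pytest

    pytestmark = pytest.mark.core  # applies the mark to every test in this module

    def test_asarray_scalars():
        ...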

Hypothesis takes a similar approach, using directories to organize our selftests (core=conjecture, cover + nocover, extensions, etc.), which works pretty well at higher volumes. Exact dependency tracking is a lovely idea in principle but AFAICT very rare in practice because it's not usually worth the trouble 😅