data-apis / array-api-tests

Test suite for the PyData Array APIs standard
https://data-apis.org/array-api-tests/
MIT License

Loosely assert for functions that produce ints but internally might use floats #154


tylerjereddy commented 2 years ago

It looks like the test_trunc reference implementation is math.trunc for at least a subset of the checks. I noticed that for some large values there is a mismatch between the pykokkos (ultimately C++) version of trunc and math.trunc, which causes the conformance test to fail. I see the same thing in NumPy as well; do you have a suggestion on how I should proceed here? Is this effectively related to arbitrary-precision Python integers, etc.?

I'll just use NumPy rather than pykokkos below, to keep things familiar:

>>> import math
>>> import numpy as np
>>> math.trunc(9007199254740993)
9007199254740993
>>> np.trunc(9007199254740993)
9007199254740992.0

And then in the conformance testing, pytest array_api_tests/test_operators_and_elementwise_functions.py::test_trunc fails with:

AssertionError: out[0]=9007199254740992, but should be trunc(x[0])=9007199254740993 [trunc()]
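
For reference, the failing value sits just past 2**53, the largest magnitude at which float64 can still represent every integer exactly:

>>> 9007199254740993 == 2**53 + 1
True
>>> float(9007199254740993)  # rounds to the nearest representable float64
9007199254740992.0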

Would you say that my reference implementation in pykokkos and the exposed NumPy trunc are both wrong here and the test suite is correct, or is something else going on?

honno commented 2 years ago

Thanks for raising this. I think this is clearly an edge case that isn't going to be run into much outside of Hypothesis-land, and might even be expected behaviour due to performance and/or maintainability concerns. So we should be ignoring or loosely asserting these things; I'll have a look this week at updating test_trunc so it doesn't error out for pykokkos like this, which should be simple.

We've done similar things before (e.g. #141). I should probably note in the README that we want to actively avoid tripping up adopters such as yourself with these pedantic Hypothesis examples :sweat_smile:
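
For the curious, a minimal sketch of what loosely asserting could look like here (the helper name and tolerance are illustrative, not the suite's actual code):

import math

MAX_EXACT = 2**53  # float64 represents every integer exactly only up to here

def loose_assert_trunc(out, x):
    expected = math.trunc(x)
    if abs(expected) <= MAX_EXACT:
        # exact comparison is safe inside float64's exact-integer range
        assert out == expected
    else:
        # beyond that range, accept results within float64 rounding error
        assert math.isclose(out, expected, rel_tol=2**-52)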

tylerjereddy commented 2 years ago

Sorry if this is obvious in the Hypothesis config docs, but is there a reasonable way to set up a config file that would scope out such cases on a per-test basis? That way I could suppress them locally until they're officially suppressed upstream. That said, I'm not sure this will come up often enough for me to need it anyway.

honno commented 2 years ago

> Sorry if this is obvious in the Hypothesis config docs, but is there a reasonable way to set up a config file that would scope out such cases on a per-test basis? That way I could suppress them locally until they're officially suppressed upstream. That said, I'm not sure this will come up often enough for me to need it anyway.

Unfortunately there's nothing to skip/filter individual examples, just a mechanism to skip tests entirely, which I suppose you've already seen.
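
For anyone landing here, stock pytest can at least deselect a test wholesale; this uses pytest's standard --deselect option, nothing suite-specific:

pytest array_api_tests/ --deselect array_api_tests/test_operators_and_elementwise_functions.py::test_trunc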

asmeurer commented 2 years ago

Getting that level of configurability would be nontrivial. We'd need to allow people to specify custom @given decorators for a given test function, which would also imply making our custom hypothesis strategies part of a public API. It's not something I think we want to do unless a really strong use-case for it comes up.

In this case, the spec says that for integer dtypes, trunc should be a no-op (https://data-apis.org/array-api/latest/API_specification/generated/signatures.elementwise_functions.trunc.html). Note that this is different from np.trunc, which casts integer dtypes to float64 (https://numpy.org/doc/stable/reference/array_api.html#elementwise-functions-differences).
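
A quick demonstration of that difference in NumPy proper (np.trunc only defines floating-point loops, so integer input is promoted):

>>> import numpy as np
>>> np.trunc(np.asarray([1, 2, 3]))
array([1., 2., 3.])
>>> np.trunc(np.asarray([1, 2, 3])).dtype
dtype('float64')

A spec-compliant trunc would instead return the integer array unchanged.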

tylerjereddy commented 2 years ago

Interesting, thanks!

tylerjereddy commented 2 years ago

As an aside, thinking about your response a bit more, I don't actually see any evidence that the cases generated for ceil, floor, round, and trunc include floating-point dtypes. I wrote some inner loops for the floating-point types, but it looks like I can pass the API test suite without hitting those code paths.

honno commented 1 year ago

Something @asmeurer caught:

>>> torch.arange(9132051521638391890, 0, -91320515216383920)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: invalid size, possible overflow?

So I think even loosely asserting is a bad idea for these kinds of examples, given that libraries might just raise if they can't produce accurate results. I now think such examples should just be filtered out completely.
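
As a rough sketch, filtering on the strategy side could look something like this (the strategy name is illustrative, not the suite's actual code):

from hypothesis import strategies as st

MAX_EXACT = 2**53  # float64 represents every integer exactly only up to here

# only generate values that survive a float64 round-trip, so libraries that
# do size arithmetic in floating point can't silently round or overflow
exact_ints = st.integers(min_value=-MAX_EXACT, max_value=MAX_EXACT)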

asmeurer commented 1 year ago

It's not completely clear to me what's going on in that PyTorch example. It might be due to the use of floats internally, or it could be a bug. The integers there are already well above the range of integers that float64 can represent exactly, so if it were converting them internally I'd expect we would have gotten a different error from the test for that as well.
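
For what it's worth, the start value there is indeed far past float64's exact-integer range and doesn't survive a round-trip:

>>> 9132051521638391890 > 2**53
True
>>> int(float(9132051521638391890)) == 9132051521638391890
False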