Closed honno closed 3 years ago
For some more context on this: @Zac-HD and I discussed this a bit at some point last year, when I started working on the array API test suite. At the time Zac was open to the idea, but it hasn't yet been implemented. I have since developed quite a bit of the array API test suite, which uses hypothesis extensively. However, the parts of the suite that generate arrays currently only generate constant arrays, because the arrays() strategy hard-codes NumPy. We do not want NumPy to be a dependency of the test suite (actually, it unfortunately currently is because we use the `mutually_broadcastable_arrays` strategy). I also was not able to just copy and modify the `arrays` code into the test suite because of licence differences. So at present, allowing the `arrays` and `mutually_broadcastable_arrays` strategies to be independent of any particular array library, and not import NumPy unless NumPy is the array library that is being used, would directly help the array API test suite. But more broadly, support for this would allow people to use hypothesis with a large number of popular libraries like PyTorch, TensorFlow, JAX, CuPy, Dask, etc.
For those strategies that would be used in the array API test suite (`arrays()` in particular), we need to be careful to not use any APIs that aren't part of the array API specification, as that would defeat the whole purpose of using it in the array API test suite. The good news here is, for the dtypes and indexing strategies, the array API test suite does not use the ones in hypothesis.extra.numpy at all. This is because the array API spec has a very limited set of dtypes and specifies a very limited subset of required indexing semantics, so I have instead built very carefully handcrafted strategies that exactly match the array API spec. So outside of basically `arrays()`, a more pragmatic approach may be needed for the time being, given that no library presently supports the array API specification 100%. This may include, for instance, special-casing behaviors and APIs for specific libraries. The array API specification also has nothing to say about several things in the current hypothesis.extra.numpy module, e.g., string dtypes are currently not mentioned at all in the array API spec. It may make sense to limit those to just NumPy for now.
I'd be very happy to ship a (e.g.) `hypothesis.extra.array_api` module - the standard is a very exciting development, and I'd love Hypothesis to have great support and help library maintainers and consumers to adopt it.
`hypothesis-array-api` package will be more flexible for users than an explicitly-experimental module in Hypothesis - to avoid forcing updates just to get a working version of these strategies in future.
`array_module` constant; it seems likely that this would make differential testing of multiple modules pretty awkward. Perhaps a function `get_strategies_namespace(array_module)`, returning a `SimpleNamespace` or similar of functions-returning-strategies with the array_module bound in?
CC @rsokl; I know you're busy but probably also interested.
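To make the `get_strategies_namespace(array_module)` suggestion concrete, here is a minimal sketch using only the standard library. Everything in it other than the function name and the use of `SimpleNamespace` is an assumption for illustration: `_arrays` and `_integer_dtypes` are placeholders that return plain values rather than Hypothesis strategies, and `lib_a` / `lib_b` are toy stand-ins for array libraries, not real ones.

```python
from functools import partial
from types import SimpleNamespace


def _arrays(xp, dtype, size):
    """Placeholder: a real version would return a Hypothesis strategy.

    Here it eagerly builds a zero-filled 1-D array so the sketch runs
    without Hypothesis or any array library installed.
    """
    return xp.asarray([0] * size, dtype=dtype)


def _integer_dtypes(xp):
    """Placeholder: list the signed integer dtype names that xp exposes."""
    names = ("int8", "int16", "int32", "int64")
    return [name for name in names if hasattr(xp, name)]


def get_strategies_namespace(array_module):
    """Return a namespace of functions with array_module already bound in."""
    return SimpleNamespace(
        arrays=partial(_arrays, array_module),
        integer_dtypes=partial(_integer_dtypes, array_module),
    )


# Two toy stand-ins for array libraries (hypothetical, for illustration only).
lib_a = SimpleNamespace(
    int8="int8",
    int64="int64",
    asarray=lambda obj, dtype=None: ("lib_a", dtype, list(obj)),
)
lib_b = SimpleNamespace(
    int8="int8",
    int16="int16",
    int32="int32",
    int64="int64",
    asarray=lambda obj, dtype=None: ("lib_b", dtype, list(obj)),
)

# Each namespace is independent, so two libraries can be compared side by side.
xps_a = get_strategies_namespace(lib_a)
xps_b = get_strategies_namespace(lib_b)
```

Because no global state is involved, a differential test could hold `xps_a` and `xps_b` at the same time and draw from both, which is the awkward case for a single `array_module` constant.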
Glad to hear this could see a future inside Hypothesis :)
Regarding stability, having an external package coexist sounds good. My impression has been that array creation via `asarray()` and how it allows for nested sequences of Python builtins is rather critical for an `arrays()` strategy and is something fortunately well agreed upon, but there'll be odd uncertainties like data-apis/array-api#152 which warrant a flexible external package.
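As a rough illustration of why `asarray()` accepting nested sequences matters, the sketch below reshapes a flat list of Python builtins into nested lists and hands the result to `xp.asarray()`. The helper names `nest` and `build_array` and the toy `xp` object are assumptions made for this example only; a real strategy would draw the elements from Hypothesis rather than take them as an argument.

```python
import math
from types import SimpleNamespace


def nest(flat, shape):
    """Reshape a flat list of builtins into nested lists of the given shape."""
    if len(shape) == 0:
        (value,) = flat  # a 0-d "array" is a single scalar
        return value
    # Number of flat elements belonging to each entry along the first axis.
    step = len(flat) // shape[0] if shape[0] else 0
    return [nest(flat[i * step:(i + 1) * step], shape[1:]) for i in range(shape[0])]


def build_array(xp, elements, shape, dtype=None):
    """Create an array using only asarray(), the one creation call assumed here."""
    elements = list(elements)
    if len(elements) != math.prod(shape):
        raise ValueError("number of elements does not match shape")
    return xp.asarray(nest(elements, shape), dtype=dtype)


# Toy stand-in for an array library: asarray() just returns its input.
toy_xp = SimpleNamespace(asarray=lambda obj, dtype=None: obj)

grid = build_array(toy_xp, range(6), (2, 3))
```

Relying on a single creation function keeps the set of required library features small, at the cost of building every array through Python objects.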
And yeah the `array_module` constant is awkward, I will play around with a "register mechanism" with a `get_strategies_namespace()`-ish method and get feedback from folk like @asmeurer who would be using these Array API strategies.
I'll be figuring out the implementation details externally for now, but I will get to work on a `hypothesis.extra.array_api` PR at some point... maybe I'll have something ready for review end of August. I'll of course be watching this issue if there is any more input until then.
Hello folks, I want to ascertain whether Hypothesis is interested in having generalised "library-agnostic" strategies for the Array API libraries (NumPy, TensorFlow, PyTorch, MXNet, JAX, Dask & CuPy are listed as primary stakeholders). If so I would be up to implement such strategies and open a PR in a few weeks, but I would need guidance.
I've been developing such strategies at honno/hypothesis-array-api with heavy reference to `hypothesis.extra.numpy` and the related internal test suite. These strategies have no dependencies and just assume an Array API-compliant module has been monkey-patched to the variable `array_module`.
No library 100% adopts the standard right now (NumPy is getting close, see numpy/numpy#18585), but using only some key parts of the API should get us a powerful `arrays()` strategy. In areas of non-compliance I've been throwing errors on missing required attributes/methods and warning users when we can still generate some things. For example, PyTorch doesn't support all the unsigned integers specified in the Array API (only `uint8`), so if a user uses the `unsigned_integers_dtypes()` strategy (with no arguments), only the `uint8` dtype will be generated and the user is warned that the dtypes `uint16`, `uint32` & `uint64` are not available.
My biggest concern is how users should tell Hypothesis what Array API library to use. My current plan is to have the array module as an optional kwarg in the strategies, which if not specified defaults to a global variable specified by a `register_array_module()` method.
Do note that the limited feature set of the API means some functionality from `hypothesis.extra.numpy` could not be achieved through a library-agnostic approach. Additionally, helpful properties of the NumPy strategies, such as `array_shapes()` not accepting dimensions above 32 due to NumPy's limits, would either require some checks at runtime or be a nicety scrapped altogether... maybe a purely library-agnostic approach first would let us determine if library-specific checks could be included nicely or not. My first thought is just keeping the `numpy` extra as-is and having an `arrays` submodule or something be standalone.
So yeah, I'm interested to hear if these strategies could see a future inside Hypothesis, and otherwise I'd generally appreciate input. My priority is to make honno/hypothesis-array-api feature complete and emulate `hypothesis.extra.numpy` concepts like fill values in `arrays()`, and then if appropriate I'll work on a PR.
Please ask me any questions or if you need clarification on something! I'll cc @asmeurer as they tasked me to create library-agnostic Array API strategies to extend the use of Hypothesis in the Array API's compliance suite data-apis/array-api-tests and may have some ideas.
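For anyone skimming, here is a small sketch of the two ideas in this post: an optional array-module argument that falls back to a globally registered module, and warning (rather than failing) when only some dtypes are available. It is not the code in honno/hypothesis-array-api. `register_array_module()` is the name proposed above, but its body, the helper `available_unsigned_dtypes`, and the `torch_like` stand-in are assumptions for illustration, and the helper returns a plain list instead of a strategy.

```python
import warnings
from types import SimpleNamespace

_registered_module = None  # global default, set via register_array_module()

UNSIGNED_NAMES = ("uint8", "uint16", "uint32", "uint64")


def register_array_module(xp):
    """Remember xp as the default array module."""
    global _registered_module
    _registered_module = xp


def _resolve(xp):
    """Prefer an explicitly passed module, else the registered default."""
    if xp is not None:
        return xp
    if _registered_module is None:
        raise RuntimeError("no array module was passed or registered")
    return _registered_module


def available_unsigned_dtypes(xp=None):
    """Return the unsigned dtypes xp provides, warning about missing ones."""
    xp = _resolve(xp)
    found = [name for name in UNSIGNED_NAMES if hasattr(xp, name)]
    missing = [name for name in UNSIGNED_NAMES if not hasattr(xp, name)]
    if not found:
        # Nothing can be generated at all, so this is an error, not a warning.
        raise AttributeError("array module provides no unsigned integer dtypes")
    if missing:
        warnings.warn(
            f"array module lacks dtypes {missing}; only generating {found}",
            stacklevel=2,
        )
    return [getattr(xp, name) for name in found]


# Toy stand-in for a library that only offers uint8 (hypothetical).
torch_like = SimpleNamespace(uint8="uint8")

register_array_module(torch_like)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    dtypes = available_unsigned_dtypes()  # falls back to the registered module
```

Passing `xp=` explicitly bypasses the global, which keeps the registered default a convenience rather than a requirement.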