data-apis / array-api

RFC document, tooling and other content related to the array API standard
https://data-apis.github.io/array-api/latest/
MIT License
215 stars 46 forks source link

Some issues with type annotations #143

Closed asmeurer closed 3 years ago

asmeurer commented 3 years ago

Some issues with type annotations that were brought up by @Bvb93 at https://github.com/numpy/numpy/pull/18585

rgommers commented 3 years ago

Thanks for the great feedback @BvB93, and for collecting it here @asmeurer. I'll try and fix the straightforward ones now.

The pipe operator for types isn't valid yet (https://www.python.org/dev/peps/pep-0604/) (it's used in the NestedSequence type in asarray)

It's valid in Python 3.8 with from __future__ import annotations. And py38 is our minimum version because of positional-only parameters, so we're fine here.

unique can be implemented as an @overload (see numpy/numpy#18585 (comment)). I don't know if this needs to be changed in the spec.

No, I don't think that should be changed in the spec, that would just make it harder to read. It's an implementation issue.

The return type for finfo as finfo doesn't make sense if finfo is a function (see numpy/numpy#18585 (comment)).

The return type isn't relevant, it must be an object with the minimum properties. So the suggestion of using a typing.Protocol is a good one. I'll think about the best way to include that in the standard.

Some of the other types for properties also use <...> types. We should make all the type annotations be valid Python/mypy that can be used in the annotations in the signature.

No, this is on purpose. We cannot have valid types for library-specific classes/objects. <array> means "the conforming array type in the library". We can add a typing.Protocol in the NumPy implementation, and that will then very likely get reused in other libraries. But in a document that doesn't make sense.

Should the array object subclass from typing.Protocol (numpy/numpy#18585 (comment))? (I don't know if this is relevant for the spec).

Same answer, yes in implementation, no in spec.

rgommers commented 3 years ago

For asarray, it isn't clear how to define NestedSequence (typing annotations don't support recursive types).

Also an implementation issue. I'd simply type it as Sequence[Any] for now, given that there's no canonical way to do it. Note that for NumPy, there's already numpy.typing._NestedSequence which has a similar issue.

rgommers commented 3 years ago

asarray needs to also include array as a valid type for obj (array should not be considred a subtype of NestedSequence)

Already covered by SupportsDLPack, but can add explicitly and specify it's a no-op in that case.

EDIT: after reviewing, that actually makes it less clear. It should be treated just like other objects that are covered by SupportsDLPack - e.g. copy=False shares the data, but does create a new array instance (rather than returning the existing input instance).

rgommers commented 3 years ago

For stack and concat (and possibly others), Tuple is too restrictive vs. Sequence.

I'm actually not sure Sequence is a good idea. It can be quite hard to deal with this correctly in both Python and C/C++, see e.g. https://stackoverflow.com/questions/43566044/what-is-pythons-sequence-protocol. I remember a lot of cases where we special-cased tuple and list inputs, and other kinds of sequences were doing the wrong thing.

I can be convinced that accepting Union[Tuple[<array>, ...], List[<array>]] is a good idea, but I don't think Sequence is.

rgommers commented 3 years ago

The return type for finfo as finfo doesn't make sense if finfo is a function (see numpy/numpy#18585 (comment)).

The return type isn't relevant, it must be an object with the minimum properties. So the suggestion of using a typing.Protocol is a good one. I'll think about the best way to include that in the standard.

The annotation wasn't finfo, it was <class>. To further clarify I changed it to <finfo object>.

asmeurer commented 3 years ago

Another issue I just noticed. The dunder methods (__add__, and so on) should include int, float, (and bool as appropriate) as allowed types for other.

BvB93 commented 3 years ago

Continuing from https://github.com/numpy/numpy/pull/18585#discussion_r671555487:

Well I've also been trying to update the type hints in the spec itself concurrently with my updates on this PR based on your feedback, so this is still a relevant question. Should we change the type of stream in the spec from Optional[int, Any] to just Optional[Any]? I admit I'm still a bit confused by Any. The Python docs state "A special kind of type is Any. A static type checker will treat every type as being compatible with Any and Any as being compatible with every type.", so wouldn't Union[int, Any] or indeed Optional[Any] just be equivalent to just Any as far as type checking is concerned? It seems to me that the only point of Optional[int, Any] would be to document that None and int have special meaning, but based on what you've stated here, this is perhaps just me misunderstanding how Any works with respect to covariance and contravariance.

@asmeurer when used as parameter type, all unions involving Any are indeed equivalent to a Any (from a typing perspective). I made a similar comment in https://github.com/data-apis/array-api/pull/185#discussion_r638201753, but the more verbose union was preferred then for the sake of clarity.

I'm also a bit confused by the Any type for this specific dlpack signature. Is it really Any, or should it be an unspecified type, similar to what is used for device? Perhaps we should move this discussion over to data-apis/array-api#143.

The __dlpack__ protocol is currently in a unique situation as the parameter type of stream is variable, i.e. different concrete implementations are allowed to specify a bound (as long as this bound involves None somehow). This is similar to, e.g., builtin pythons' __getitem__ protocol, where both the key- and return-type are variables.

So no, the most correct type for the more general __dlpack__ spec would not be Any, it would be a typevar: None | ~T. For concrete implementations would narrow down to the like of None (like for numpy), None | int or None | int | FancyStreamType.

Examples

An example of __dlpack__ represented as a typing.Protocol subclass.

from typing import TypeVar, Protocol, Any

PyCapsule = Any  # TODO
_T_contra = TypeVar("_T_contra", contravariant=True)

class SupportsDLPack(Protocol[_T_contra]):
    def __dlpack__(self, /, *, stream: None | _T_contra = ...) -> PyCapsule: ...
BvB93 commented 3 years ago

Two problems recently discovered in https://github.com/numpy/numpy/pull/19969:

kgryte commented 3 years ago

As all items in this tracking issue have been addressed, will close. Any further issues can be discussed in a new issue and resolved in follow-up PRs.