a-recknagel / stenotype

Support for shorthand type annotations.
MIT License

Syntax Extension: Generics #11

Open maxfischer2781 opened 5 years ago

maxfischer2781 commented 5 years ago

Generics consist of two parts:

This is a follow-up on #5 to not delay the core syntax needlessly.

maxfischer2781 commented 5 years ago

I don't see what can be done about Generic classes - the [] syntax is already pretty lean. Other languages merely use other symbols, such as List<T>. No advantage in using something else just because.

The only thing that IMO could be simplified is the class definition:

class Foo(bases, Generic[T]):      # typing -- Generic base class is parameterised by TypeVar
class Foo[T](bases):               # parameterise class object (matches usage)
class Foo(bases): [T]              # dangling parameters (valid Py3 syntax)

But I don't see how removing a few characters is significant if it is followed by an entire class definition.


For TypeVars, I do see some benefit in using actual notation. At the moment, they require declaration before use:

T = TypeVar('T')
S = TypeVar('S', str, bytes)
U = TypeVar('U', bound=str)
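For reference, here is a minimal runnable sketch of how these three declarations are used with plain typing today. The names `Box`, `first`, `concat` and `echo` are made up for illustration and are not part of stenotype.

```python
from typing import Generic, List, TypeVar

T = TypeVar('T')                # unconstrained
S = TypeVar('S', str, bytes)    # constrained to str or bytes
U = TypeVar('U', bound=str)     # str or any subclass of str


class Box(Generic[T]):
    """Hypothetical container, parameterised by T."""

    def __init__(self, item: T) -> None:
        self.item = item

    def get(self) -> T:
        return self.item


def first(items: List[T]) -> T:
    # The element type of the list is also the return type.
    return items[0]


def concat(a: S, b: S) -> S:
    # Both arguments must be the same one of str / bytes.
    return a + b


def echo(text: U) -> U:
    # A checker keeps the concrete str subclass of the argument.
    return text
```

Every TypeVar has to exist as a module-level name before the first annotation that mentions it, which is the overhead being discussed here.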

a-recknagel commented 5 years ago

No advantage in using something else just because.

I don't have an informed opinion on the matter, but I just wanted to point out that it might be a nice property of stenotype if [] and {} were only used for list and dict annotation. I'd be in favor of using <> for generics, even though I see that breaking with the existing notation of [] is a serious counterpoint.

Shortening TypeVar

The best I can come up with is just something to circumvent importing typing. Both methods are hard to refactor, though:

"T, str or bytes"  # stenotype
# or
stenotype.TypeVar("T, str or bytes")

It might be safe enough to skip the comment in the first example and just search for unassigned strings that are parseable, but it feels risky. Having good error messages might make it viable.
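A rough sketch of what "search for unassigned strings that are parseable" could look like with the standard ast module. The declaration grammar in `DECLARATION` and the helper name `find_declarations` are assumptions made for this example, not existing stenotype code.

```python
import ast
import re
from typing import List, Tuple

# Assumed declaration shape: a capitalised name, optionally followed by a
# comma and one or more alternatives joined with "or".
DECLARATION = re.compile(
    r"^\s*(?P<name>[A-Z]\w*)\s*(?:,\s*(?P<types>\w+(?:\s+or\s+\w+)*))?\s*$"
)


def find_declarations(source: str) -> List[Tuple[int, str, List[str]]]:
    """Return (line, name, constraints) for every bare string that parses."""
    found = []
    for node in ast.walk(ast.parse(source)):
        # An "unassigned string" is an expression statement holding a str.
        if (
            isinstance(node, ast.Expr)
            and isinstance(node.value, ast.Constant)
            and isinstance(node.value.value, str)
        ):
            match = DECLARATION.match(node.value.value)
            if match:
                types = match.group("types")
                constraints = re.split(r"\s+or\s+", types) if types else []
                found.append((node.lineno, match.group("name"), constraints))
    return found


source = '"T, str or bytes"\n"""Not a declaration."""\nx = "U, int"\n'
print(find_declarations(source))
```

The risk mentioned above shows up directly: a one-word docstring such as "Foo" would also match this pattern, so the error messages would have to make accidental matches easy to spot.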

variance

You proposed it in #5 already, I'll mention it here once more in my own words just to make sure that I understood it in the way you meant it:

covariance and contravariance don't need to be part of the TypeVar definition and should instead be added in the actual annotation via + and -, e.g.

"T, str or bytes"  # stenotype
foo: "[+T]"

maxfischer2781 commented 5 years ago

covariance and contravariance don't need to be part of the TypeVar definition and should instead be added in the actual annotation via + and -

Yes, I think that is sufficient. For example, Scala (section 4.5) only has the variance annotation on the class, not necessarily on methods. We could require variance annotations only at the top level:

class Foo(Generic["+T"]):
    def bar(self, b: T):  # T is known to be covariant
        ...

But we can also go with variance being part of the TypeVar, e.g. T and +T are separate TypeVars. That would be simpler to process, definitely for stenotype but likely also for humans.


<> Generics

I see your point. I think for just syntax, breaking consistency is not justifiable. One could however justify <> as TypeVar declarations. For example, a TypeScript/Python mashup would define a generic function as such:

def identity<T>(arg: T) -> T:
    return arg

The major problem is that we cannot insert <> at function and class definitions without breaking syntax. One could use something similar to C++ templates, where a template declaration precedes the object:

# type: '<T>'
def identity(arg: T) -> T:
    return arg

Shortening TypeVar

The base use-case would be for stenotype to silently create the TypeVar definition as needed.

stenofoo: "T or int"
# expanded to...
T = TypeVar('T')
stenofoo: Union[T, int]

mypy reports redefined types, so if there is a clash (T as automatic TypeVar and manual class) it does not slip through silently. Common TypeVar names are also not common class names, as far as I can tell.
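A minimal sketch of such an expansion step, working purely on strings. It assumes that a TypeVar is recognised as a single uppercase letter and that alternatives are joined with " or ". The function name `expand` is hypothetical.

```python
import re
from typing import List, Tuple

# Assumption for this sketch: a TypeVar is a single uppercase letter.
TYPEVAR_NAME = re.compile(r"^[A-Z]$")


def expand(annotation: str) -> Tuple[List[str], str]:
    """Expand a steno 'A or B' annotation into declarations and typing source."""
    parts = [part.strip() for part in annotation.split(" or ")]
    # One generated declaration per name that looks like a TypeVar.
    declarations = [
        f"{name} = TypeVar({name!r})" for name in parts if TYPEVAR_NAME.match(name)
    ]
    expanded = parts[0] if len(parts) == 1 else f"Union[{', '.join(parts)}]"
    return declarations, expanded


print(expand("T or int"))
```

A real implementation would also have to deduplicate declarations across a module and emit the typing imports. Both are left out here.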


TypeVar Infodump

Identifying TypeVars

The straightforward approach is to have a rule for TypeVars similar to Literals - e.g. "a single uppercase letter". That covers most TypeVars I've seen so far, but not all - e.g. KT for keytype in typing.

Some languages have an explicit keyword for type vars. For example, Julia uses a where clause. This could look like (T, A) where T or def foo(z: T, a: A): # type: where T <: int.

Alternatively, we could use a symbol to denote TypeVars. For example, :T, `T, .T, <T> or similar.

Finally, a TypeVar could always be bound/constrained, e.g. T <: Any, T <: int, or with variance, e.g. +T, -T or =T. Basically

Constrained/Bound Typevars

Constraints and bounds are closely related. A TypeVar('N', bound=int) defines N that is one well-defined subclass of int. A TypeVar('R', int, float) defines R that is one well-defined class of either int or float. My tests with mypy and PyCharm show that constraints also accept subclasses. So basically TypeVar('R', int, float) == TypeVar('R', bound=Union[int, float]). It would make sense to just merge the two for stenotype.
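For comparison, this is how the two forms are stored at runtime by the typing module. `R_merged` is a name invented here to show what the merged form would look like.

```python
from typing import TypeVar, Union

N = TypeVar('N', bound=int)
R = TypeVar('R', int, float)
# What a merged declaration would expand to under the proposal above.
R_merged = TypeVar('R_merged', bound=Union[int, float])

print(N.__bound__, N.__constraints__)   # bound is set, constraints are empty
print(R.__bound__, R.__constraints__)   # no bound, two constraints
print(R_merged.__bound__)               # a single Union bound
```

One caveat worth keeping in mind: according to PEP 484, a checker solves a constrained TypeVar to exactly one of the listed types, whereas a bound keeps the concrete subtype of the argument. Merging the two would therefore be a deliberate simplification on the stenotype side rather than a strict equivalence.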

Mainstream Type notation uses S <: C to denote "C is the upper bound of S" or issubclass(S, C) in Python speak. This would translate to N <: int and R <: int or float in the above TypeVars.
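A small sketch of how such a declaration string could be turned into a real TypeVar. It assumes the bounds are names of builtins joined with " or ", and it follows the merge suggestion by always producing a bound. `typevar_from_steno` is a hypothetical helper.

```python
import builtins
from typing import TypeVar, Union


def typevar_from_steno(declaration: str) -> TypeVar:
    """Build a TypeVar from 'Name' or 'Name <: A or B' (builtin names only)."""
    name, _, bound_src = (part.strip() for part in declaration.partition("<:"))
    if not bound_src:
        return TypeVar(name)
    # Assumption: every bound is the name of a builtin type.
    bounds = tuple(getattr(builtins, b.strip()) for b in bound_src.split(" or "))
    bound = bounds[0] if len(bounds) == 1 else Union[bounds]
    return TypeVar(name, bound=bound)


N = typevar_from_steno("N <: int")
R = typevar_from_steno("R <: int or float")
print(N.__bound__, R.__bound__)
```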

I am ambivalent on using <: - it makes sense with a mathematical background and can be learned, but is quite obscure without prior knowledge. However, I don't see many alternatives. Using Pythonish issubclass(T, int or float) is a beast, T is int or float seems clever and slightly wrong, and T is_a int or T is a int is very awkward.

Variance

Constraints/bounds define what type a T can be. Variance defines what other type a T that is already a specific type can be as well. This is expressed by whether subtypes (covariance) or supertypes (contravariance) are accepted in place of a specific T.

Mainstream Type notation seems to just use +T (covariant T) and -T (contravariant T). mypy also displays TypeVars like this.
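The typing module itself uses the same prefixes in the repr of a TypeVar, which can be checked quickly:

```python
from typing import TypeVar

T = TypeVar('T')
T_co = TypeVar('T_co', covariant=True)
T_contra = TypeVar('T_contra', contravariant=True)

print(repr(T))         # ~T         (invariant)
print(repr(T_co))      # +T_co      (covariant)
print(repr(T_contra))  # -T_contra  (contravariant)
```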

a-recknagel commented 5 years ago

But we can also go with variance being part of the TypeVar

I was initially just +1 on this, but the more I think about this the more I feel that it might be better to bind variance on declaration and not usage. After all, a TypeVar is usually scoped to a class/function and only valid in that context. One wouldn't declare it on module level and then use it sometimes with +T and sometimes with -T. But I would be fine either way.

Constrained/Bound Typevars

I'm fine with <:, even if it is something new you'd have to learn. It would make TypeVar definitions easily identifiable and searchable.

Identifying TypeVars

Just to be on the same page for this discussion, and owing to the fact that I'm still not well versed in Generics-usage, we are searching for something that will ultimately translate to

T = typing.TypeVar('T', someType, someOtherType, covariant=True)
def identity(arg: T) -> T:
    return arg

, right?

I would prefer to avoid declaring TypeVars in a comment, for technical reasons. It would be one less source of bugs if all the handling of semantics could be kept in the part of the code that looks at the ast, and the only information that the tokenizer feeds into the grammar would be related to ignoring/forcing commented lines, and nothing subtle related to meaning.

An obvious alternative would be to just have it be an unassigned string then, i.e. # type: '<T>' would be "<T>", at which point it could also be moved to the class' or function's docstring:

def identity(arg: "T") -> "T":
    """Simple identity function.

    Generics:
        +T <: someType or someOtherType

    Returns:
        The instance that was passed in.

    """
    return arg

What do you think about that? The assumption here is that someone who wants to annotate generic types would be fine writing some docstrings, too.
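To make the idea concrete, a sketch of reading such a section back out of a docstring. The "Generics:" section name and the one-declaration-per-line layout are taken from the example above. The helper `generics_from_docstring` is hypothetical.

```python
import inspect
import re
from typing import Dict

# One declaration per line: optional variance sign, name, optional "<: bound".
GENERIC_LINE = re.compile(r"([+-]?\w+)\s*(?:<:\s*(.+))?$")


def generics_from_docstring(obj) -> Dict[str, str]:
    """Map each declared name (with variance sign) to its bound source text."""
    doc = inspect.getdoc(obj) or ""   # getdoc removes the common indentation
    declarations = {}
    in_section = False
    for line in doc.splitlines():
        if line.strip() == "Generics:":
            in_section = True
            continue
        if in_section:
            if not line.startswith(" "):
                break                 # blank or dedented line ends the section
            match = GENERIC_LINE.match(line.strip())
            if match:
                declarations[match.group(1)] = match.group(2) or ""
    return declarations


def identity(arg: "T") -> "T":
    """Simple identity function.

    Generics:
        +T <: someType or someOtherType

    Returns:
        The instance that was passed in.

    """
    return arg


print(generics_from_docstring(identity))
```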

maxfischer2781 commented 5 years ago

One wouldn't declare it on module level and then use it sometimes with +T and sometimes with -T.

My idea is that iff TypeVars are declared by stenotype, not the programmer, then it can declare separate TypeVars per name+variance. For example:

# source steno
def foo(a: T, b: +T): ...
# expanded typing
__T = TypeVar('__T')
__CovT = TypeVar('__CovT', covariant=True)
def foo(a: __T, b: __CovT): ...

What I am not sure about is whether that is a practical strategy with bounds/constraints as well. E.g. how would we deal with def foo(a: T <: int, b: T):? Would this be the same as def foo(a: T <: int, b: T <: int): or def foo(a: T1 <: int, b: T2):?

This is what other languages use a where clause for. Or do we just allow one TypeVar name, with one bound/constraint and one variance per function/class?

def foo(a: +T <: int, b: T, c: -T, d: T <: float, e: T <: int): ...
#           ^declaration ^ ok  ^ error  ^ error        ^ ok (consistent)
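The "one declaration per name" rule from this example can be sketched as a small check. It assumes that the first usage of a name is its declaration and that later usages may omit variance or bound but must not contradict it. `Usage` and `check_usages` are names invented for this sketch.

```python
from typing import Dict, List, NamedTuple, Optional


class Usage(NamedTuple):
    name: str
    variance: Optional[str] = None   # "+", "-" or None when omitted
    bound: Optional[str] = None      # source text of the bound, if any


def check_usages(usages: List[Usage]) -> List[str]:
    """Label every usage as 'declaration', 'ok' or 'error'."""
    declared: Dict[str, Usage] = {}
    verdicts = []
    for usage in usages:
        if usage.name not in declared:
            declared[usage.name] = usage
            verdicts.append("declaration")
            continue
        first = declared[usage.name]
        # An omitted variance/bound is inherited; a different one is a conflict.
        consistent = (
            usage.variance in (None, first.variance)
            and usage.bound in (None, first.bound)
        )
        verdicts.append("ok" if consistent else "error")
    return verdicts


# def foo(a: +T <: int, b: T, c: -T, d: T <: float, e: T <: int): ...
signature = [
    Usage("T", "+", "int"),
    Usage("T"),
    Usage("T", "-"),
    Usage("T", bound="float"),
    Usage("T", bound="int"),
]
print(check_usages(signature))
```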

Just to be on the same page

Yes, TypeVars would be declared by stenotype when expanding.

I would prefer to avoid declaring TypeVars in a comment,

The typed_ast supports # type: Foo comments and is part of the 3.8 ast module. I don't know if all objects that we need could be annotated -- PEP 484 does not cover all Python constructs.
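As a quick illustration of the 3.8 ast support, the sketch below reads PEP 484 type comments from a function and an assignment. Whether every construct stenotype needs can carry such a comment is exactly the open question.

```python
import ast

source = '''
def identity(arg):
    # type: (T) -> T
    return arg

items = []  # type: List[int]
'''

# type_comments=True makes the parser keep "# type:" comments on the nodes.
tree = ast.parse(source, type_comments=True)
func, assign = tree.body

print(func.type_comment)
print(assign.type_comment)
```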

it could also be moved to the class' or function's docstring

That may get us back to the initial syntax problem -- which docstring format? Several tools, e.g. sphinx, expect to be able to parse types in docstrings...

a-recknagel commented 5 years ago

I pushed a branch 8d6d634df1e0cacf2f8d95e7c2b2e15c57a29e1f, which is just this file where I wanted to collect some examples. If you can push directly to it (you should be able to), then you can add/alter cases and fix anything I got wrong (looking at the "typing-comment" section in particular).

typed-ast

Neat, I didn't know that.

TypeVar in docstring

It doesn't solve the problem of how the annotation should look, that's true. But I imagined two nice unrelated properties with it:

The typing: ... comments feel more module-level to me. I don't know whether it would feel right if declaring them above a class/function assigned them to said construct.

I'd need to also look more into how sphinx parses code files to have an opinion on it. My intuition is that either 1) it doesn't work at all, or 2) it works, but TypeVar annotations will look like __CovT. Both would require writing a plugin, and having a section with all relevant info in the docstring might even be handy in that case.