maxfischer2781 opened this issue 5 years ago
I don't see what can be done about `Generic` classes - the `[]` syntax is already pretty lean. Other languages merely use other symbols, such as `List<T>`. No advantage in using something else just because.
The only thing that IMO could be simplified is the class definition:

```
class Foo(bases, Generic[T]):  # typing -- Generic base class is parameterised by TypeVar
class Foo[T](bases):           # parameterise class object (matches usage)
class Foo(bases): [T]          # dangling parameters (valid Py3 syntax)
```

But I don't see how removing a few characters is significant if it is followed by an entire class definition.
**The `TypeVar`**

I see some benefit in using actual notation. At the moment, they require declaration before use:

```python
T = TypeVar('T')
S = TypeVar('S', str, bytes)
U = TypeVar('U', bound=str)
```

The `T = TypeVar('T')` declaration could do well with shortening -- we just want a single `T`. `TypeVar` constraints are non-obvious - is `TypeVar('S', str, bytes) == TypeVar('S', bound=Union[str, bytes])`? `covariant` and `contravariant` are also correct in the strictest sense - i.e. not obvious even though they express simple concepts.

> No advantage in using something else just because.
I don't have an informed opinion on the matter, but I just wanted to point out that it might be a nice property of `stenotype` if `[]` and `{}` were only used for list and dict annotation. I'd be in favor of using `<>` for generics, even though I see that breaking with the existing notation of `[]` is a serious counter point.
**Shortening `TypeVar`**

The best I can come up with is just something to circumvent importing `typing`; both methods are hard to refactor, though:

```python
"T, str or bytes"  # stenotype
# or
stenotype.TypeVar("T, str or bytes")
```

It might be safe enough to skip the comment in the first example and just search for unassigned strings that are parseable, but it feels risky. Having good error messages might make it viable.
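Locating such unassigned strings is mechanical with the `ast` module. A minimal sketch of that search (the bare steno string is hypothetical stenotype syntax, and note that docstrings would also match this pattern):

```python
import ast


def find_unassigned_strings(source: str) -> list:
    """Collect string literals that appear as bare expression statements.

    Sketch of how a tool could locate candidate steno declarations such
    as a bare "T, str or bytes" line. Docstrings are also bare string
    expressions, so a real tool would need to filter those out.
    """
    found = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Expr)
                and isinstance(node.value, ast.Constant)
                and isinstance(node.value.value, str)):
            found.append(node.value.value)
    return found


src = '"T, str or bytes"\nx = 1\n'
print(find_unassigned_strings(src))  # ['T, str or bytes']
```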
**variance**

You proposed it in #5 already; I'll mention it here once more in my own words, just to make sure that I understood it in the way you meant it: covariance and contravariance don't need to be part of the `TypeVar` definition and should instead be added in the actual annotation via `+` and `-`, e.g.

```python
foo: "[+T]"  # stenotype
```
> covariance and contravariance don't need to be part of the TypeVar definition and should instead be added in the actual annotation via `+` and `-`
Yes, I think that is sufficient. For example, Scala (section 4.5) only has the variance annotation in the class, not necessarily methods. We could have variance annotations required only top-level:
```python
class Foo(Generic["+T"]):
    def bar(self, b: T):  # T is known to be covariant
        ...
```
But we can also go with variance being part of the TypeVar, e.g. `T` and `+T` are separate TypeVars. That would be simpler to process, definitely for `stenotype` but likely also for humans.
**`<>` Generics**

I see your point. I think for just syntax, breaking consistency is not justifiable. One could however justify `<>` as TypeVar declarations. For example, a TypeScript/Python mashup would define a generic function as such:

```
def identity<T>(arg: T) -> T:
    return arg
```

The major problem is that we cannot insert `<>` at function and class definitions without breaking syntax. One could use something similar to C++ templates, where a template declaration precedes the object:

```python
# type: '<T>'
def identity(arg: T) -> T:
    return arg
```
**Shortening `TypeVar`**

The base use-case would be for `stenotype` to silently create the `TypeVar` definition as needed.

```python
stenofoo: "T or int"
# expanded to...
T = TypeVar('T')
stenofoo: Union[T, int]
```

`mypy` reports redefined types, so if there is a clash (`T` as automatic TypeVar and manual class) it does not slip through silently. Common TypeVar names are also not common class names, as far as I can tell.
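A tool could also guard against such clashes itself before emitting the declaration. A sketch, using a hypothetical helper that collects module-level names via `ast` (only plain assignments, `def`, and `class` are handled here):

```python
import ast


def toplevel_names(source: str) -> set:
    """Collect names bound at module level: assignments, defs, classes.

    Hypothetical helper a tool could use to refuse (or rename) an
    auto-generated TypeVar whose name is already taken.
    """
    names = set()
    for node in ast.parse(source).body:
        if isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    names.add(target.id)
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            names.add(node.name)
    return names


source = "class T: pass\nx = 1\n"
# Before emitting `T = TypeVar('T')`, check for a clash:
print("T" in toplevel_names(source))  # True -> clash, do not auto-generate T
```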
**Identifying TypeVars**

The straightforward approach is to have a rule for TypeVars similar to Literals - e.g. "a single uppercase letter". That covers most TypeVars I've seen so far, but not all - e.g. `KT` for the key type in `typing`.

Some languages have an explicit keyword for type vars. For example, Julia uses a `where` clause. This could look like `(T, A) where T` or `def foo(z: T, a: A): # type: where T <: int`.

Alternatively, we could use a symbol to denote TypeVars. For example, `:T`, `` `T ``, `.T`, `<T>` or similar.

Finally, a TypeVar could always be bound/constrained, e.g. `T <: Any`, `T <: int`, or carry variance, e.g. `+T`, `-T` or `=T`.
**Constrained/Bound TypeVars**

Constraints and bounds are closely related. A `TypeVar('N', bound=int)` defines `N` as one well-defined subclass of `int`. A `TypeVar('R', int, float)` defines `R` as one well-defined class of either `int` or `float`. My tests with `mypy` and PyCharm show that constraints also accept subclasses. So basically `TypeVar('R', int, float) == TypeVar('R', bound=Union[int, float])`. It would make sense to just merge the two for `stenotype`.
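At runtime the two flavours can at least be told apart by introspection; a small sketch below reuses the `N` and `R` from the examples above. One caveat to the proposed merge: strictly speaking, checkers solve a constrained TypeVar to exactly one of the listed types, while a bound admits any subtype, so the equivalence is an approximation.

```python
from typing import TypeVar

# Bounded TypeVar: any single subtype of int is acceptable.
N = TypeVar('N', bound=int)
# Constrained TypeVar: solved to exactly one of the listed types.
R = TypeVar('R', int, float)

# The runtime objects expose the difference:
print(N.__bound__)        # <class 'int'>
print(R.__constraints__)  # (<class 'int'>, <class 'float'>)
print(N.__constraints__)  # ()
```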
Mainstream type notation uses `S <: C` to denote "`C` is the upper bound of `S`", or `issubclass(S, C)` in Python speak. This would translate to `N <: int` and `R <: int or float` for the above TypeVars.
I am ambivalent on using `<:` - it makes sense with a mathematical background and can be learned, but is quite obscure without prior knowledge. However, I don't see many alternatives. Using Pythonish `issubclass(T, int or float)` is a beast, `T is int or float` seems clever and slightly wrong, and `T is_a int` or `T is a int` is very awkward.
**Variance**

Constraints/bounds define what type a `T` can be. Variance defines what other type a `T` that is already a specific type can be as well. This expresses whether subtypes (covariance) or supertypes (contravariance) are accepted in place of a specific `T`.

Mainstream type notation seems to just use `+T` (covariant T) and `-T` (contravariant T). `mypy` also displays TypeVars like this.
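In current `typing`, the `+T` / `-T` spellings would correspond to explicit variance flags on the `TypeVar`. A minimal sketch of what such an expansion could target (the `Source`/`Sink` classes are illustrative, not from the proposal):

```python
from typing import Generic, TypeVar

T_co = TypeVar('T_co', covariant=True)              # what +T could expand to
T_contra = TypeVar('T_contra', contravariant=True)  # what -T could expand to


class Source(Generic[T_co]):
    """Covariant producer: Source[bool] is usable where Source[int] is expected."""


class Sink(Generic[T_contra]):
    """Contravariant consumer: Sink[int] is usable where Sink[bool] is expected."""


print(T_co.__covariant__, T_contra.__contravariant__)  # True True
```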
> But we can also go with variance being part of the `TypeVar`

I was initially just +1 on this, but the more I think about this the more I feel that it might be better to bind variance on declaration and not usage. After all, a `TypeVar` is usually scoped to a class/function and only valid in that context. One wouldn't declare it on module level and then use it sometimes with `+T` and sometimes with `-T`. But I would be fine either way.
> Constrained/Bound TypeVars

I'm fine with `<:`, even if it is something new you'd have to learn. It would make `TypeVar` definitions easily identifiable and searchable.
> Identifying TypeVars

Just to be on the same page for this discussion, and owing to the fact that I'm still not well versed in Generics usage: we are searching for something that will ultimately translate to this, right?

```python
T = typing.TypeVar('T', someType, someOtherType, covariant=True)

def identity(arg: T) -> T:
    return arg
```
I would prefer to avoid declaring TypeVars in a comment, for technical reasons. It would be one less source for bugs if all the handling of semantics could be kept in the part of the code that looks at the `ast`, and the only information that a `tokenize` pass feeds into the grammar would be related to ignoring/forcing commented lines, and nothing subtle related to meaning.
An obvious alternative would be to just have it be an unassigned string then, i.e. `# type: '<T>'` would be `"<T>"`, at which point it could also be moved to the class' or function's docstring:

```python
def identity(arg: "T") -> "T":
    """Simple identity function.

    Generics:
        +T <: someType or someOtherType

    Returns:
        The instance that was passed in.
    """
    return arg
```

What do you think about that? The assumption here is that someone who wants to annotate generic types would be fine writing some docstrings, too.
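Pulling declarations out of such a docstring would be straightforward. A sketch, assuming the hypothetical `Generics:` section convention from the example above (the section name and `+T <:` syntax are not established anywhere):

```python
import inspect


def generics_section(func) -> list:
    """Extract the lines under a hypothetical 'Generics:' docstring section."""
    doc = inspect.getdoc(func) or ""
    collected = []
    in_section = False
    for line in doc.splitlines():
        stripped = line.strip()
        if stripped == "Generics:":
            in_section = True
        elif in_section:
            # A blank line or the next section header ends the section.
            if stripped == "" or stripped.endswith(":"):
                break
            collected.append(stripped)
    return collected


def identity(arg: "T") -> "T":
    """Simple identity function.

    Generics:
        +T <: someType or someOtherType

    Returns:
        The instance that was passed in.
    """
    return arg


print(generics_section(identity))  # ['+T <: someType or someOtherType']
```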
> One wouldn't declare it on module level and then use it sometimes with +T and sometimes with -T.
My idea is that iff TypeVars are declared by `stenotype`, not the programmer, then it can declare separate TypeVars per name+variance. For example:

```
# source steno
def foo(a: T, b: +T): ...

# expanded typing
__T = TypeVar('__T')
__CovT = TypeVar('__CovT', covariant=True)
def foo(a: __T, b: __CovT): ...
```

What I am not sure about is whether that is a practical strategy with bounds/constraints as well. E.g. how would we deal with `def foo(a: T <: int, b: T):`? Would this be the same as `def foo(a: T <: int, b: T <: int):` or `def foo(a: T1 <: int, b: T2):`?
This is what other languages use a `where` clause for. Or do we just allow one TypeVar name, with one bound/constraint and one variance per function/class?

```
def foo(a: +T <: int, b: T, c: -T, d: T <: float, e: T <: int): ...
#          ^declaration ^ ok ^ error ^ error ^ ok (consistent)
```
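The "one variance and one bound per TypeVar name" rule could be checked mechanically. A hypothetical validator over `(name, variance, bound)` triples as a tool might collect them from a function's annotations (the data shape is an assumption, not part of the proposal):

```python
def check_typevar_uses(uses):
    """Validate that each TypeVar name has one consistent variance and bound.

    `uses` is a list of (name, variance, bound) tuples; variance/bound are
    None when an annotation just says `T`. The first non-None value per name
    counts as the declaration; later conflicting values are reported.
    """
    declared = {}
    errors = []
    for name, variance, bound in uses:
        seen = declared.setdefault(name, {"variance": None, "bound": None})
        for key, value in (("variance", variance), ("bound", bound)):
            if value is None:
                continue
            if seen[key] is None:
                seen[key] = value  # first sighting declares it
            elif seen[key] != value:
                errors.append(f"{name}: conflicting {key} {seen[key]!r} vs {value!r}")
    return errors


# Mirrors a/b/c/d/e from the steno example above:
uses = [("T", "+", "int"), ("T", None, None), ("T", "-", None),
        ("T", None, "float"), ("T", None, "int")]
print(check_typevar_uses(uses))  # two errors: conflicting variance, conflicting bound
```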
> Just to be on the same page

Yes, TypeVars would be declared by stenotype when expanding.
> I would prefer to avoid declaring TypeVars in a comment

`typed_ast` supports `# type: Foo` comments and is part of the 3.8 `ast` module. I don't know if all objects that we need could be annotated -- PEP 484 does not cover all Python constructs.
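For reference, the 3.8+ `ast` module parses PEP 484 type comments when asked to, attaching them to the annotated node:

```python
import ast

source = (
    "def identity(arg):\n"
    "    # type: (T) -> T\n"
    "    return arg\n"
)

# `type_comments=True` (Python 3.8+) attaches PEP 484 comments to nodes.
tree = ast.parse(source, type_comments=True)
func = tree.body[0]
print(func.type_comment)  # (T) -> T
```

Without the flag, the comment is simply discarded during tokenization, which is why the `tokenize`-level handling mentioned above would otherwise be needed.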
> it could also be moved to the class' or function's docstring

That may get us back to the initial syntax problem -- which docstring format? Several tools, e.g. sphinx, expect to be able to parse types in docstrings...
I pushed a branch, 8d6d634df1e0cacf2f8d95e7c2b2e15c57a29e1f, which is just this file where I wanted to collect some examples. If you can push directly to it (you should be able to), then you can add/alter cases and fix anything I got wrong (looking at the "typing-comment" section in particular).
> typed-ast

Neat, I didn't know that.
> TypeVar in docstring

It doesn't solve the problem of how the annotation should look, that's true. But I imagined two nice unrelated properties with it:

1. The `typing: ...` comments feel more module-level to me; I don't know if it would feel right if declaring them above a class/function should assign them to said construct.
2. I'd need to also look more into how sphinx parses code files to have an opinion on it. My intuition is that either 1) it doesn't work at all, or 2) it works, but TypeVar annotations will look like `__CovT`. Both would require writing a plugin, and having a section with all relevant info in the docstring might even be handy in that case.
Generics consist of two parts:

- Usage of generic types, e.g. `List[int]`, `Dict[str, str]`, `Iterable[List[T]]`.
- Definition of generic types, e.g. `List[T]`, `(a: int, b: B) -> R`.

This is a follow-up on #5 to not delay the core syntax needlessly.