Open nordlander opened 10 months ago
Great proposal, clearly described. In particular, thinking of Maybe as an unboxed type is very helpful.
I haven't thought about the challenges in implementing this, but if Johan has ideas (with the help of Launchbury/Peyton Jones), I am all for.
Only one minor comment: as mentioned at the end, the most common situation when programmer will meet the Maybe constructor is in signatures as
fun : (a: int, b: Maybe[str], c: Maybe[float?]) -> t
and here it seems that Optional would be a more natural name. We have used "Optional" when talking about the union with None, but maybe we could adapt to a change?
Optional is definitely better than Maybe. Perhaps Opt could also be considered, a short name might have benefits.
We can probably learn to distinguish this "unboxed" optional type from the regular union with None. But it's interesting to note that once we have unboxed values in the language, we'll have to cope with a similar distinction between 'int' and the unboxed integer type, between 'float' and unboxed floats, etc. The only difference being that the unboxed ints and floats can mostly be kept away from normal programmers, whereas the "unboxed optional" will be visible in the signatures of every function that supports default arguments.
Perhaps that's a case for inventing some special syntax for the unboxed optional? Like the ?t notation we already have for the (planned) union with None...
Perhaps that's a case for inventing some special syntax for the unboxed optional? Like the ?t notation we already have for the (planned) union with None…
Or use the ?t notation for the unboxed optional and use the (forthcoming) general union notation for union with None without any shorthand for that case.
Björn
I'd really prefer if we can explain all of this to someone that doesn't know what boxed values are. I don't think that really changes the proposal in itself but somehow I feel it's an important distinction to make when we are talking about the softer properties of this proposal.
For example, the NULL side of this (the actual unboxed NULL) will be used as the sentinel value to actually stop iteration. However, I think it is easier to explain this to a novice user that an iterator should mark the stop of the iteration by doing return StopIteration
where StopIteration
is just a friendly named object which is really just NULL
. I think that's easier to understand than trying to differentiate between two "nothing" values (None
and NULL
). While types might be something most people can grasp, I think explaining boxed values is a level deeper and I think it would be really good if we could explain how to use the language without the users having to take a CS course first.
Do you see what I mean? It's mostly in how we refer to this and explain it, not necessarily of how it works.
My mental model of this is more like:
None
for when the application developer wants to signify nothingNULL
as more of a language level control flow thingy which most people shouldn't need to think of so muchI reckon we want to use the shorthand ?t
for the more common None
that the application developers use. Consider the four cases:
And our current view:
return None
StopIteration
So there are really few cases when we would actually need to write Maybe
, so it's better to use it for None
.
I agree. And there's actually no need to talk about unboxed types when discussing Maybe/Optional in the documentation, just the fact that constructs like Maybe[Maybe[X]]
or list[Maybe[X]]
are disallowed.
But I just realized that there's an even simpler solution available, if we can sacrifice the use of NULL in the C runtime representation:
StopIteration
and Default
.__next__
be the union A|StopIteration
, so that iterators return either normal values or simply StopIteration
.Default
whenever normal arguments are missing, so that the presence of defaults are marked by the union T|Default
in type signatures.Then seemingly weird constructs like T|Default|Default
would be harmless, as they are equivalent to normal forms with only a single occurrence of each type thanks to the union type semantics. Concretely this means that calling a function with Default
as an explicit argument would behave exactly the same as leaving it out, and iterating over a list of type list[str|StopIteration]
would continue until either the list is exhausted or a StopIteration
element is found. Not far from the behavior I originally thought we should have based on None
, but now using dedicated tokens that cannot interfere with None
instead.
Maybe such a semantics would be easier to explain? And maybe, maybe one could find a way of representing these new constants as NULL anyway, once ubiquitous methods like __str__
, __bool__
and the serialization mechanism are moved out to protocols...
Hmm, so I've run into this issue again (#1495 really) and... I'm in quite dire need of a fix.
I have to admit that I don't remember exactly what we agreed in all of this, but wasn't one thing that None
should turn into a global singleton similar to True
? I think that would fix my dict issue... right @nordlander ? Can we just go ahead and add a None
object?
I think we concluded not to use None
for holes in a dict, absent function parameters and end-of-iteration (while keeping None
for implicit function returns). Instead we'll define new global constants, that can be distinct in the three cases. And they can be called anything, except None
of course.
Representing some constant as the NULL pointer in C is also possible, provided we also change None
into an ordinary global on par with True
and False
. But we don't have to do that swap right away, simply introducing a new arbitrary constant to stand for absent values will fix the immediate dict problem described above. I plan to do the same for the defaultable function arguments. Finding the most efficient representation can be a topic to deal with later.
This is to summarize a proposal for resolving the multiple issues we've been discussing regarding the use of
None
to indicateIterator
sequencedict
This proposal takes the standpoint that we should use a different token for cases 2-4 but keep
None
for case 1. One major reason is that this coincides with established use ofNone
in Python, and avoids givingNone
any fundamentally new roles.A new token is thus needed, which I choose to call
null
. For efficiency reasons it would be good if we could represent this token as theNULL
pointer in C, whileNone
could be relegated to a standard representation as a global immutable object akin toTrue
andFalse
. Picking a standard representation forNone
is further motivated by our plan to provide more systematic support for union types, which includes viewing optional values simply as members of the union betweenNone
and some other type.Now, making
null
a member of every (boxed) type is a road to disaster we're not going to take, so we need a new type to denote the presence ofnull
. And this new type will have to be made visible to the programmer, as it will appear both in type signatures (indicating that a function argument has a default value and can be left out) and in user-definedIterator
subclasses (as the result type of method__next__
). Still, just duplicating the treatment ofNone
and relying on union types doesn't look very attractive when it comes tonull
, as it would rule out an efficient null pointer representation at run-time.The proposal is therefore to provide a built-in, abstract and parameterized type to indicate the presence of
null
, tentatively calledMaybe[T]
in this discussion. What name to choose long term is something I want us to carefully consider.There are just two constructors for this type:
null : Maybe[T]
really : (T) -> Maybe[T]
and two destructors:
exists : (Maybe[T]) -> bool
reveal : (Maybe[T]) -> T
supporting the following straightforward equivalences:
exists(null) == False
exists(really(x)) == True
reveal(really(x)) == x
reveal(null) == raise NullException
The intended use of
null
and theMaybe
type is perhaps most readily demonstrated by showing how to apply them to cases 2 and 3 above. So, assuming a function with default parameters:and assuming the inferred types for the parameters would be
int
,str
andfloat?
, respectively, the compiler would translate the function definition intoand a call like
fun(1, 'hello')
intofun(1, really('hello'), null)
. An important point to note is that sincenull
is represented asNULL
on the C level, there's no need to invent any additional wrapping for thereally
constructor -- every valuereally(x)
can be represented exactly asx
at run-time!Well, the last statement actually requires some modication. Consider what would happen if
fun
instead were defined asso that type of parameter
c
would itself be aMaybe
type. Then the translated function would becomeand the call
fun(1, 'hello', null)
would end up asfun(1, really('hello'), really(null))
. But ifreally(x)
is represented asx
, thenreally(null)
must of course be represented asnull
, thereby triggering insertion of the default valuereally(0.0)
forc
even though an explicit argument was given!Clearly some restriction must be devised that can prohibit dubious constructs like the nesting of multiple Maybe types, and (for similar reasons) the inclusion of
Maybe[T]
inside a union construct, or the instantiation of generic parameters with such a type. Luckily, such a restriction seems to coincide with a type restriction we'll have to implement anyway once we start supporting unboxed values.Refering to Pyton Jones and Launchbury (FPCA '91), a language based on parametric polymorphism cannot allow polymorphic functions to manipulate unboxed values, simply because unboxed values have highly non-uniform representations. Another way of phrasing this is that only standard boxed types can be allowed to instantiate generic type variables, something which is relatively easy to check and uphold inside the kind-checking parts of a type-inferencer.
The core idea of this proposal is now this: treat
Maybe[T]
as an unboxed type! This effectively stopsMaybe[T]
values from being passed as arguments to generic functions, or stored inside generic data structures. In particular, it stopsMaybe[T]
from instantiating the generic type parameter of theMaybe
type itself, thereby outruling the problematicMaybe
nesting. And by carefully defining kind correctnes for union types in the same way, we avoid mixingMaybe
values with terms that fundamentally require a uniform representation.So, assuming that we can define and implement the restriction required in order to support unboxed values, we can also rest assured that it's safe to expose the
Maybe
type to the arbitrary programmer. Still, programmers will have very few incentives for exploitingMaybe
in everyday code, asNone
and the union construct will provide a much higher level of convenience for expressing optionals. So my guess is that theMaybe
type will essentially just be used for the two cases listed as motivation above: as the indicator that a function argument is optional, and as the tokens returned by a user-definedIterator
(the need to handle non-existing values in adict
can probably be fully confined inside thedict
internals).I'm ending this exposition with a sketch of how iterators based on
Maybe
would be handled by the compiler. First the abstractIterator
base class:Then the gradual transformation of a
for
loop iterating over some typet1
:==>
==>
Finally an example of a user-defined
Iterator
subclass:Note especially that it's just in the iterator definition that the programmer becomes reminded of the
Maybe
type; actual iteration can proceed without knowledge of the type as long as the standardfor ... in
syntax is used.Issues to ponder: are the names used above appropriate? In particular, does a signature
signal the fact that parameters
b
andc
have default values an can be left out? Or does it look too complicated? Would we rather have some special notation here?