actonlang / acton

The Acton Programming Language
https://www.acton-lang.org/
BSD 3-Clause "New" or "Revised" License
79 stars 7 forks source link

None vs "nothingness" #1572

Open nordlander opened 10 months ago

nordlander commented 10 months ago

This is to summarize a proposal for resolving the multiple issues we've been discussing regarding the use of None to indicate

  1. The implicit return of "nothing" from a function
  2. The absence of a function argument (implying the use of a default parameter value)
  3. The end of an Iterator sequence
  4. A non-existing value in a dict

This proposal takes the standpoint that we should use a different token for cases 2-4 but keep None for case 1. One major reason is that this coincides with established use of None in Python, and avoids giving None any fundamentally new roles.

A new token is thus needed, which I choose to call null. For efficiency reasons it would be good if we could represent this token as the NULL pointer in C, while None could be relegated to a standard representation as a global immutable object akin to True and False. Picking a standard representation for None is further motivated by our plan to provide more systematic support for union types, which includes viewing optional values simply as members of the union between None and some other type.

Now, making null a member of every (boxed) type is a road to disaster we're not going to take, so we need a new type to denote the presence of null. And this new type will have to be made visible to the programmer, as it will appear both in type signatures (indicating that a function argument has a default value and can be left out) and in user-defined Iterator subclasses (as the result type of method __next__). Still, just duplicating the treatment of None and relying on union types doesn't look very attractive when it comes to null, as it would rule out an efficient null pointer representation at run-time.

The proposal is therefore to provide a built-in, abstract and parameterized type to indicate the presence of null, tentatively called Maybe[T] in this discussion. What name to choose long term is something I want us to carefully consider.

There are just two constructors for this type:

and two destructors:

supporting the following straightforward equivalences:

The intended use of null and the Maybe type is perhaps most readily demonstrated by showing how to apply them to cases 2 and 3 above. So, assuming a function with default parameters:

    def fun(a, b = some_expr, c = None):
        ...

and assuming the inferred types for the parameters would be int, str and float?, respectively, the compiler would translate the function definition into

    def fun(a: int, b: Maybe[str], c: Maybe[float?]):
        b: str = reveal(b) if exists(b) else some_expr
        c: float? = reveal(c) if exists(c) else None
        ...

and a call like fun(1, 'hello') into fun(1, really('hello'), null). An important point to note is that since null is represented as NULL on the C level, there's no need to invent any additional wrapping for the really constructor -- every value really(x) can be represented exactly as x at run-time!

Well, the last statement actually requires some modication. Consider what would happen if fun instead were defined as

    def fun(a, b = some_expr, c = really(0.0)):
        ...

so that type of parameter c would itself be a Maybe type. Then the translated function would become

    def fun(a: int, b: Maybe[str], c: Maybe[Maybe[float]]):
        b: str = reveal(b) if exists(b) else some_expr
        c: Maybe[float] = reveal(c) if exists(c) else really(0.0)
        ...

and the call fun(1, 'hello', null) would end up as fun(1, really('hello'), really(null)). But if really(x) is represented as x, then really(null) must of course be represented as null, thereby triggering insertion of the default value really(0.0) for c even though an explicit argument was given!

Clearly some restriction must be devised that can prohibit dubious constructs like the nesting of multiple Maybe types, and (for similar reasons) the inclusion of Maybe[T] inside a union construct, or the instantiation of generic parameters with such a type. Luckily, such a restriction seems to coincide with a type restriction we'll have to implement anyway once we start supporting unboxed values.

Refering to Pyton Jones and Launchbury (FPCA '91), a language based on parametric polymorphism cannot allow polymorphic functions to manipulate unboxed values, simply because unboxed values have highly non-uniform representations. Another way of phrasing this is that only standard boxed types can be allowed to instantiate generic type variables, something which is relatively easy to check and uphold inside the kind-checking parts of a type-inferencer.

The core idea of this proposal is now this: treat Maybe[T] as an unboxed type! This effectively stops Maybe[T] values from being passed as arguments to generic functions, or stored inside generic data structures. In particular, it stops Maybe[T] from instantiating the generic type parameter of the Maybe type itself, thereby outruling the problematic Maybe nesting. And by carefully defining kind correctnes for union types in the same way, we avoid mixing Maybe values with terms that fundamentally require a uniform representation.

So, assuming that we can define and implement the restriction required in order to support unboxed values, we can also rest assured that it's safe to expose the Maybe type to the arbitrary programmer. Still, programmers will have very few incentives for exploiting Maybe in everyday code, as None and the union construct will provide a much higher level of convenience for expressing optionals. So my guess is that the Maybe type will essentially just be used for the two cases listed as motivation above: as the indicator that a function argument is optional, and as the tokens returned by a user-defined Iterator (the need to handle non-existing values in a dict can probably be fully confined inside the dict internals).

I'm ending this exposition with a sketch of how iterators based on Maybe would be handled by the compiler. First the abstract Iterator base class:

    class Iterator[A] (object):
       __next__ : () -> Maybe[A]

Then the gradual transformation of a for loop iterating over some type t1:

    for p in e:                 # where e : t1
        ...

==>

    for p in w.__iter__(e):     # where w : t1(Iterable[t2])
        ...

==>

    _i: Iterator[t2] = w.__iter__(e)
    while True:
        _v: Maybe[t2] = _i.next()
        if exists(_v):
            p = reveal(_v)
            ...
        else:
            break

Finally an example of a user-defined Iterator subclass:

    class MyIterator (Iterator[int]):
        def __init__(self, start, max)
            self.start = start
            self.max = max
            self.state = start
        def __next__(self):
            if self.state <= self.max:
                r = self.state
                self.state = self.state * 2
                return really(r)
            else:
                return null

Note especially that it's just in the iterator definition that the programmer becomes reminded of the Maybe type; actual iteration can proceed without knowledge of the type as long as the standard for ... in syntax is used.

Issues to ponder: are the names used above appropriate? In particular, does a signature

    fun : (a: int, b: Maybe[str], c: Maybe[float?]) -> t

signal the fact that parameters b and c have default values an can be left out? Or does it look too complicated? Would we rather have some special notation here?

sydow commented 10 months ago

Great proposal, clearly described. In particular, thinking of Maybe as an unboxed type is very helpful.

I haven't thought about the challenges in implementing this, but if Johan has ideas (with the help of Launchbury/Peyton Jones), I am all for.

Only one minor comment: as mentioned at the end, the most common situation when programmer will meet the Maybe constructor is in signatures as

    fun : (a: int, b: Maybe[str], c: Maybe[float?]) -> t

and here it seems that Optional would be a more natural name. We have used "Optional" when talking about the union with None, but maybe we could adapt to a change?

nordlander commented 10 months ago

Optional is definitely better than Maybe. Perhaps Opt could also be considered, a short name might have benefits.

We can probably learn to distinguish this "unboxed" optional type from the regular union with None. But it's interesting to note that once we have unboxed values in the language, we'll have to cope with a similar distinction between 'int' and the unboxed integer type, between 'float' and unboxed floats, etc. The only difference being that the unboxed ints and floats can mostly be kept away from normal programmers, whereas the "unboxed optional" will be visible in the signatures of every function that supports default arguments.

Perhaps that's a case for inventing some special syntax for the unboxed optional? Like the ?t notation we already have for the (planned) union with None...

sydow commented 10 months ago

Perhaps that's a case for inventing some special syntax for the unboxed optional? Like the ?t notation we already have for the (planned) union with None…

Or use the ?t notation for the unboxed optional and use the (forthcoming) general union notation for union with None without any shorthand for that case.

Björn

plajjan commented 10 months ago

I'd really prefer if we can explain all of this to someone that doesn't know what boxed values are. I don't think that really changes the proposal in itself but somehow I feel it's an important distinction to make when we are talking about the softer properties of this proposal.

For example, the NULL side of this (the actual unboxed NULL) will be used as the sentinel value to actually stop iteration. However, I think it is easier to explain this to a novice user that an iterator should mark the stop of the iteration by doing return StopIteration where StopIteration is just a friendly named object which is really just NULL. I think that's easier to understand than trying to differentiate between two "nothing" values (None and NULL). While types might be something most people can grasp, I think explaining boxed values is a level deeper and I think it would be really good if we could explain how to use the language without the users having to take a CS course first.

Do you see what I mean? It's mostly in how we refer to this and explain it, not necessarily of how it works.

My mental model of this is more like:

I reckon we want to use the shorthand ?t for the more common None that the application developers use. Consider the four cases:

  1. The implicit return of "nothing" from a function
  2. The absence of a function argument (implying the use of a default parameter value)
  3. The end of an Iterator sequence
  4. A non-existing value in a dict

And our current view:

  1. We stick to implicit return None
  2. new Maybe NULL
  3. NULL or call it something nice like StopIteration
  4. NULL but this should not be visible to the user I think!?

So there are really few cases when we would actually need to write Maybe, so it's better to use it for None.

nordlander commented 10 months ago

I agree. And there's actually no need to talk about unboxed types when discussing Maybe/Optional in the documentation, just the fact that constructs like Maybe[Maybe[X]] or list[Maybe[X]] are disallowed.

But I just realized that there's an even simpler solution available, if we can sacrifice the use of NULL in the C runtime representation:

Then seemingly weird constructs like T|Default|Default would be harmless, as they are equivalent to normal forms with only a single occurrence of each type thanks to the union type semantics. Concretely this means that calling a function with Default as an explicit argument would behave exactly the same as leaving it out, and iterating over a list of type list[str|StopIteration] would continue until either the list is exhausted or a StopIteration element is found. Not far from the behavior I originally thought we should have based on None, but now using dedicated tokens that cannot interfere with None instead.

Maybe such a semantics would be easier to explain? And maybe, maybe one could find a way of representing these new constants as NULL anyway, once ubiquitous methods like __str__, __bool__ and the serialization mechanism are moved out to protocols...

plajjan commented 8 months ago

Hmm, so I've run into this issue again (#1495 really) and... I'm in quite dire need of a fix.

I have to admit that I don't remember exactly what we agreed in all of this, but wasn't one thing that None should turn into a global singleton similar to True? I think that would fix my dict issue... right @nordlander ? Can we just go ahead and add a None object?

nordlander commented 8 months ago

I think we concluded not to use None for holes in a dict, absent function parameters and end-of-iteration (while keeping None for implicit function returns). Instead we'll define new global constants, that can be distinct in the three cases. And they can be called anything, except None of course.

Representing some constant as the NULL pointer in C is also possible, provided we also change None into an ordinary global on par with True and False. But we don't have to do that swap right away, simply introducing a new arbitrary constant to stand for absent values will fix the immediate dict problem described above. I plan to do the same for the defaultable function arguments. Finding the most efficient representation can be a topic to deal with later.