eclipse-archived / ceylon

The Ceylon compiler, language module, and command line tools
http://ceylon-lang.org
Apache License 2.0
399 stars 62 forks source link

Typeful continuations #4274

Open CeylonMigrationBot opened 9 years ago

CeylonMigrationBot commented 9 years ago

[@GabrielLetourneau] This is something many people must have been thinking about, so there may already be plans for this.

The idea is to introduce patterns like generators, queries, asynchronous workflows, reactive UI programming, parsers, etc. into the language.

What these patterns require is the capacity to build a virtual model of the execution of a function that lives on the heap, can pause by itself and be resumed at any later moment, maybe never, maybe multiple times. While this may represent a fair amount of effort in the compiler, at this level at least I don't see any big theoretical hurdle.

The challenge lies in shaping this into the language specification in a way that blends well with its existing paradigms, is intuitive to end users and practical for API writers.

So I came up with this idea of how this might be put in place. My biggest concern is whether this can be parsed at all.

How it would look like

The set of possible "pausing" operators within a function would be specified by an interface mentioned in the operates clause of the function declaration:

void rollupSalaries() operates Selector<String> {
    for (department in departments) {
        variable Float sum = 0.0;
        for (employee in employees) {
            if (employee.departmentId == department.id) {
                select "Employee ``employee.name``: ``employee.salary``";
                sum += employee.salary;
            }
        }
        select "Department ``department.name`` total: ``sum``";
    }
}
Iterable<String> lazyRollup = generateSelection(rollupSalaries);
void eagerRollup() {
    ArrayList<String> rollupList = ArrayList<String>();
    pushSelection(rollupSalaries)(rollupList.add);
}

[Employee, Department] joinEmployeeAndDepartment() operates IterableQueryer {
    value employee = pick employees;
    value department = pick departments;
    if (employee.departmentId == department.id) {
        return [employee, department];
    }
    else {
        throw;
    }
}
Iterable<[Employee, Department]> employeesAndDepartments = unroll(joinEmployeeAndDepartment);

void startSubmission() operates Awaiter {
    try {
        await submitForm();
    }
    catch (ConcurrencyException exception) {
        if (await askUserConfirmation("Override other user's changes?")) {
            await submitForm { ignoreConcurrency = true; };
        }
    }
}
Async<Anything> submission = asyncStart(startSubmission);

Integer integerParse() operates ParseBuilder {
    Boolean negative = (relay maybeCharacter('-'.equals)) exists;
    variable Integer integerValue = 0;
    while (exists digit = relay maybeCharacter(Character.digit)) {
        value digitValue = digit.integer-'0'.integer;
        integerValue *= 10;
        integerValue += digitValue;
    }
    return integerValue;
}
Character? maybeCharacter(Boolean characterPredicate(Character character)) operates ParseBuilder {
    return choose<Character, Null> [
        (() operates ParseBuilder => accept characterPredicate)(),
        (() operates ParseBuilder => null])();
}
Integer oneThousand = parse(integerParse)("1000");

Whole factorial(Whole n) operates Relayer {
    if (n.negative) {
        throw;
    }
    else if (n.zero) {
        return one;
    }
    else {
        value predecessorFactorial = relay factorial(n.predecessor);
        return predecessorFactorial*n;
    }
}
Whole millionFactorial = heapRun(factorial(wholeNumber(1_000_000)));
Whole millionFactorial2 = stackRun(factorial(wholeNumber(1_000_000))); // might crash

Grammaticaly, I thought of representing pausing operators as low precedence, right associative "keywords" on the model of return and throw. There might a better way. Maybe some punctuation is needed. The take-home idea is that even though these operators are very function-like in the way they are used, they are not functions: they cannot be assigned to values, passed as parameters, or invoked by closures. They only make sense in the context of a function, just like return and throw.

How it might work

The actual declaration of the operative interfaces would look like follow.

shared interface Consumer<in Return> {
    shared formal void doReturn(Return returned);
    shared formal void doThrow(Throwable thrown);
}

shared interface Relaying<out Other>
        of Other {
    shared default void relay<Return>(
        Consumer<Return> capture,
        void start(Consumer<Return> consumer, Other relaying)) {
        start(capture, this of Other);
    }
}

shared interface Relayer satisfies Relaying<Relayer> { }

shared interface Selector<in Selected> 
        satisfies Relaying<Selector<Selected>> {
    shared formal void select(Consumer<Anything> capture, Selected selected);
}

shared interface IterableQueryer
        satisfies Relaying<IterableQueryer> {
    shared formal void pick<Return>(Consumer<Return> capture, Iterable<Return> iterable);
}

shared interface Async<out Return> {
    shared formal void setContinuation(Consumer<Return> continuation); // thread-safe
}

shared interface Awaiter satisfies Relaying<Awaiter> {
    shared formal void await<Return>(Consumer<Return> capture, Async<Return> async);
}

shared interface ParseBuilder
    satisfies Relaying<ParseBuilder> {

    shared formal void accept(Consumer<Character> capture, Boolean characterPredicate(Character character));

    shared formal void choose<FirstOption, SecondOption>(
        Consumer<FirstOption|SecondOption> capture,
        [Anything(Consumer<FirstOption>, ParseBuilder),
            Anything(Consumer<SecondOption>, ParseBuilder)] choice);
}

The basic transformation is that any operator with apparent signature Return foo(Arguments...) has actual signature void foo(Consumer<Return>, Arguments...). These are the handles called by the virtualized function when it pauses.

Likewise, for a pausing function Return foo(Arguments...) operates Operative the real-world signature is void foo(Arguments…)(Consumer<Return> consumer, Operative operative), or maybe void foo(Arguments…)(Consumer<Return>&Operative handler). It is a two-step function that creates, then starts the virtual function's execution and pauses at the first call to a pausing operator, or return or throw.

The rest is just object-oriented programming.

Return heapRun<Return>(void start(Consumer<Return> consumer, Relayer relayer)) {

    value returnHolder = ReturnHolder<Return>();

    Stack<Anything()> todos = ArrayList<Anything()>();

    object heapRelayer satisfies Relayer {
        shared actual void relay<Return>(Consumer<Return> capture, void start(Consumer<Return> consumer, Relayer relayer)) {
            value returnHolder = ReturnHolder<Return>();
            todos.push(() => returnHolder.passReturn(capture));
            todos.push(() => start(returnHolder, this));
        }
    }

    start(returnHolder, heapRelayer);

    while (exists todo = todos.pop()) {
        todo();
    }

    return returnHolder.heldReturn;
}

(Other "executor" functions left as exercises)

Pushing the idea

What do you think?

[Migrated from ceylon/ceylon-spec#1168]

CeylonMigrationBot commented 9 years ago

[@RossTate] Could you choose one example, show what the programmer would write, and then show what it would translate into? That would answer a ton of my questions.

CeylonMigrationBot commented 9 years ago

[@GabrielLetourneau]

Could you choose one example, show what the programmer would write, and then show what it would translate into? That would answer a ton of my questions.

Let's take the factorial:

Whole factorial(Whole n) operates Relayer {
    if (n.negative) {
        throw;
    }
    else if (n.zero) {
        return one;
    }
    else {
        value predecessorFactorial = relay factorial(n.predecessor);
        return predecessorFactorial*n;
    }
}

… which would translate into something like this:

void factorial(Whole n)(Consumer<Whole> consumer, Relayer relayer) {
    try {
        if (n.negative) {
            consumer.doThrow(Exception());
        }
        else if (n.zero) {
            consumer.doReturn(one);
        }
        else {
            object relay1Continuation satisfies Consumer<Whole> {
                shared actual void doReturn(Whole returned) {
                    try {
                        value predecessorFactorial = returned;
                        consumer.doReturn(predecessorFactorial*n);
                    }
                    catch (Throwable thrown) {
                        consumer.doThrow(thrown);
                    }
                }
                shared actual void doThrow(Throwable thrown) {
                    consumer.doThrow(thrown);
                }
            }
            relayer.relay(relay1Continuation, factorial(n.predecessor));
        }
    }
    catch (Throwable thrown) {
        consumer.doThrow(thrown);
    }
}

Of course, the transformation would be a bit messier if a pausing operator were to be found in the middle of a for loop, or a try; then the control structure would have to be cut into pieces, and enclosing structures as well. But that is the whole interest of having it done by the compiler: very often it is easier to think of your solution as simple "functions" calling other "functions", instead of as a noodle bowl of interrelated objects.

CeylonMigrationBot commented 9 years ago

[@RossTate] Ah, that helps a lot. Some thoughts:

  1. You need to specify how you call one of these functions.
  2. You need to think about how these would nest. For example, I would expect a call to factorial inside factorial would propagate the Relayer in the context and pass the remainder as the continuation rather than produce a first-class function.
  3. The design of Relaying seems broken; there doesn't seem to be a need for start to take an Other since Other is always just this. Your factorial example can be redone without Other.
  4. Maybe you have some example that needs the Other flexibility, but it's hard to assess without any examples of implementing classes.
  5. It's not clear to me that you have the correct semantics for exceptions. I could see them going either way. Again, examples would help.

While I believe pinning down details may be a big challenge here, it does seem like it's possible to make everything work out.

CeylonMigrationBot commented 9 years ago

[@GabrielLetourneau] Some answers.

  • You need to specify how you call one of these functions.

The visible signature of an instrumented function is simply the signature of its transformed version. If you know what to do with it, you can use it as such, but normally all operative interfaces should come with an API of "executor" functions and classes. So if you want to execute the factorial you would simply do

Whole millionFactorial = heapRun(factorial(wholeNumber(1_000_000)));

(I outlined the implementation of heapRun towards the end of my first post.)

Very often there might be only one or two such relevant executing device; the difference between each will most often be a matter of algorithmics rather than semantics. For example, you may actually want to use the system stack to execute a recursive function; you may opt for either a lazy or an eager unrolling of a Selector<X>-operating function; you may use a parsing function to eagerly parse a string, or you may want to asynchronously parse an incoming stream (or a user typing), and you may want to follow alternative partial parses on separate threads.

  • You need to think about how these would nest. For example, I would expect a call to factorial inside factorial would propagate the Relayer in the context and pass the remainder as the continuation rather than produce a first-class function.

That's what the relay "operator" is designed for; it is used to make a "call" to another function that is instrumented with the same set of "operators"; of course all those functions are really higher-order, but when using them from an instrumented context you can think of them as first-order, provided you invoke them with relay.

  • The design of Relaying seems broken; there doesn't seem to be a need for start to take an Other since Other is always just this. Your factorial example can be redone without Other.

The answer to this proceeds from your preceding remark: because it seems very often natural to be able to transfer the instrumentation context through function "invocations", I wanted to have a similar-looking relay functions in all operative interfaces I gave as examples. The only way I found to do this with a common super-interface was by using the of Other trick, otherwise I would't pass the whole sub-interface but only Relayer. I'm not yet sure it's the best way, but it should work.

Here is an example of a "relayed" selection:

void rollupSalaries() operates Selector<String> {
    for (department in departments) {
        value sum = relay selectDepartmentEmployees(department.id);
        select "Department ``department.name`` total: ``sum``";
    }
}

Integer selectDepartmentEmployees(Integer departmentId) operates Selector<String> {
    variable Float sum = 0.0;
    for (employee in employees) {
        if (employee.departmentId == departmentId) {
            select "Employee ``employee.name``: ``employee.salary``";
            sum += employee.salary
        }
    }
    return sum;
}
  • It's not clear to me that you have the correct semantics for exceptions. I could see them going either way. Again, examples would help.

I'm not sure which "either way" you are seeing here. I guess I'll have to come up with more examples next year (soon coming).

Stay tuned!

CeylonMigrationBot commented 9 years ago

[@FroMage] Are we talking about continuations as in Scheme call/cc (that can escape and be passed to other threads), or just about the lesser form called coroutines?

CeylonMigrationBot commented 9 years ago

[@GabrielLetourneau]

Are we talking about continuations as in Scheme call/cc (that can escape and be passed to other threads), or just about the lesser form called coroutines?

What I am proposing somewhat fits the definition of continuations as an abstract representation of the control state of a computer program, with the caveat that we limit this representation to the function instrumented by an operates clause, and anything this function captures as a closure. I am not well acquainted with Scheme, but from what I understand this would be a major difference: the transformation does not apply to the whole program, and there is no need for any special mechanism in the target virtual machine.

This does not prevent us from passing continuations, as instances of Consumer<X>, to other threads.

I am not sure to what extent the concept of coroutine covers this proposal and whether that term would be more appropriate than "continuation", but I see a major difference from the basic example of coroutines: I do not limit myself to a single yield keyword, but rather allow any random set of yielding "keywords" to be used; the specific set of keywords that an instrumented function can use is given by the interface specified after operates – hence the "typeful". There is also no requirement that instrumented routines should be defined and executed in pairs.

CeylonMigrationBot commented 9 years ago

[@RossTate] Alright, I'm starting to get a better understanding of what's in your head.

Regarding the brokenness of Relaying and the need to transfer instrumentation context across invocations, here's how you do that for your factorial example:

shared interface Relaying {
    shared default void relay<Return>(
        Consumer<Return> capture,
        void start(Consumer<Return> consumer)) {
        start(capture);
    }
}

shared interface Relayer satisfies Relaying { }

Whole factorial(Whole n) operates Relayer {
    if (n.negative) {
        throw;
    }
    else if (n.zero) {
        return one;
    }
    else {
        value predecessorFactorial = relay factorial(n.predecessor);
        return predecessorFactorial*n;
    }
}
//translates to
void factorial(Whole n)(Relayer relayer)(Consumer<Whole> consumer) {
    try {
        if (n.negative) {
            consumer.doThrow(Exception());
        }
        else if (n.zero) {
            consumer.doReturn(one);
        }
        else {
            object relay1Continuation satisfies Consumer<Whole> {
                shared actual void doReturn(Whole returned) {
                    try {
                        value predecessorFactorial = returned;
                        consumer.doReturn(predecessorFactorial*n);
                    }
                    catch (Throwable thrown) {
                        consumer.doThrow(thrown);
                    }
                }
                shared actual void doThrow(Throwable thrown) {
                    consumer.doThrow(thrown);
                }
            }
            relayer.relay(relay1Continuation, factorial(n.predecessor)(relayer));
        }
    }
    catch (Throwable thrown) {
        consumer.doThrow(thrown);
    }
}

You only need to have an Other argument if you plan on the implementor of Relay to change the implementor across invocations. Note that relay would work differently from other operators because it's very different from the other operators. The question is how to make this difference more explicit.

CeylonMigrationBot commented 9 years ago

[@GabrielLetourneau]

You only need to have an Other argument if you plan on the implementor of Relay to change the implementor across invocations.

You are right. This simplifies the design of operator interfaces, and there is probably not much value in doing it the way I did except for saving a level of currying in the transformed function.

I think what I was trying to do was to let open the possibility of having a single object handle the responsabilities of both Consumer<Whole> and Relayer. This in the view of having a more general framework to define not only pausing expressions, but also control structures. This is not something into which I have put as much thinking as for the examples I gave, and I don't have a clear vision of how to do it, or even if it would be as useful and usable; but in the case it is, whatever is designed in a first iteration should allow for these extended functionalities to be developed at a later moment.

CeylonMigrationBot commented 9 years ago

[@GabrielLetourneau]

Note that relay would work differently from other operators because it's very different from the other operators. The question is how to make this difference more explicit.

OK, so I just realized that your solution implies some sort of specific syntaxic support for relay, because the transformation would be a little different compared to other operators. If we wish to limit the amount of changes to the base language, I guess this should also count as a disadvantage in comparison to my original solution.

Maybe we could also circumvent the problem by defining more complex rules of transformation, perhaps based on annotations attached to the operator interface's members.

CeylonMigrationBot commented 9 years ago

[@GabrielLetourneau] Here are a few thoughts regarding the handling of exceptions.

Let's suppose we have the following generator function:

void selectEmployeeNames() operates Selector<String> {
    try (rows = sql.Select("select name from employee").Results()) {
        for (row in rows) {
            value name = row["name"];
            assert(is String name);
            select name;
        }
    }
}

Iterable<String> names = generate(selectEmployeeNames);

It seems obvious that we need a way to engineer the generate function in such a way that a even a partial unrolling of names should destroy rows. Then we could write something like this:

value iterator = generate(selectEmployeeNames).iterator();
try {
    while (is String name = iterator.next(),
        name.startsWith("a")) {
        print(name);
    }
}
finally {
    iterator.destroy(null);
}

Side note: I think that in general Iterator<Element> should satisfy Destroyable, and all for loops should implicitly destroy their iterator; but that is an another issue alltogether.

This is a case where having a doThrow function in my Consumer<X> interface comes in handy: this allows my execution setup to use an exception to signal termination.

shared interface ResourceIterable<Element> satisfies Iterable<Element> {
    shared actual formal Iterator<Element>&Destroyable iterator();
}

ResourceIterable<Selected> generate<Selected>(
    void start(Consumer<Anything> consumer, Selector<Selected> selector)) {

    interface IteratorState {
        shared default Selected|Finished lastSelected => finished;
        shared default void move() { }
        shared default Boolean abort(Throwable thrown) { return false; }
    }

    object iterable satisfies ResourceIterable<Selected> {
        shared actual Iterator<Selected>&Destroyable iterator() {
            object iterator satisfies Iterator<Selected>&Destroyable {

                variable IteratorState currentState;

                object consumer satisfies Consumer<Anything> {
                    shared actual void doReturn(Anything returned) {
                        object returnedState satisfies IteratorState { }
                        currentState = returnedState;
                    }
                    shared actual void doThrow(Throwable thrown) {
                        object thrownState satisfies IteratorState {
                            shared actual Selected|Finished lastSelected {
                                 throw thrown;
                            }
                        }
                        currentState = thrownState;
                    }
                }

                object selector satisfies Selector<Selected> {
                    shared actual void select(Consumer<Anything> capture, Selected selected) {
                        object selectedState satisfies IteratorState {
                            shared actual Selected lastSelected => selected;
                            shared actual void move() => capture.doReturn(null);
                            shared actual Boolean abort(Throwable thrown) {
                                capture.doThrow(thrown);
                                return true;
                            }
                        }
                        currentState = selectedState;
                    }
                }

                object unstartedState satisfies IteratorState {
                    shared actual void move() => start(consumer, selector);
                }
                currentState = unstartedState;

                shared actual Selected|Finished next() {
                    currentState.move();
                    return currentState.lastSelected;
                }

                shared actual void destroy(Throwable? error) {
                    while (currentState.abort(error else Exception())) { }
                }
            }

            return iterator;
        }
    }

    return iterable;
}

Of course it might be a bit too costly to instantiate an exception for what looks like a pretty normal flow of execution. But in any case, there must be situations where it makes sense for a yielding operator to throw an exception, so I need doThrow anyway. The simplest way to add the possibility of terminating an instrumented function without Throwable instantiation is probably by extending my Consumer<Return> interface.

shared interface Consumer<in Return> {
    shared formal void doReturn(Return returned);
    shared formal void doThrow(Throwable thrown);
    shared formal void abort();
}

… or else

shared interface Consumer<in Return> {
    shared formal void doReturn(Return returned);
    shared formal void doThrow(Throwable? thrown);
}
CeylonMigrationBot commented 9 years ago

[@RossTate] Looks cool. I'd like to hear what other people on the project think.

CeylonMigrationBot commented 8 years ago

[@luolong] As an answer to @RossTate's question:

I'd like to hear what other people on the project think.

I can not speak of the proposed implementation, but the original goal outlined in OP sounds like something very useful indeed...

What I do not understand just by looking at this exchange is who is to be responsible for implementing this stuff?

As a language user I want all the gritty details of capturing the state of continuations shoved somewhere out of my sight so that I could just lay comfortably back and enjoy the syntactic sugar.

As a library designer I might be convinced to look under the rug and try my hand at carving out my own customized implementation, if it is what it takes ...

CeylonMigrationBot commented 8 years ago

[@t-nelis] I highly recommend at least a light reading of the "General Theory of Reactivity" (recordings of talks are also available, I don't know what they cover yet though).

Too many languages and libraries include only incomplete support for expressing the temporal/concurrent aspects of programs. Others only offer primitives that are evidently too low-level, forcing many to build often incompatible abstractions for very common patterns on top of them.

This proposal is interesting because it clearly attempts to hit the right level of abstraction immediately. The programming community settled on many useful patterns that would greatly benefit from first-class status in a language, as some success stories indicate ("is settling on" might be more accurate, but a big picture is definitely emerging). I think the GTOR is one of the best attempts at classifying those patterns yet, and can be very helpful to Ceylon by providing a framework for reasoning around these patterns as well as increasing the chances that Ceylon doesn't become one of those languages that "almost got it right, but not quite".