Worker termination Vs Flush for worker interactions

vinok88 commented 5 years ago

Description: The spec says "If a worker W is about to terminate normally and there are messages still to be sent in a queue (which must be the result of executing an async-send-action), then the worker waits until the messages have been received or some receiving worker terminates."

This behavior restricts the implementation of fire-and-forget scenarios, because the sender is always blocked until its async messages have been received. IMO, the worker should terminate regardless of whether its async messages have been received.

In fact, the flush action is there for this same behavior. Spec says, "If peer-worker is specified, then flush waits until the queue of messages to be received by peer-worker is empty or until the peer-worker terminates."

I think this worker termination behavior prevents a purely asynchronous worker send. Shall we remove this part, since the same can be achieved through the flush action?
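As a minimal sketch of the explicit alternative referred to above, the sender can follow its async send with a flush action when it does want to wait. The worker names and the variable `flushResult` are illustrative only, and the sketch assumes the result of `flush` can be assigned to `error?`.

```ballerina
import ballerina/io;

public function main() {
    worker w1 {
        // Async send: the message is queued and w1 continues immediately.
        5 -> w2;
        // Explicitly wait until w2 has received the queued message
        // or w2 has terminated.
        error? flushResult = flush w2;
        io:println("w1: flush failed: ", flushResult is error);
    }

    worker w2 {
        int value = <- w1;
        io:println("w2 received ", value);
    }
}
```

With this pattern the waiting is opt-in at the point of the flush, which is the behavior the proposal would make the only way to wait.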

version: 2019R3

jclark commented 5 years ago

Let me explain why the spec says this.

It is a general design principle of Ballerina that errors are not silently ignored: the user must explicitly indicate that an error is to be ignored.
Async send is the normal way to send. Async means only "asynchronous". It doesn't imply unreliable or best efforts.

However, I agree that it is problematic for the case where the sending worker wants to terminate without waiting for the receiving worker to terminate.

MaryamZi commented 5 years ago

I believe this is also related to a similar concern raised regarding the following in the spec.

"If the receive-action corresponding to an async-send-action has a non-empty failure type, then it is a compile-time error unless it can be determined that a sync-send-action or a flush-action will be executed before the sending worker terminates with success."

jclark commented 5 years ago

Yes, that’s related.

Nadeeshan96 commented 2 years ago

May I ask what is the conclusion of this discussion?

To illustrate with some code samples: in the example below, even though the receiver panics, the sender terminates successfully instead of panicking, because the current implementation is fire-and-forget.

import ballerina/io;
import ballerina/lang.runtime;

int x = 5;

public function main() {
    worker w1 returns string {
        // Async send: w1 does not wait for w2 to receive the message.
        5 -> w2;
        io:println("w1 almost done");
        return "w1 done";
    }

    worker w2 {
        runtime:sleep(2);
        x -= 1;
        // x is now 4, so w2 panics before it reaches the receive.
        if x > 1 {
            panic error("error at w2");
        }
        io:println("w2 receiving");
        int x = <- w1;
        io:println("w2 received " + x.toBalString());
    }

    string p = wait w1;
    io:println("wait: ", p);
}

gives

Running executable

w1 almost done
wait: w1 done

So it seems better to do as the spec says, so that panics in the receiver are reflected in the sender. But if the user then wants to trap the panic, it cannot be done in the same way as with a sync send, which is shown below.

import ballerina/io;
import ballerina/lang.runtime;

int x = 5;

public function main() {
    worker w1 returns string {
        // Sync send: w1 waits for w2, and trap captures a panic in w2 as an error value.
        error? unionResult = trap 5 ->> w2;
        io:println("w1 almost done");
        return "w1 done";
    }

    worker w2 {
        runtime:sleep(2);
        x -= 1;
        // x is now 4, so w2 panics before it reaches the receive.
        if x > 1 {
            panic error("error at w2");
        }
        io:println("w2 receiving");
        int x = <- w1;
        io:println("w2 received " + x.toBalString());
    }

    string p = wait w1;
    io:println("wait: ", p);
}

gives

Running executable

w1 almost done
wait: w1 done

It seems that, with an async send, the panic can be trapped only where we wait for the sender, as below, because we can't use trap with the async send itself or at the end of the worker. This is not visible currently because the implementation is fire-and-forget.

import ballerina/io;
import ballerina/lang.runtime;

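int x = 5;

public function main() {
    worker w1 returns string {
        // Async send: w1 does not wait for w2 to receive the message.
        5 -> w2;
        io:println("w1 almost done");
        return "w1 done";
    }

    worker w2 {
        runtime:sleep(2);
        x -= 1;
        // x is now 4, so w2 panics before it reaches the receive.
        if x > 1 {
            panic error("error at w2");
        }
        io:println("w2 receiving");
        int x = <- w1;
        io:println("w2 received " + x.toBalString());
    }

    // The trap is applied where the default worker waits for w1.
    string|error p = trap wait w1;
    io:println("wait: ", p);
}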

gives

Running executable

w1 almost done
wait: w1 done

So should we change the implementation to match the spec, or is it okay for the sending worker to terminate without waiting for the receiving worker to terminate or to receive the messages?

jclark commented 2 years ago

My conclusion is:

1. I don't agree with the change suggested in the initial comment of this thread, for the reasons I gave.
2. I do think there ought to be a way to do fire-and-forget.

Please feel free to suggest mechanisms to accomplish 2.

jclark commented 2 years ago

Here's an idea to solve this (without new syntax):

- add a function to lang.runtime that terminates the current worker without waiting for queued messages to be flushed
- modify the restriction in https://github.com/ballerina-platform/ballerina-spec/issues/358#issuecomment-549210861 mentioned by @MaryamZi so that it does not apply when the return from the worker (explicit or implicit) is not reachable because of a call to a function with a return type of never
