Alternative method for scoping static loop instance

sagebind commented 8 years ago

This is a new issue to aggregate some discussion that happened in #14 and #17 concerning how the global event loop instance is controlled.

Currently we are offering Loop::execute() to set the loop driver and Loop::get() to fetch the instance. Is this the best approach, or should we also provide a get()/set()-style control as well? We discussed this a bit previously, but I wasn't sure if we came to a conclusive stance.

Here's a summary of the arguments so far that I recall:

Provide just execute() and get(): Eliminates the possibility of people messing with loop instances in ways they shouldn't, provides clear scoping at the entry of the program.
Provide get(), set(), and drop(): Greater flexibility, nesting in a closure is not mandatory.
Provide all or combination of above methods: Kind of all above benefits, except you lose guarantee that people can't break things and you now have multiple ways of doing a similar thing.

I closed my old PR as the purpose had already been solved: adding a static class for storing the global driver instance. I opened a new PR (#27) with a reference implementation for discussion.

sagebind commented 8 years ago

I'll copy my general thoughts here as well from earlier:

The execute() closure-style isn't particularly bad, but the design seems optimized for the use-case of multiple event loops in one program, which I thought isn't something we were trying to encourage (not that it should necessarily be forbidden).
Using a closure makes scope clear in the file in which execute() is called, but I don't think it helps much anywhere else in a program.
There are technical issues with closures and multithreading; absolutely requiring a closure in order to set up the event loop initially may or may not be an issue. I don't have a solid answer to that. Similarly, how will this affect forking inside an event loop scope? There needs to be a way to simply discard all event loops from the parent process cleanly, which I'm not sure the strict execute() approach allows. Forking doesn't treat previous scoping nicely in that regard.

kelunik commented 8 years ago

The execute() closure-style isn't particularly bad, but the design seems optimized for the use-case of multiple event loops in one program, which I thought isn't something we were trying to encourage (not that it should necessarily be forbidden).

Why does it seem optimized for multiple event loops for you? The usual way to do things in Amp is Amp\run(function() { /* ... */ });.

Regarding forking, I think you would just start a new loop inside or better fork before you start the loop.

sagebind commented 8 years ago

Why does it seem optimized for multiple event loops for you? The usual way to do things in Amp is Amp\run(function() { /* ... */ });.

I don't have a technical answer to that. :grinning: Merely that, in typical usage, you'll have "setup" code outside of the closure before the run(), and everything else inside the closure. It makes the code look disproportionate.

Regarding forking, I think you would just start a new loop inside or better fork before you start the loop.

For forking that may work... I assume you mean something like this:

function runInFork(callable $work) {
    pcntl_fork();
    Loop::execute($work, ...some loop...);
    exit();
}

Loop::execute(function() {
    // some stuff
    runInFork(function() {});
}, $loop);

The issue is that the parent loop is left in memory in the forked process, and then is effectively killed while running at the exit() Not exactly a clean escape. The situation in threads is actually worse, as persisting the previous event loop instance beyond the entry-point method could cause pthreads to try and serialize it.

Requiring users to create all forks and threads before any asynchronous code seems like an artificial requirement that limits their usefulness.

I guess the main problem is that execute() guarantees a balanced stack of execution, but forks and threads need to discard the parent's stack history.

kelunik commented 8 years ago

I have to say: I didn't use forking in combination with event loops yet.

bwoebi commented 8 years ago

I have no idea about the forking either; never did that with an event loop. You theoretically could Loop::stop() and start a separate loop in the child process then.

bwoebi commented 8 years ago

Actually, we have this problem, when we want to support something like wait(). In that case you aren't inside the run() function [callback], but need to set events on the loop.

I though disagree that Loop::set() is the solution to that. It feels wrong to force the user to manually set the loop in a sync application… Don't know :-/

In Amp we go around that issue by creating a default loop if none is set yet...

sagebind commented 8 years ago

Perhaps to avoid Loop::set() we could only provide Loop::drop() in addition to Loop::execute()? Drop is less likely to be mis-used by accident, though we still have the issue of pthreads not playing all that nicely with closures.

I think right now I lean toward providing all of the methods above, since most of the time Loop::execute() provides the best interface for users, but Loop::set() and Loop::drop() are still available for those situations where you need them.

sagebind commented 8 years ago

@bwoebi Oh, I think I just understood what you meant :stuck_out_tongue:

Actually, we have this problem, when we want to support something like wait(). In that case you aren't inside the run() function [callback], but need to set events on the loop.

You mean like this:

// I'm so synchronous...
$request = new SomeAsyncLibThing();
// Block all the things...
$response = EventLoop\wait($request->send()); // <-- we need to run someone's loop here, but then discard it when we have the result...

bwoebi commented 8 years ago

Yes, that's the issue... You want to ideally have the user unaware of the event-loop under the hood…

bwoebi commented 8 years ago

@coderstephen The only solution I can think of would be the event loops implementations setting a default loop in an always loaded file (i.e. files entry with composer loader).

We'd need set() then, but we'd disallow it's usage inside Loop::execute().

It also possibly should be possible to allow (top-level) Loop::execute() to run without $loop parameter then (you just use the auto-set one).

That'd have the advantage of abstracting the actual loop away completely...

kelunik commented 8 years ago

@bwoebi: we could parse composer.json for the "extra" field, but that's not performant. So maybe Loop::setFactory(callable) and the event loop implementation will have a files file settings that.

bwoebi commented 8 years ago

@kelunik yeah, exactly that.

kelunik commented 8 years ago

I added a proposal as PR: #30

sagebind commented 8 years ago

@bwoebi If set() is provided, I agree it should not be allowed while running in an execute(). My PR already handles that case, I think.

Regardless of how the driver instance is actually obtained, I think we still need the extra degree of control with set() and drop() for those certain cases, e.g. multiprocessing.

A factory is really handy in this case though as well. The example runInFork() earlier could be improved to be:

function runInFork(callable $work) {
    pcntl_fork();
    Loop::drop();
    Loop::set(Loop::createDriver());
    $work();
    Loop::run();
    exit();
}

or even better:

function runInFork(callable $work) {
    pcntl_fork();
    Loop::drop();
    Loop::execute($work);
    exit();
}

I'm not sure the latter plays nicely with pthreads 2 because of the closure, but at least it is possible without persisting the parent event loop in memory.

kelunik commented 8 years ago

I think we don't need set(), drop() will be enough?

kelunik commented 8 years ago

@coderstephen Could you clarify: Why do you need to fork inside an event loop? What's the actual use case for it?

sagebind commented 8 years ago

@kelunik On further consideration, you may be right; drop() is probably enough.

Fork in an event loop for multiprocessing. The goal isn't to fork the event loop, but to run user code in parallel, especially blocking code. See Icicle concurrent and Icicle filesystem for use cases. The event loop being duplicated into a child process is an inconvenience, not a benefit, in these cases.

kelunik commented 8 years ago

I get that, but what's the benefit of forking vs. simply launching a worker process and maybe even reuse that? Because not only your loop will be duplicated, but every thing else, too.

sagebind commented 8 years ago

Speed. Inheriting functions being passed, not having to re-load classes from disk in the class loader. Memory copy is faster than disk access.

I'm not sure the benefits of forking and multithreading are relevant to this issue though. The fact that someone wants to do it should be enough that the standard should be compatible with this use case.

bwoebi commented 8 years ago

@coderstephen I'm not sure why exactly you want drop()? You might just run another loop inside the current loop:

function runInFork(callable $work) {
    if (pcntl_fork() == 0) {
        Loop::onError(null); /* eventual exceptions should not be caught */
        Loop::execute($work); /* it will just normally execute another loop inside the main loop */
        exit(); /* the main loop is never continued as we exit here. */
    }
}

Or am I missing anything here?

Sure, the old event loop will appear in the stacktrace, but that's already it.

kelunik commented 8 years ago

@bwoebi And the old loop will still be there in memory with all it's watchers and so on. Like every other object that has been created, too. That's why I questioned whether forking is a good idea here.

bwoebi commented 8 years ago

It will be in memory, but that's unavoidable, as it's still referenced in the stacktrace. There's no actual way to destroy the loop. But @coderstephen is probably right, it's still faster to copy a bit of memory (I assume it'll be < 50 MB (time to copy, uh, less than 10 ms (depends on RAM speed)). [But yeah, you shouldn't do that in an Aerys server for example as memory may spike and when you fork() processes of 500 MB multiple times per second you have a problem.]

For most cases -d opcache.enable_cli=1 is good enough. It should be only marginally slower than direct forking.

sagebind commented 8 years ago

Yeah, you definitely don't want to be forking for every request that comes in when writing a web server, but managing a fork pool comes in handy. :grinning:

It will be in memory, but that's unavoidable, as it's still referenced in the stacktrace. There's no actual way to destroy the loop.

@bwoebi I'm not trying to stir the pot or anything, but the only reason the loop is referenced in the stacktrace is because of Loop::execute(). A plain get()/set() system does not have this issue, though admittedly just that approach has other inconveniences.

bwoebi commented 8 years ago

@coderstephen we'll still have to start the loop in some way … whether it's now via a get/set/run or execute function. And this will then be in the stacktrace. No way around that.

Also, proc_open() is really not that bad, especially if you have opcache for fast startups.

kelunik commented 8 years ago

@coderstephen Can we close this now, as it can't work?

kelunik commented 8 years ago

Closing as it doesn't work as the loop will always be present in the backtrace.

async-interop / event-loop

Alternative method for scoping static loop instance #28