::execute(), ::setFactory() and ::set()

AndrewCarterUK commented 8 years ago

Apologies for my recent absence - I've had an obscene amount of work over the last few weeks and I'm not even close to the end. Regardless, I've had some concerns that can't be delayed any longer as (happily) we're progressing on with this.

My main issue is a personal U-turn in opinion regarding using ::execute() as a way to give scope to the static proxy methods. The code below is an attempt to illustrate the problem - which is that no service can safely and easily perform watcher operations. This is because there is no guarantee that the currently active loop driver is the one that issued the watcher. In the example below, the two ping calls could easily occur in different loops and cause a problem.

class Client
{
    private $stream;
    private $watcher = null;

    public function __construct($stream)
    {
        $this->stream = $stream;
    }

    public function ping()
    {
        if ($this->watcher !== null) {
            // This line could screw up
            Loop::cancel($this->watcher);
        }

        $this->watcher = Loop::onWritable($stream, function () use ($stream) {
            fwrite($stream, 'ping');
        });
    }
}

Loop::execute(function () {
    $client = new Client(...);

    $httpClient->get('/some-resource')->success(function () use ($client) {
        $client->ping();
    });

    // some stuff

    $client->ping();
});

The Client could store the watchers in the loop registry, but this is going to make the code very ugly, very quickly. Given that the purpose of this design is to avoid dependency injection with the hope of simplifying, I'd argue that we should re-evaluate if the current approach actually helps to remove complexity at all.

Although I could be persuaded, currently I think I would prefer ditching the registry and using ::set(), ::get(), ::clear() (or whatever) and making it so that consumers must manage the state of the loop themselves.

This leads on to my second issue, which is that we sort of accidently already have ::set() because calling ::setFactory() then any of the proxy methods will automatically create a loop and activate it - despite not being within the bounds of an :execute() call. I'm not really sure I have a strong opinion on this - it just feels like there's no clear way to use the API that we currently have.

bwoebi commented 8 years ago

I don't think I understand your problem. I honestly think you're seeing a highly theoretical problem where there's no practical one.

Loop::execute() is really just like a manual $old = get(), set($new), $callback(), set($old) sequence.

It's basically what you need to do anyway there. You have exactly the same problem when managing set/get yourself, just that it's probably more error-prone.

You never should move objects containing watchers into a separate loop scope than they were defined in. get/set/clear won't make any difference here. I've never seen this actually being a problem.

In fact, it helps a lot. I've had a lot of problems with the manual setting in past, especially when writing tests. Like, tests don't clean up perfectly, then I need to swap the loops; but it didn't happen on this test (e.g. forgetting to do it…) and debugging that down is painful. ::execute() nicely solves this by just installing a new loop every time.

setFactory() can only be called at the top-level anyway … set() is quite specific and could be called anytime. … execute actually makes things safer by preventing accidental sets at random places. Also, it doesn't activate the loop; there's just the hack with Loop::execute($cb, Loop::get()), which is used to emulate synchronous code ("wait"). In normal code you're all the time inside at least one level of Loop::execute() and thus that's a non-issue.

kelunik commented 8 years ago

I've never seen this actually being a problem.

That's not true. We've had that exact issue with amphp/dns. But it's an issue we can't really prevent.

AndrewCarterUK commented 8 years ago

You never should move objects containing watchers into a separate loop scope than they were defined in. get/set/clear won't make any difference here. I've never seen this actually being a problem.

I don't think that's something you can control. If you're creating a service you can't control where in the call stack your code is called from. Given the layers of callbacks that occur in this type of programming, I can see this being a frequent problem.

Also, it doesn't activate the loop; there's just the hack with Loop::execute($cb, Loop::get()), which is used to emulate synchronous code ("wait"). In normal code you're all the time inside at least one level of Loop::execute() and thus that's a non-issue.

I don't think I disagree with you, and I get that it can only be done at the top level - but I'm not comfortable with this sort of thing being valid with the current API design (as I believe it currently is):

Loop::setFactory(new LoopFactory);

Loop::onReadable(...);
Loop::onReadable(...);
// ...

Loop::get()->run();

What do we gain by allowing the loop to be used outside of ::execute()?

kelunik commented 8 years ago

Given the layers of callbacks that occur in this type of programming, I can see this being a frequent problem.

Multiple event loops will be very, very rare, so I don't think this will be a frequent problem.

What do we gain by allowing the loop to be used outside of ::execute()?

The ability to support synchronous wait for applications not entirely written within an event loop.

AndrewCarterUK commented 8 years ago

If multiple event loops will be very, very rare - and the API we have here will run into the issues highlighted when they are used - I think we should do more to make it clear that the consumers are on their own and not supported by the standard when they go down this path.

It feels wrong supporting a very, very rare use case that we know has the potential to encourage bugs when used (unless everyone adopts very defensive programming).

If a consumer wants multiple loops, there's nothing stopping them using the drivers directly and I feel this would be much safer - as then a consumer/service/package could guarantee that the return value of Loop::onReadable could always be passed to Loop::disable, for example (which they currently can't).

We could implement the same sort of protection against accidental event loop overwriting with ::set(), ::get() - just by making it so that you couldn't set an event loop on top of an existing one. If there really is a need for it, we could offer ::replace(), with warnings of the dangers.

kelunik commented 8 years ago

I think we should do more to make it clear that the consumers are on their own and not supported by the standard when they go down this path.

Not supporting a clean shutdown in Aerys sounds like a bad idea. Maybe @bwoebi can shed more light on why we need a new loop there.

If a consumer wants multiple loops, there's nothing stopping them using the drivers directly and I feel this would be much safer

I can't agree here. Every single package will coded using the global accessor, you won't be able to inject your custom loop anywhere then, thus use none of your usual APIs.

return value of Loop::onReadable could always be passed to Loop::disable

This is guaranteed as long as you don't use objects across multiple loops.

just by making it so that you couldn't set an event loop on top of an existing one

That's exactly a need for clean shutdowns.

bwoebi commented 8 years ago

@kelunik We have that issue with amp/dns because it uses static. But that's exactly the reason why we have fetchState/storeState.

Now it looks like this: https://github.com/amphp/file/blob/amp_v2/lib/functions.php#L14-L25 and won't make any problems [this is for amp/file, but will look similar in amp/dns].

With the loop as we have designed it currently, all the places where it was an issue until now, aren't an issue anymore.

@AndrewCarterUK

What do we gain by allowing the loop to be used outside of ::execute()?

as it was said, "wait" support. But you technically can run Loop::get()->run() anywhere and shoot yourself into your own feet... Just because something is possible and can be used for a few well-selected use cases, you shouldn't necessarily do it. Loop::get() really should only be used by few libraries; the main reasons to use it currently are the wait function and to get the underlying handle.

It feels wrong supporting a very, very rare use case that we know has the potential to encourage bugs when used (unless everyone adopts very defensive programming).

While rare, aerys uses it twice [apart from the initial main loop start]. Once to try to continue a loop in a shutdown handler (an attempt of proper cleanup) [Good, that's not actually nested, it's just Loop::execute(..., Loop::get());]. The other time it creates a new execution context in the stopping function to send a stop signal to the master process in a shutdown handler, which must not be interrupted by anything else. [To be sure that there won't be randomly induced fatals from otherwhere killing aerys before having notified the master.]

So basically, the main use case is uninterruptible (from within the process) execution of something. If you begin deeply nesting your execute()'s, you clearly have misunderstood something. In 99% of time you should only be in the top most loop level. Only use nested Loop::execute() if you really need it and with care.

AndrewCarterUK commented 8 years ago

@kelunik

What you're saying suggests that using multiple event loops is not a rare case? I don't think this can go both ways. Either they're common, and we have a problem because the target of the static proxy is switching - or they're rare - in which case, why are we allowing the safety of the API to be compromised by a non-standard use case?

return value of Loop::onReadable could always be passed to Loop::disable

This is guaranteed as long as you don't use objects across multiple loops.

Which can't be guaranteed. If you're writing a service or a package, you have no control over where your service is called from. You couldn't even do a clean up in a destructor because you have no idea where in the call stack that might happen.

I can't agree here. Every single package will coded using the global accessor, you won't be able to inject your custom loop anywhere then, thus use none of your usual APIs.

Then I would request that we consider removing the static proxy and advocating dependency injection instead.

To best summarise my opinion: switching the target for the static proxy functions is a dangerous operation that encourages bugs and confusion. If I was creating say a HTTP client using this standard, I would be very concerned that I could not guarantee I was accessing the same loop (or any loop at all) for the public methods in my class.

The current design of the static proxy requires you to use the registry to store watchers if you want to operate safely. I can't support this, and I think it introduces more complexity than dependency injection (which is a well used and understood practice) would.

I don't have the experience building async apps that others have, but I also find it a bit confusing that Javascript applications can function with one native event loop but the same cannot be achieved in PHP. That's probably more a gap in my understanding - but I feel lit is something we should strive for if possible.

Apologies for my complete U-turn on this topic, but I'm not convinced we're heading down the right road.

bwoebi commented 8 years ago

They are rare, but necessary.

And my point still stands; execute() is safer than manual set().

Which can't be guaranteed. If you're writing a service or a package, you have no control over where your service is called from. You couldn't even do a clean up in a destructor because you have no idea where in the call stack that might happen.

Good point; Loop::execute() perhaps should call gc_collect_cycles() to prevent the destructors leaking into another loop.

And yes, you don't have control nor is it your responsibility as a library. It's the users responsibility to not move async objects (i.e. objects which control watchers somehow) into a separate loop. You can compare that to serializing objects and passing them to another process. The other process could unserialize it; but things will break the exactly same way now. And different loop scopes work just exactly like that.

Then I would request that we consider removing the static proxy and advocating dependency injection instead.

Dependency injection won't help it, because the common issue here is then that you yield some Promise which will never be resolved as the loop currently in task is an other one than where the watchers were installed.

The current design of the static proxy requires you to use the registry to store watchers if you want to operate safely. I can't support this, and I think it introduces more complexity than dependency injection (which is a well used and understood practice) would.

No. Don't do that. The registry is meant for global features which are bound to a loop; e.g. amp/dns … you only ever have a single active DNS resolver. There's no point in having two different resolvers active at the same point. … or amp/file … you only have one active file driver [having two file drivers active simultaneously could even break things if I remember correctly.] … For everything else, that's a very bad idea.

Sure, you could reach a little higher degree of safety, but this will come at a cost (complexity in code to check all the time at the library boundaries whether we have switched loops… that's insane).

And as said, dependency injection will just shift the problems, but they are still there and there's nothing you can do about it.

In general, it's not like your points are totally invalid, but the alternatives really are worse, in more hidden and hideous ways.

Regarding Javascript, it is an issue there. They recently expanded a bit on their *Sync API to mitigate the problem, but the separate loop is largely an implementation issue for how node.js core is structured. The current way is really "rewrite the asnyc code in a sync way" … hacky and messy. Not reusable etc..

I guess you're then probably just lacking some experience. We (I and Niklas) have written a lot of async code and seen where we lost our time with debugging and API problems. … We once had the dependency injection. We did away with it and were happy with the results. Some new problems surfaced; which were often just old problems in another form. The static API with execute() and fetch/storeState(), is really a result of that. It solves the problems we are still observing with the current Amp version. It's not perfect, but it's far, far better than pure dependency injection was. It vastly reduced the ways you can shoot yourself into your own feet.

AndrewCarterUK commented 8 years ago

@kelunik / @bwoebi - thanks for your replies. I'll take some time to read through again and when I get a moment I'll check some of the AMP projects.

AndrewCarterUK commented 8 years ago

As an off the cuff thought - is there any way to make

bwoebi commented 8 years ago

@AndrewCarterUK Any way to make? Wanted to send something which got cut off?

AndrewCarterUK commented 8 years ago

Sorry - was going to send something which I've decided against - didn't realised I'd sent the half message - apologies for the notifications.

bwoebi commented 8 years ago

@AndrewCarterUK And further thoughts? If not, may you please close the issue?

AndrewCarterUK commented 8 years ago

Hi all -

I've had no time to resolve my thoughts and I don't believe I will in the near future. It would be best if everyone considered my involvement with async-interop suspended - as I don't wish to hold up development for those waiting for opinions from the group.

Thanks to everyone for all the work and good luck :)

bwoebi commented 8 years ago

@AndrewCarterUK Oh :-( If you're able to find any time nevertheless, you're more than welcome though ;-)

async-interop / event-loop

::execute(), ::setFactory() and ::set() #86