TestKit ImplicitSender - async await broken

rogeralsing commented 9 years ago

Another async await issue incoming..

in the TestKitBase, we have this lines:

 if(!(this is INoImplicitSender))
{
       InternalCurrentActorCellKeeper.Current = (ActorCell)((ActorRefWithCell)testActor).Underlying;
}
_testActor = testActor;

This sets the testActor as the current sender for all tests that are not decorated with INoImplicitSender

So far so good.

But as test methods can be async, this does not work, for the very same reasons we had to hack the world to get async await to work inside actors.

The implicit context is lost in the await and all code running in the test method be incorrect. And worse: if there are two parallel tests running at the same time, where one has INoImplicitSender and one hasnt' Its completely possible that the context set above leaks over to the other test if that happens to inherit the same thread while awaiting.

So now good folks, here we have correct Task based methods, with broken async await support. As these tests do return tasks, if I get what everyone else is saying here, it should be an easy fix?

(In my head it's not, I still think the implicit actor context needs to async flow in the tests also. But I've been wrong before on this topic :-) )

cc @nvivo @kekekeks @Danthar @HCanber

Someone reported this in a PR recently, that the context had leaked over to another test. And had to set .Current = null

So, shoot, anyone have any ideas how to fix this?

rogeralsing commented 9 years ago

actually, it will leak context even if two or more tests are running and are async,

e.g. test 1 sets context in thread 1, test 2 sets context in thread 2 test 1 calls await , test 2 calls await test 1 resumes on thread 2, test 2 resumes on thread 1

Now implicit Sender is swapped for both tests..

As @nvivo noted on gitter, instead of setting state in a thread static, it could be set in the CallContext. This does in theory solve the issue at hand (at least should do it unless there are more lazer unicorn dragons buried here)

But.. xUnit 1.9.2 blows up if we use CallContext for this.. Hopefully 2.0 is OK with this change.

This would however affect throghput negatively, but I guess correct and slower is better than fast and broken..

Danthar commented 9 years ago

Even then its definitely not a silver bullet. This guy explains it perfectly: http://blog.stephencleary.com/2013/04/implicit-async-context-asynclocal.html

rogeralsing commented 9 years ago

flippin the table

Luckily, we ony store the implicit context data like Sender and current message in the call context. So we are not mutating / evolving any datastructure like in the blog above.

rogeralsing commented 9 years ago

cc @StephenCleary we need you! <3 The async await stuff with implicit contexts are killing us

Aaronontheweb commented 9 years ago

abandon-thread

nvivo commented 9 years ago

My only worry is that these PRs are only treating symptoms, not root causes. The fact is that these problems should not be happening in the first place. This has nothing to do with akka and async being incompatible at some fundamental level, it's just about implementation.

If a synccontext wrapper is a solution, this can be applied to any environment, not only xunit. That means ditching CallContext, changing dispatchers a little bit and fixing these problems forever. But the same solution can be achieved by keep using just CallContext and adjusting the ActorTaskScheduler to flow these values in the right order.

More issues will appear because of async. We should understand what is causing them first, and mainly have an agreement by the leaders on what the issue is and the direction we should follow to fix them once and for all.

I'm afraid keep accepting patches to very specific problems will only turn akka in a patchwork quilt.

kekekeks commented 9 years ago

We should understand what is causing them first

ThreadStatic variables are evil

Current async Receive implementation is a one big hack, which can be improved, however. First of all we do need to have a custom SynchronizationContext. It just has to wrap the current one and be switched with push/pop semantics (see xunit test invoker for reference). We also have to suspend system message queue while task returned by the Actor is still running. If we won't do that, actor can be killed while still processing a message being awaiting for a Task when Kill message arrives. I also think that AsyncBehavior.Reentrant should be removed, people should use PipeTo for that (we can add some syntactic sugar later however).

The only problem with a custom SynchronizationContext I see it that somewhere could be async void method that captures it. But if it only restores the ambient state and nothing else, it should be completely fine.

kekekeks commented 9 years ago

I'm afraid keep accepting patches to very specific problems will only turn akka in a patchwork quilt.

That particular case has nothing to do with async actors, dispatchers, etc. It's because of the free context in which xunit runs tests.

nvivo commented 9 years ago

That particular case has nothing to do with async actors, dispatchers, etc. It's because of the free context in which xunit runs tests.

This is exactly the short sight I'm talking about. It's all the same problem, this is just treating symptoms.

I'm fine with working around a specific issue to keep the project going. But it's different to say this is fixing a real issue. It's not. Just like #905 didn't fix anything, it's all band-aids.

rogeralsing commented 9 years ago

I totally agree with you @nvivo, but there is more to it.

We have to stick with the API we've got for V1+ , OnReceive returns a bool. so we cannot return tasks. and changing that API at this point is a no go.

We might have the luxury to change this for v2 or simply make it work differently in Akka.Typed.

But right now, we are stuck with the implicit context as that is how JVM Akka is designed and the Receive API looks the way it does.

So we need to focus on whats the correct thing to do in the current position. Changing the overall API is not an option right now.

So what is our best alternative right now?

kekekeks commented 9 years ago

OnReceive returns a bool. so we cannot return tasks

It can return Task. The dispatcher just has to support asynchronous execution, that's it.

rogeralsing commented 9 years ago

Our message pipeline is extremely lightweight right now. Making OnReceive return a task, even if it is just a Task.FromResult, gimps our maximum throughput by millions of messages per second.

On my machine it drops from 34 mil messages down to 20 mil messages per sec in the benchmark app, from increased GC pressure alone.

This is a prices that all actors have to pay if we make that change, even if they are not async. Thats a very high price to pay for a feature that is just "nice to have"

kekekeks commented 9 years ago

A completed task cached in a static variable costs absolutely nothing. No GC pressure, etc

rogeralsing commented 9 years ago

I'm not follwoing, when is a completed task a cached static variable? Just declaring a completed task inside the receive method, has the above described effect on throughput.

e.g. var dummy = Task.FromResult(true); This has been tested and is measurable.

Or are you saying we should do something like:

return TaskTrue;

where that task is declared somewhere else?

rogeralsing commented 9 years ago

Another thing that I also don't get is this:

Lets say that OnReceive returns Task or Task<bool> and we are OK with any perf losses from it. Then what?

When should the TaskScheduler or SynchronizationContext be applied? (This is still needed as the implicit actor context is the real issue, not async await itself. so the context needs to async flow.)

For every single message? As that needs to be set before OnReceive is invoked. which is super expensive.

I feel like an Ogre with a peanut brain when discussing this topic as everyone seem to have a very clear idea on how this could be solved while I myself don't :)

kekekeks commented 9 years ago

Or are you saying we should do something like: return TaskTrue; where that task is declared somewhere else?

https://github.com/StephenCleary/AsyncEx/wiki/TaskConstants

When should the TaskScheduler or SynchronizationContext be applied?

Before calling Recieve in ActorBase

which is super expensive.

Sync context may be a part of ActorCell and also be reused. SynchronizationContext.SetCurrent is cheap, it just obtains execution context and sets some properties.

private class DummySynchronizationContext : SynchronizationContext
{

}

private static DummySynchronizationContext _context1 = new DummySynchronizationContext(),
    _context2 = new DummySynchronizationContext();

public static void Main()
{
    SynchronizationContext.SetSynchronizationContext(_context2);
    var st = Stopwatch.StartNew();
    for (var c = 0; c < 100000000; c++)
    {
        SynchronizationContext.SetSynchronizationContext(c%2 == 0 ? _context1 : _context2);
    }
    st.Stop();
    Console.WriteLine(st.Elapsed);
    Console.Read();
}

This benchmark takes 00:00:01.2104391 to run on my hardware (windows is running in KVM, but still). ~100 million context switches per second.

kekekeks commented 9 years ago

Oh, found a better the place for SynchronizationContext switching. The same place, where current actor cell is switched.

kekekeks commented 9 years ago

@rogeralsing Please, check how much does performance drop on your hardware with proper benchmarking setup.

rogeralsing commented 9 years ago

Wasnt the consensus that SynchronizationContext could not be used? as we do not own the threads we run on.

kekekeks commented 9 years ago

Wasnt the consensus that SynchronizationContext could not be used? as we do not own the threads we run on.

Please, see implementation.

1) SynchronizationContext is reset after the call in finally block, so there is no garbage left in thread's state. 2) ActorCellSynchronizationContext delegates it's work to original one, so it won't break anything.

StephenCleary commented 9 years ago

Generally, a custom SyncCtx should be avoided. If it's possible to solve this by using the logical call context (.NET 4.5+ only), I would prefer that route. Or if it's possible to access context values through some other means, that would be even better (but a much more substantial change).

SyncCtx is more of a "platform type". UIs have them. ASP.NET vCurrent does have one, but in vNext they are planning to remove it. SignalR explicitly leaves the ASP.NET vCurrent context before executing its user-provided code.

These are not conclusive arguments; just points to consider.

nvivo commented 9 years ago

Thanks @StephenCleary! Very good to know about this.

Danthar commented 9 years ago

Can You explain why a custom sync ctx should be avoided? And why they are moving away from these things in vnext?

-----Original Message----- From: "Stephen Cleary" notifications@github.com Sent: ‎27/‎04/‎2015 19:47 To: "akkadotnet/akka.net" akka.net@noreply.github.com Cc: "Arjen Smits" Deathraven163@gmail.com Subject: Re: [akka.net] TestKit ImplicitSender - async await broken (#907)

Generally, a custom SyncCtx should be avoided. If it's possible to solve this by using the logical call context (.NET 4.5+ only), I would prefer that route. Or if it's possible to access context values through some other means, that would be even better (but a much more substantial change). SyncCtx is more of a "platform type". UIs have them. ASP.NET vCurrent does have one, but in vNext they are planning to remove it. SignalR explicitly leaves the ASP.NET vCurrent context before executing its user-provided code. These are not conclusive arguments; just points to consider. — Reply to this email directly or view it on GitHub.

StephenCleary commented 9 years ago

To be honest, I'm not entirely sure. I was a bit surprised that the SignalR team was adamantly opposed to introducing a SyncCtx, since I consider SignalR as more of a "platform" than a "library".

On the other hand, xUnit (preview v3) always provides a SyncCtx for their unit tests. But that kind of makes sense, since most code will run in some kind of a SyncCtx at runtime.

My gut feelings are: 1) Some components depend on the SyncCtx being a specific type. This is more than just depending on the behavior of the SyncCtx; they're actually downcasting it. I know some ASP.NET components do this today, and IIRC there's some of it on the UI side as well, for things like checking whether there's already a Dispatcher for the current thread. So, your SyncCtx can be LSP-safe but other code will still fail. 2) Introducing a SyncCtx changes the behavior of the system. In particular, await behaves differently in the presence of a SyncCtx. 3) Dealing with SyncCtx is messy in the internal .NET framework code. This is due to a historical error that made SyncCtx part of the ExecutionContext. So there's these fun code paths where the ExecutionContext is captured explicitly without the SyncCtx, and then the SyncCtx is captured by other code and used later. It's just plain messy.

I think it's for these reasons that there's a general move away from SyncCtx.

kekekeks commented 9 years ago

I think we can close the issue for now. Tests have their workaround, reentrancy is removed, ambient context will be removed in Akka.Typed, RunTask/ReceiveActor work fine, God in his heaven, all's right with the world.

raypurchasett commented 7 years ago

I think this has caused #2491. At least removing the custom SynchronizationContext makes the tests happy again.

@kekekeks You mentioned 'tests have their workaround'. Am I missing something I should be doing?

kekekeks commented 7 years ago

@thomastomanek If I remember correctly, ActorCellSynchronizationContext was the workaround. But that was a long time ago, infrastructure has changed so it might be not necessary anymore.

akkadotnet / akka.net

TestKit ImplicitSender - async await broken #907