akkadotnet / akka.net

Canonical actor model implementation for .NET with local + distributed actors in C# and F#.
http://getakka.net
Other
4.69k stars 1.04k forks source link

Race conditions within Akka.Remote.Transport.TestTransport #1547

Closed Aaronontheweb closed 8 years ago

Aaronontheweb commented 8 years ago
[20:35:37][Step 1/1] [ERROR][12/14/2015 8:35:38 PM][Thread 0029][[akka://SystemB4/system/endpointManager/reliableEndpointWriter-akka.test%3a%2f%2fSystemA3%40127.0.0.1%3a10-1/endpointWriter]] AssociationError [akka.test://SystemB4@127.0.0.1:10] <- akka.test://SystemA3@127.0.0.1:10: Error [Shut down address: akka.test://SystemA3@127.0.0.1:10] [   at Akka.Remote.EndpointReader.HandleDisassociated(DisassociateInfo info) in D:\BuildAgent\work\49b164d63843fb4\src\core\Akka.Remote\Endpoint.cs:line 1694
[20:35:37][Step 1/1]    at Akka.Actor.UntypedActor.Receive(Object message) in D:\BuildAgent\work\49b164d63843fb4\src\core\Akka\Actor\UntypedActor.cs:line 22
[20:35:37][Step 1/1]    at Akka.Actor.ActorBase.AroundReceive(Receive receive, Object message) in D:\BuildAgent\work\49b164d63843fb4\src\core\Akka\Actor\ActorBase.cs:line 155
[20:35:37][Step 1/1]    at Akka.Actor.ActorCell.ReceiveMessage(Object message) in D:\BuildAgent\work\49b164d63843fb4\src\core\Akka\Actor\ActorCell.DefaultMessages.cs:line 144
[20:35:37][Step 1/1]    at Akka.Actor.ActorCell.Invoke(Envelope envelope) in D:\BuildAgent\work\49b164d63843fb4\src\core\Akka\Actor\ActorCell.DefaultMessages.cs:line 62]
[20:35:37][Step 1/1] Cause: Unknown
[20:35:37][Step 1/1] [ERROR][12/14/2015 8:35:38 PM][Thread 0027][akka://SystemB4/system/endpointManager/reliableEndpointWriter-akka.test%3a%2f%2fSystemA3%40127.0.0.1%3a10-1] Shut down address: akka.test://SystemA3@127.0.0.1:10
[20:35:37][Step 1/1] Cause: Akka.Remote.ShutDownAssociationException: Shut down address: akka.test://SystemA3@127.0.0.1:10 ---> Akka.Remote.Transport.InvalidAssociationException: The remote system terminated the association because it is shutting down.
[20:35:37][Step 1/1]    --- End of inner exception stack trace ---
[20:35:37][Step 1/1]    at Akka.Remote.EndpointWriter.PublishAndThrow(Exception reason, LogLevel level) in D:\BuildAgent\work\49b164d63843fb4\src\core\Akka.Remote\Endpoint.cs:line 1085
[20:35:37][Step 1/1]    at Akka.Remote.EndpointWriter.<SupervisorStrategy>b__20_0(Exception ex) in D:\BuildAgent\work\49b164d63843fb4\src\core\Akka.Remote\Endpoint.cs:line 842
[20:35:37][Step 1/1]    at Akka.Actor.SupervisorStrategy.HandleFailure(ActorCell actorCell, Exception cause, ChildRestartStats failedChildStats, IReadOnlyCollection`1 allChildren) in D:\BuildAgent\work\49b164d63843fb4\src\core\Akka\Actor\SupervisorStrategy.cs:line 52
[20:35:37][Step 1/1]    at Akka.Actor.ActorCell.HandleFailed(Failed f) in D:\BuildAgent\work\49b164d63843fb4\src\core\Akka\Actor\ActorCell.FaultHandling.cs:line 393
[20:35:37][Step 1/1]    at Akka.Actor.ActorCell.SystemInvoke(Envelope envelope) in D:\BuildAgent\work\49b164d63843fb4\src\core\Akka\Actor\ActorCell.DefaultMessages.cs:line 176
[20:35:37][Step 1/1] --- End of stack trace from previous location where exception was thrown ---
[20:35:37][Step 1/1]    at Akka.Actor.ActorCell.HandleFailed(Failed f) in D:\BuildAgent\work\49b164d63843fb4\src\core\Akka\Actor\ActorCell.FaultHandling.cs:line 402
[20:35:37][Step 1/1]    at Akka.Actor.ActorCell.SystemInvoke(Envelope envelope) in D:\BuildAgent\work\49b164d63843fb4\src\core\Akka\Actor\ActorCell.DefaultMessages.cs:line 176
[20:35:37][Step 1/1] [WARNING][12/14/2015 8:35:38 PM][Thread 0021][LocalActorRefProvider(akka://SystemA3)] akka://SystemA3/temp trying to remove non-child f
[20:35:37][Step 1/1] [ERROR][12/14/2015 8:35:38 PM][Thread 0024][akka://SystemA3/system/transports/akkaprotocolmanager.test.2/akkaProtocol-test%3a%2f%2fSystemB4%40127.0.0.1%3a10-1] Object reference not set to an instance of an object.
[20:35:37][Step 1/1] Cause: System.NullReferenceException: Object reference not set to an instance of an object.
[20:35:37][Step 1/1]    at Akka.Remote.Transport.TestTransport.DefaultDisassociate(TestAssociationHandle handle) in D:\BuildAgent\work\49b164d63843fb4\src\core\Akka.Remote\Transport\TestTransport.cs:line 177
[20:35:37][Step 1/1]    at Akka.Remote.Transport.TestTransport.<.ctor>b__6_5(TestAssociationHandle x) in D:\BuildAgent\work\49b164d63843fb4\src\core\Akka.Remote\Transport\TestTransport.cs:line 71
[20:35:37][Step 1/1]    at Akka.Remote.Transport.TestAssociationHandle.Disassociate() in D:\BuildAgent\work\49b164d63843fb4\src\core\Akka.Remote\Transport\TestTransport.cs:line 602
[20:35:37][Step 1/1]    at Akka.Case.With[TMessage](Action`1 action) in D:\BuildAgent\work\49b164d63843fb4\src\core\Akka\PatternMatch.cs:line 96
[20:35:37][Step 1/1]    at Akka.Remote.Transport.ProtocolStateActor.<InitializeFSM>b__11_41(StopEvent`2 event) in D:\BuildAgent\work\49b164d63843fb4\src\core\Akka.Remote\Transport\AkkaProtocolTransport.cs:line 751
[20:35:37][Step 1/1]    at Akka.Actor.FSM`2.Terminate(State`2 upcomingState) in D:\BuildAgent\work\49b164d63843fb4\src\core\Akka\Actor\FSM.cs:line 929
[20:35:37][Step 1/1]    at Akka.Actor.FSM`2.ApplyState(State`2 upcomingState) in D:\BuildAgent\work\49b164d63843fb4\src\core\Akka\Actor\FSM.cs:line 880
[20:35:37][Step 1/1]    at Akka.Actor.FSM`2.ProcessEvent(Event`1 fsmEvent, Object source) in D:\BuildAgent\work\49b164d63843fb4\src\core\Akka\Actor\FSM.cs:line 845
[20:35:37][Step 1/1]    at Akka.Case.Default(Action`1 action) in D:\BuildAgent\work\49b164d63843fb4\src\core\Akka\PatternMatch.cs:line 112
[20:35:37][Step 1/1]    at Akka.Actor.FSM`2.Receive(Object message) in D:\BuildAgent\work\49b164d63843fb4\src\core\Akka\Actor\FSM.cs:line 756
[20:35:37][Step 1/1]    at Akka.Actor.ActorBase.AroundReceive(Receive receive, Object message) in D:\BuildAgent\work\49b164d63843fb4\src\core\Akka\Actor\ActorBase.cs:line 155
[20:35:37][Step 1/1]    at Akka.Actor.ActorCell.ReceiveMessage(Object message) in D:\BuildAgent\work\49b164d63843fb4\src\core\Akka\Actor\ActorCell.DefaultMessages.cs:line 144
[20:35:37][Step 1/1]    at Akka.Actor.ActorCell.Invoke(Envelope envelope) in D:\BuildAgent\work\49b164d63843fb4\src\core\Akka\Actor\ActorCell.DefaultMessages.cs:line 62

Looks like an issue that maybe occurs on shutdown or disassociation. Need to determine if it's an issue with the TestTransport or with something more "important" inside the remoting stack.

Aaronontheweb commented 8 years ago

Broadening this issue to cover race conditions within TestTransport in general - just found this gem inside Akka.Remote.Tests.Performance:

https://github.com/akkadotnet/akka.net/blob/dev/src/core/Akka.Remote.Tests.Performance/RemoteMessagingThroughputSpecBase.cs#L88

Fails with the following exception as reported by NBench:

NBench.NBenchException: Error occurred during $Akka.Remote.Tests.Performance.TestTransportRemoteMessagingThroughputSpec+OneWay SETUP. ---> System.AggregateException: One or more errors occurred. ---> System.Threading.Tasks.TaskCanceledException: A task was canceled.
   --- End of inner exception stack trace ---
   at System.Threading.Tasks.Task`1.GetResultCore(Boolean waitCompletionNotification)
   at Akka.Remote.Tests.Performance.RemoteMessagingThroughputSpecBase.Setup(BenchmarkContext context) in D:\BuildAgent\work\49b164d63843fb4\src\core\Akka.Remote.Tests.Performance\RemoteMessagingThroughputSpecBase.cs:line 87
   at NBench.Sdk.ReflectionBenchmarkInvoker.InvokePerfSetup(BenchmarkContext context)
   at NBench.Sdk.Benchmark.PreRun()
   --- End of inner exception stack trace ---
NBench.NBenchException: Error occurred during $Akka.Remote.Tests.Performance.TestTransportRemoteMessagingThroughputSpec+OneWay RUN. ---> System.NullReferenceException: Object reference not set to an instance of an object.
   at Akka.Remote.Tests.Performance.RemoteMessagingThroughputSpecBase.OneWay(BenchmarkContext context) in D:\BuildAgent\work\49b164d63843fb4\src\core\Akka.Remote.Tests.Performance\RemoteMessagingThroughputSpecBase.cs:line 106
   at NBench.Sdk.Benchmark.RunBenchmark()
   at NBench.Sdk.Benchmark.RunSingleBenchmark()
   --- End of inner exception stack trace ---
Aaronontheweb commented 8 years ago

No way an in-memory association should take longer than 2s to complete without some sort of underlying bug in the code.

Aaronontheweb commented 8 years ago

I have a reliable test that can reproduce this now - going to step into it

Aaronontheweb commented 8 years ago

Looks like the issue may be user error - from the original source:

/*
 NOTE: This is a global shared state between different actor systems. The purpose of this class is to allow dynamically
 loaded TestTransports to set up a shared AssociationRegistry. Extensions could not be used for this purpose, as the injection
 of the shared instance must happen during the startup time of the actor system. Association registries are looked
 up via a string key. Until we find a better way to inject an AssociationRegistry to multiple actor systems it is
 strongly recommended to use long, randomly generated strings to key the registry to avoid interference between tests.
*/
object AssociationRegistry {
  private final val registries = scala.collection.mutable.Map[String, AssociationRegistry]()

  def get(key: String): AssociationRegistry = this.synchronized {
    registries.getOrElseUpdate(key, new AssociationRegistry)
  }

  def clear(): Unit = this.synchronized { registries.clear() }
}

https://github.com/akka/akka/blob/master/akka-remote/src/main/scala/akka/remote/transport/TestTransport.scala#L440

Modifying our benchmarks and tests to generate a new key for each test instance...

Aaronontheweb commented 8 years ago

Yeah, looks like it - we were basically re-using the same AssociationRegistry for TestTransport between specs, which is static. Just need to use a random Guid for the akka.remote.test-transport.registry-key on each test run and we're good to go. Going to do some more work to verify but this looks resolved for now.