akkadotnet / akka.net

Canonical actor model implementation for .NET with local + distributed actors in C# and F#.
http://getakka.net

Application hangs on shutdown when using remote provider (1.2.2) #2829

Closed object closed 7 years ago

object commented 7 years ago

As long as the provider is set to Akka.Actor.LocalActorRefProvider, everything works fine. After switching to Akka.Remote.RemoteActorRefProvider with Akka.Remote (I tried both Helios and DotNetty), the following lines appear in the log when I try to stop the application services:

2017-07-05 16:36:35 [Debug] "received AutoReceiveMessage : [akka://OJ/user] - ExistenceConfirmed=True"
2017-07-05 16:36:35 [Information] "Shutting down remote daemon."
2017-07-05 16:36:35 [Error] "Object reference not set to an instance of an object."
System.NullReferenceException: Object reference not set to an instance of an object.
   at Akka.Remote.RemoteSystemDaemon.b__5_0()
2017-07-05 16:36:35 [Debug] "Restarting"
2017-07-05 16:36:35 [Debug] "Restarted (Akka.Remote.RemoteActorRefProvider+RemotingTerminator)"

The issue may already have been addressed here: https://github.com/akkadotnet/akka.net/pull/2817

zbynek001 commented 7 years ago

I've run into this today as well

2017-07-05 07:11:55.417 +00:00 [Information] "Shutting down remote daemon."
2017-07-05 07:11:55.417 +00:00 [Debug] Resolve of path sequence [/"system/cluster/core/daemon#854051853"] failed
2017-07-05 07:11:55.417 +00:00 [Error] "Object reference not set to an instance of an object."
System.NullReferenceException: Object reference not set to an instance of an object.
   at Akka.Remote.RemoteSystemDaemon.<TerminationHookDoneWhenNoChildren>b__5_0()
   at Akka.Util.Switch.WhileOn(Action action)
   at Akka.Remote.RemoteSystemDaemon.<TellInternal>b__6_1()
   at Akka.Util.Switch.TranscendFrom(Boolean from, Action action)
   at Akka.Remote.RemoteSystemDaemon.TellInternal(Object message, IActorRef sender)
   at Akka.Remote.RemoteActorRefProvider.RemotingTerminator.<InitFSM>b__3_1(Event`1 event)
   at Akka.Actor.FSM`2.ProcessEvent(Event`1 fsmEvent, Object source)
   at Akka.Actor.FSM`2.Receive(Object message)
   at Akka.Actor.ActorBase.AroundReceive(Receive receive, Object message)
   at Akka.Actor.ActorCell.ReceiveMessage(Object message)
   at Akka.Actor.ActorCell.Invoke(Envelope envelope)
2017-07-05 07:12:05.097 +00:00 [Warning] Coordinated shutdown phase ["actor-system-terminate"] timed out after 00:00:10

Aaronontheweb commented 7 years ago

@zbynek001 @object I'll look into this ASAP; if I find a fix right away, we'll do a 1.2.3 release.

zbynek001 commented 7 years ago

I think it was introduced with this commit: https://github.com/akkadotnet/akka.net/commit/bdcc0f8d16a9ededce26950e430de4639e4b5b36

var addr = ((ExtendedActorSystem) context.System).Provider.DefaultAddress ?? Address.AllSystems;

accessing Provider.DefaultAddress triggers creating RemoteSystemDaemon before RemoteActorRefProvider.Init method has finished executing, so _remotingTerminator is null

Aaronontheweb commented 7 years ago

.... that would do it

Aaronontheweb commented 7 years ago

accessing Provider.DefaultAddress triggers creating RemoteSystemDaemon before

I think what you meant here is that Provider is null and in the middle of being created when this method is called.

zbynek001 commented 7 years ago

No, I don't think it's about Provider. It's about RemoteActorRefProvider.Transport: accessing Transport triggers CreateInternals.
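
For anyone following along, here is a minimal sketch of the kind of ordering problem being described. It is not the Akka.NET code: `SketchProvider`, `SketchDaemon`, `LazyDaemon` and `earlyHook` are hypothetical names made up for illustration. The assumption is simply that a lazily created object captures a field of its owner, and that something touches the lazy property before the owner's `Init` has assigned that field.

```csharp
using System;

// Hypothetical stand-in for a daemon that captures a terminator at construction time.
class SketchDaemon
{
    private readonly object _terminator;

    public SketchDaemon(object terminator) { _terminator = terminator; }

    // Dereferencing a null _terminator later is what would throw NullReferenceException.
    public bool HasTerminator => _terminator != null;
}

// Hypothetical stand-in for a provider with lazily created internals.
class SketchProvider
{
    private object _terminator;
    private SketchDaemon _daemon;

    // First access creates the daemon with whatever _terminator holds at that moment.
    public SketchDaemon LazyDaemon => _daemon ?? (_daemon = new SketchDaemon(_terminator));

    public void Init(Action<SketchProvider> earlyHook)
    {
        earlyHook(this);            // e.g. code asking for an address in the middle of Init
        _terminator = new object(); // too late if the hook already touched LazyDaemon
    }
}

static class Demo
{
    static void Main()
    {
        var broken = new SketchProvider();
        broken.Init(p => { var d = p.LazyDaemon; }); // premature access during Init
        Console.WriteLine(broken.LazyDaemon.HasTerminator); // False

        var ok = new SketchProvider();
        ok.Init(p => { });                           // nothing touches LazyDaemon during Init
        Console.WriteLine(ok.LazyDaemon.HasTerminator);     // True
    }
}
```

In the sketch the daemon created too early keeps its null reference for good, because the lazy property never rebuilds it. That would be consistent with the failure only showing up at shutdown, when the captured reference is finally used.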

Aaronontheweb commented 7 years ago

You're right... it's the RemoteActorRefProvider.RemoteInternals call....

Aaronontheweb commented 7 years ago

Rolling back the change that introduced this error in 1.2.2. After @zbynek001 pointed out the cause of it, I've decided that including the Address of all actors in the logs, while helpful, won't be worth the headache it takes to do that automatically at this time. Going to rethink how to do it at a later date.

object commented 7 years ago

Great work, guys! Looking forward to 1.2.3.

Aaronontheweb commented 7 years ago

We'll deliver 1.2.3 in the next day or so. Have one more bug I want to track down as part of it...