akkadotnet / akka.net

Canonical actor model implementation for .NET with local + distributed actors in C# and F#.
http://getakka.net

An exception occurs when the Non-Seed node is executed and terminated repeatedly. #4083

Closed jiyeongj closed 4 years ago

jiyeongj commented 4 years ago

Hello :)

A StackOverflowException occurs on the seed node when a non-seed node is started and terminated repeatedly.

Here is the scenario.

  1. Start the SeedNode, NonSeedNode1, and NonSeedNode2 applications.
  2. Run NonSeedNode2/test.bat to automate the test.
    • This batch file repeatedly shuts down and restarts NonSeedNode2.exe. The timeout between steps is set to 5, so each step runs once every five seconds.
  3. After this cycle repeats more than 10 times, the following log is output and SeedNode goes down.
    
    [DEBUG][2019-12-04 6:20:10 AM][Thread 0011][[akka://ClusterLab/system/transports/akkaprotocolmanager.tcp.0/akkaProtocol-tcp%3A%2F%2FClusterLab%40%5B%3A%3Affff%3A127.0.0.1%5D%3A12267-15#454443577]] Association between local [tcp://ClusterLab@localhost:8081] and remote [tcp://ClusterLab@[::ffff:127.0.0.1]:12267] was disassociated because the ProtocolStateActor failed: Shutdown
    [DEBUG][2019-12-04 6:20:10 AM][Thread 0012][remoting] Remote system with address [akka.tcp://ClusterLab@localhost:8092] has shut down. Address is now gated for 5000ms, all messages to this address will be delivered to dead letters.

Process is terminated due to StackOverflowException.


4. When debugging the SeedNode at the point where the **StackOverflowException** occurs, the call stack is as follows (in Akka/Address.cs):

![image](https://user-images.githubusercontent.com/45417052/70119127-f511ec80-16ac-11ea-9acb-6afc5d9e8a81.png)
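
The restart cycle described in step 2 can be sketched roughly as follows. This is a minimal Python sketch, not the contents of test.bat, which is not shown in this issue. The function name `restart_loop`, its parameters, and the use of a hard terminate to stop the node are assumptions made for illustration.

```python
import subprocess
import sys
import time


def restart_loop(command, cycles, interval_seconds):
    """Start `command`, wait, terminate it, wait again; repeat `cycles` times.

    Hypothetical helper: returns the number of completed start/stop cycles.
    """
    completed = 0
    for _ in range(cycles):
        process = subprocess.Popen(command)  # start the non-seed node
        time.sleep(interval_seconds)         # give it time to join the cluster
        process.terminate()                  # assumption: the node is killed, not shut down gracefully
        process.wait()                       # block until the process has exited
        time.sleep(interval_seconds)         # pause before the next start
        completed += 1
    return completed
```

To mirror the scenario, this could be called as `restart_loop(["NonSeedNode2.exe"], cycles=15, interval_seconds=5)`. The executable path and the cycle count are placeholders; the report only states that the crash appears after more than 10 repetitions.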

ghost commented 4 years ago

@jiyeongj Thanks for the detailed bug report and the sample code to go with it.

I followed your notes and couldn't reproduce the issue; the test.bat script ran through 30 cycles to completion.


I'll try a few different configurations and see if I can approach it from a different angle. In the meantime, it might help if you turn on TRACE and actor logging (see below) on all three nodes and send me a gist of everything up to and including the crash.

akka {  
    loglevel = TRACE
    actor {                
        debug {  
              receive = on 
              autoreceive = on
              lifecycle = on
              event-stream = on
              unhandled = on
        }
    }
}

jiyeongj commented 4 years ago

First, I built it with Release/Any CPU.

Then I set loglevel = TRACE as you recommended, but the following exception occurred:

 System.ArgumentException: Unknown LogLevel: "TRACE". Valid values are: "DEBUG", "INFO", "WARNING", "ERROR"

So I changed the configuration as follows.

akka {  
    loglevel = debug
    actor {                
        debug {  
              receive = on 
              autoreceive = on
              lifecycle = on
              event-stream = on
              unhandled = on
        }
    }
}  

I also modified the test.bat file.

A StackOverflowException occurred and the SeedNode went down a few seconds later, as shown below.

[Screen recording: ezgif com-video-to-gif (2)]

ghost commented 4 years ago

@jiyeongj Thanks again for the update.

I was able to reproduce the issue, and I'm pretty sure I know what the problem is.

I'll put together a fix and link it to this issue so you're notified when it's updated.