akkadotnet / akka.net

Canonical actor model implementation for .NET with local + distributed actors in C# and F#.
http://getakka.net
Other
4.7k stars 1.04k forks source link

What can cause AkkaProtocolManager to restart ProtocolStateActor every 5 seconds? #3454

Closed stijnherreman closed 6 years ago

stijnherreman commented 6 years ago

Akka.NET version: 1.3.6 DotNetty version: 0.4.6

I'm experiencing an issue when running an actor system via Azure Service Fabric. The actor system runs inside a console application. The system is available for remoting and works as intended, but I'm seeing the following never-ending log output:

[11:32:19 DBG][Thread 0014][akka://redacted/system/transports/akkaprotocolmanager.tcp.0] now supervising akka://redacted/system/transports/akkaprotocolmanager.tcp.0/akkaProtocol-tcp%3A%2F%2Fredacted%40%5B%3A%3Affff%3A168.63.129.16%5D%3A62198-1
[11:32:19 DBG][Thread 0014][akka://redacted/system/transports/akkaprotocolmanager.tcp.0/akkaProtocol-tcp%3A%2F%2Fredacted%40%5B%3A%3Affff%3A168.63.129.16%5D%3A62198-1] Started (Akka.Remote.Transport.ProtocolStateActor)
[11:32:24 DBG][Thread 0013][akka://redacted/system/transports/akkaprotocolmanager.tcp.0] now supervising akka://redacted/system/transports/akkaprotocolmanager.tcp.0/akkaProtocol-tcp%3A%2F%2Fredacted%40%5B%3A%3Affff%3A168.63.129.16%5D%3A62285-2
[11:32:24 DBG][Thread 0013][akka://redacted/system/transports/akkaprotocolmanager.tcp.0/akkaProtocol-tcp%3A%2F%2Fredacted%40%5B%3A%3Affff%3A168.63.129.16%5D%3A62285-2] Started (Akka.Remote.Transport.ProtocolStateActor)
[11:32:24 DBG][Thread 0011][akka://redacted/system/transports/akkaprotocolmanager.tcp.0/akkaProtocol-tcp%3A%2F%2Fredacted%40%5B%3A%3Affff%3A168.63.129.16%5D%3A62198-1] Stopped
[11:32:29 DBG][Thread 0011][akka://redacted/system/transports/akkaprotocolmanager.tcp.0/akkaProtocol-tcp%3A%2F%2Fredacted%40%5B%3A%3Affff%3A168.63.129.16%5D%3A62285-2] Stopped
[11:32:29 DBG][Thread 0014][akka://redacted/system/transports/akkaprotocolmanager.tcp.0] now supervising akka://redacted/system/transports/akkaprotocolmanager.tcp.0/akkaProtocol-tcp%3A%2F%2Fredacted%40%5B%3A%3Affff%3A168.63.129.16%5D%3A62360-3
[11:32:29 DBG][Thread 0012][akka://redacted/system/transports/akkaprotocolmanager.tcp.0/akkaProtocol-tcp%3A%2F%2Fredacted%40%5B%3A%3Affff%3A168.63.129.16%5D%3A62360-3] Started (Akka.Remote.Transport.ProtocolStateActor)
[11:32:34 DBG][Thread 0014][akka://redacted/system/transports/akkaprotocolmanager.tcp.0/akkaProtocol-tcp%3A%2F%2Fredacted%40%5B%3A%3Affff%3A168.63.129.16%5D%3A62360-3] Stopped
[11:32:34 DBG][Thread 0011][akka://redacted/system/transports/akkaprotocolmanager.tcp.0] now supervising akka://redacted/system/transports/akkaprotocolmanager.tcp.0/akkaProtocol-tcp%3A%2F%2Fredacted%40%5B%3A%3Affff%3A168.63.129.16%5D%3A62424-4
[11:32:34 DBG][Thread 0011][akka://redacted/system/transports/akkaprotocolmanager.tcp.0/akkaProtocol-tcp%3A%2F%2Fredacted%40%5B%3A%3Affff%3A168.63.129.16%5D%3A62424-4] Started (Akka.Remote.Transport.ProtocolStateActor)
[11:32:39 DBG][Thread 0012][akka://redacted/system/transports/akkaprotocolmanager.tcp.0/akkaProtocol-tcp%3A%2F%2Fredacted%40%5B%3A%3Affff%3A168.63.129.16%5D%3A62424-4] Stopped
[11:32:39 DBG][Thread 0012][akka://redacted/system/transports/akkaprotocolmanager.tcp.0] now supervising akka://redacted/system/transports/akkaprotocolmanager.tcp.0/akkaProtocol-tcp%3A%2F%2Fredacted%40%5B%3A%3Affff%3A168.63.129.16%5D%3A62491-5
[11:32:39 DBG][Thread 0012][akka://redacted/system/transports/akkaprotocolmanager.tcp.0/akkaProtocol-tcp%3A%2F%2Fredacted%40%5B%3A%3Affff%3A168.63.129.16%5D%3A62491-5] Started (Akka.Remote.Transport.ProtocolStateActor)

I'm not seeing that output when running the executable manually, on the same machine. Only when the process is launched through Service Fabric.

Akka.NET config:

akka {
  stdout-loglevel = DEBUG
  loglevel = DEBUG
  loggers = ["Akka.Logger.Serilog.SerilogLogger, Akka.Logger.Serilog"]
  log-config-on-start = on
  log-dead-letters-during-shutdown = on
  remote {
    log-sent-messages = on
    log-received-messages = on
  }
  actor {
    serializers {
      wire = "Akka.Serialization.HyperionSerializer, Akka.Serialization.Hyperion"
    }
    serialization-bindings {
      "System.Object" = wire
    }
    debug {
      receive = on
      autoreceive = on
      lifecycle = on
      unhandled = on
      fsm = on
    }
    provider = remote
  }
  remote {
    dot-netty.tcp {
      port = 9001
      hostname = 0.0.0.0
      public-hostname = redacted.cloudapp.azure.com
    }
  }
}

Service Fabric ServiceManifest.xml config:

  <ServiceTypes>
    <StatelessServiceType ServiceTypeName="ServerType" UseImplicitHost="true" />
  </ServiceTypes>

  <CodePackage Name="Code" Version="1.0.0">
    <EntryPoint>
      <ExeHost>
        <Program>redacted.exe</Program>
        <Arguments></Arguments>
        <WorkingFolder>Work</WorkingFolder>
        <ConsoleRedirection FileRetentionCount="5" FileMaxSizeInKb="2048"/>
      </ExeHost>
    </EntryPoint>
  </CodePackage>

  <ConfigPackage Name="Config" Version="1.0.0" />

  <Resources>
    <Endpoints>
      <Endpoint Name="ServerTypeEndpoint" Protocol="tcp" Port="9001" UriScheme="akka.tcp" />
    </Endpoints>
  </Resources>

There are no errors in the logs, just the messages mentioned above. What can cause this behaviour?

Aaronontheweb commented 6 years ago

Lol... welp, this is a first! So despite those messages being logged continuously, it looks like Akka.Remote keeps working without any real issues?

stijnherreman commented 6 years ago

Finally found time to continue investigating this issue. Service Fabric turns out to be irrelevant here.

First off, yes remoting keeps working without any real issues. The actor system connecting to this one only sends a message sporadically, but it can continue sending messages to a resolved remote actor.

The problem does not occur with the following config:

      dot-netty : {
        tcp : {
          port : 9001
          hostname : localhost
          public-hostname : []
        }
      }

The problem does occur with the following config:

      dot-netty : {
        tcp : {
          port : 9001
          hostname : 0.0.0.0
          public-hostname : []
        }
      }
    }

Actual config contains a public-hostname but it doesn't matter to reproduce this issue.

Looking again at the log message, the problem might be related to IPv6. Below is the raw log message, and then the unencoded log message:

now supervising akka://redacted/system/transports/akkaprotocolmanager.tcp.0/akkaProtocol-tcp%3A%2F%2Fredacted%40%5B%3A%3Affff%3A168.63.129.16%5D%3A57136-1 now supervising akka://redacted/system/transports/akkaprotocolmanager.tcp.0/akkaProtocol-tcp://redacted@[::ffff:168.63.129.16]:57136-1

::ffff:168.63.129.16 is an IPv4-mapped IPv6 address, where 168.63.129.16 is a known virtual public IP address in Azure.

Is there any other info that could help determining the cause? I can share ipconfig output in private if necessary.

Aaronontheweb commented 6 years ago

Heh, so it looks like something in Service Fabric is continuously opening connections on that port somewhere... Weird.

First question though: Akka.Remote still delivers and receives traffic correctly in this scenario, right? It's just that new ports are being opened and closed?

And yes, please feel free to send me that data privately to my Gitter handle or you can email me - aaron AT petabridge DOT com

stijnherreman commented 6 years ago

After talking to Aaron and getting some explanations, I finally found the cause. It's Azure Load Balancer probing the open port on a set interval (5 seconds by default, configured by Service Fabric).

References:

So I'm not sure now if there's anything to do here for Akka. Perhaps there's a way to ignore these requests?

Aaronontheweb commented 6 years ago

Perhaps there's a way to ignore these requests?

Not sure what the best way to do that would be - since we don't want to get into the habit of accidentally ignoring a valid inbound request from another node. But yeah, sounds like an environment issue.

I assume in Service Fabric there's a concept of a private port that isn't load-balanced? Would that work?

stijnherreman commented 6 years ago

Well if needed I can just disable the probing. I'll need to spend some time thinking about our environment configuration. I'll go ahead and close this, thanks for your time spent on figuring this out :)

Aaronontheweb commented 6 years ago

Happy to help