akkadotnet / akka.net

Canonical actor model implementation for .NET with local + distributed actors in C# and F#.
http://getakka.net

Possible remoting bug: dropping inbound messages when listening to 0.0.0.0 #796

Closed: rogeralsing closed this issue 9 years ago

rogeralsing commented 9 years ago

I'm pretty sure this is a failure in resolving local actors from a given address.

[Screenshot: allports]

If the chat server listens on 0.0.0.0 and you then send a message to one of the IPs that belong to it (e.g. 192.168.1.88 in my case), the server drops the message because the recipient is not recognized as belonging to that system. See the screenshot.

Surely we should resolve that incoming message as being local to the chat server?

rogeralsing commented 9 years ago

In Endpoint.cs:

else if ((recipient is RemoteRef || recipient is RepointableActorRef) && !recipient.IsLocal && !settings.UntrustedMode)
{
    if (settings.LogReceive) log.Debug("received remote-destined message {0}", msgLog);

    // !!! Addresses only contains 0.0.0.0 here,
    // which doesn't match the incoming 192.168.1.88 in my case,
    // thus making listening on 0.0.0.0 pretty pointless, I guess.
    if (provider.Transport.Addresses.Contains(recipientAddress))
    {
        // if it was originally addressed to us but is in fact remote from our point of view (i.e. remote-deployed)
        recipient.Tell(payload, sender);
    }
    else
    {
        log.Error(
            "Dropping message [{0}] for non-local recipient [{1}] arriving at [{2}] inbound addresses [{3}]",
            payloadClass, recipient, recipientAddress, string.Join(",", provider.Transport.Addresses));
    }
}
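To make the failure concrete, here is a minimal, self-contained C# sketch that models only the membership test above. It does not use Akka.NET types. Addresses are reduced to hypothetical `akka.tcp://system@host:port` strings, and `inboundAddresses` stands in for `provider.Transport.Addresses` when the transport is bound to 0.0.0.0. The system name `ChatServer` and port 8081 are assumptions.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for provider.Transport.Addresses when the transport binds to
// 0.0.0.0 without a public-hostname: only the bind address itself is listed.
var inboundAddresses = new HashSet<string> { "akka.tcp://ChatServer@0.0.0.0:8081" };

// The address the client actually used in its actor selection.
var recipientAddress = "akka.tcp://ChatServer@192.168.1.88:8081";

// Mirrors the check in Endpoint.cs: plain equality, with no notion of
// "0.0.0.0 stands for every local interface".
bool IsAcceptedAsLocal(string recipient, ISet<string> inbound) => inbound.Contains(recipient);

Console.WriteLine(IsAcceptedAsLocal(recipientAddress, inboundAddresses)
    ? "delivered"
    : $"dropped: {recipientAddress} not in [{string.Join(",", inboundAddresses)}]");
```

Only a message addressed literally to `0.0.0.0:8081` would pass this check, and no peer can sensibly send to that address.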
rogeralsing commented 9 years ago

This can be reproduced by changing the hostname of both the chat server and the chat client to 0.0.0.0, while still providing a valid IP in the actor selection in ChatClientActor when contacting the server.

Aaronontheweb commented 9 years ago

This is not a bug - this is the intended behavior for Akka.Remote. Binding to 0.0.0.0 means listening on all interfaces - and if you have multiple network interfaces then the matter of determining which IP is your true "accessible" IP becomes more difficult.

This can be resolved, however, using the following configuration settings:

remote {
    log-remote-lifecycle-events = DEBUG

    helios.tcp {
        transport-class = "Akka.Remote.Transport.Helios.HeliosTcpTransport, Akka.Remote"
        applied-adapters = []
        transport-protocol = tcp
        # will be populated with a dynamic host-name at runtime if left uncommented
        #public-hostname = "POPULATE STATIC IP HERE"
        hostname = "0.0.0.0"
        port = 4053
    }
}

A long time ago I added a separate setting to the transport layer, called "public-hostname", to solve this very problem. If you uncomment it and set it to 127.0.0.1 (or whichever address peers should use), you can still bind to 0.0.0.0 but reply using that specific IP address, which should resolve this problem.
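The effect of public-hostname, as described above, can be modeled in a few lines. The socket still binds to the wildcard address, but the address the system advertises, which is also the address incoming messages must carry, is built from public-hostname when that setting is present. This is a hypothetical model and not Akka.NET's transport code. The function `AdvertisedAddress` and the system name `ChatServer` are illustrative assumptions.

```csharp
#nullable enable
using System;

// Hypothetical model of which host ends up in the system's own address.
// bindHostname is what the socket binds to; publicHostname is optional.
string AdvertisedAddress(string system, string bindHostname, string? publicHostname, int port)
{
    // When public-hostname is set, it replaces the bind address in the advertised address.
    var host = string.IsNullOrWhiteSpace(publicHostname) ? bindHostname : publicHostname;
    return $"akka.tcp://{system}@{host}:{port}";
}

var withoutPublic = AdvertisedAddress("ChatServer", "0.0.0.0", null, 4053);
var withPublic = AdvertisedAddress("ChatServer", "0.0.0.0", "192.168.1.88", 4053);

Console.WriteLine(withoutPublic); // akka.tcp://ChatServer@0.0.0.0:4053
Console.WriteLine(withPublic);    // akka.tcp://ChatServer@192.168.1.88:4053
```

In this model, a message addressed to `192.168.1.88:4053` would match the advertised address only in the second case.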

rogeralsing commented 9 years ago

But this issue is about incoming messages. Let's say you listen on all interfaces. You will not be able to process messages arriving on any of them, because those addresses are not present in the transport address list, as shown in the screenshot.

rogeralsing commented 9 years ago

Let's say I have 4 interfaces on my computer: 1.2.3.4, 5.6.7.8, 9.9.9.9, and 6.6.6.6.

And then we listen on 0.0.0.0.

If we now receive a message on any of these interfaces, e.g. 5.6.7.8, then this message will be dropped, since that address is not present in Transport.Addresses.

So if I have to register a public hostname, what is the point of listening on those other 3 interfaces? We listen, but we cannot process any incoming messages on them.

I guess I'm missing some details here?

  1. Just listening on 0.0.0.0 without public-hostname makes no sense, as the system cannot process any incoming messages; everything is dropped. Should we throw a config exception to notify the developer that this is an invalid setting?
  2. If public-hostname is supplied, why don't we just listen on that address? Is this a means of solving NAT issues?
  3. If public-hostname is required for 0.0.0.0 to work, can't we just default to binding on 0.0.0.0 whenever public-hostname is supplied?
Aaronontheweb commented 9 years ago

So here's the rationale for allowing 0.0.0.0 in the first place:

  1. From a DevOps perspective, when you deploy Akka.NET code into any production environment, you're going to have to write code that determines at runtime what your external-facing IP address is and writes it back into your ActorSystem configuration. This is essential for correctly communicating reachable IPs in Akka.Remote and Akka.Cluster. Web servers, Cassandra, SQL Server, MongoDB... all require similar bootstrapping code, usually provided via shell scripts (or done automatically by the fabric / magic inside PaaS environments). I have code samples for doing this in a couple of different ways in Akka.NET (generic, EC2 SDK, and soon with the Azure SDK).
  2. Akka.Remote is biased - in the JVM distribution it's designed to work only with configured IPs. You can't send a message to a machine using any IP except the one specifically configured for it. So what are some implications of this? Let's say you've deployed Akka.NET on EC2 and are using an Elastic IP as a NAT layer that translates a publicly accessible address to an EC2 instance's private address. Well, I have bad news: Akka can't bind a socket to that Elastic IP (because it's actually an abstraction that sits outside of the VM), and therefore it can't register a transport to accept messages addressed to it. As a result, you can't accept external traffic into an EC2 instance without setting up your own VPN + NAT, which isn't an acceptable solution in many cases.
  3. So I decided to add an optional layer of NAT translation to Akka.NET to solve that problem - that's what the public-hostname is for.
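Here is a minimal sketch of such bootstrapping code. It assumes the deployment environment (a shell script, PaaS fabric, or cloud SDK lookup) exports the externally reachable address in an environment variable. The variable name `PUBLIC_HOSTNAME`, the helper `BuildRemoteHocon`, and the 127.0.0.1 fallback are hypothetical choices. The generated section uses the `helios.tcp` layout quoted earlier in this thread.

```csharp
using System;

// Renders the remote transport section as HOCON: bind to bindHostname (e.g. 0.0.0.0)
// while advertising publicHostname, as in the public-hostname approach described above.
string BuildRemoteHocon(string bindHostname, string publicHostname, int port) => $@"
akka.remote.helios.tcp {{
  hostname = ""{bindHostname}""
  public-hostname = ""{publicHostname}""
  port = {port}
}}";

// Hypothetical convention: the deployment script exports the reachable address.
var publicHost = Environment.GetEnvironmentVariable("PUBLIC_HOSTNAME");
if (string.IsNullOrWhiteSpace(publicHost))
    publicHost = "127.0.0.1"; // local development fallback (assumption)

var hocon = BuildRemoteHocon("0.0.0.0", publicHost, 4053);
Console.WriteLine(hocon);

// In an application, this string would typically be parsed with
// Akka.Configuration.ConfigurationFactory.ParseString(hocon) and passed to
// ActorSystem.Create(...), with the rest of the configuration as a fallback.
```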

Think of the abstraction this way:

TL;DR: binding to 0.0.0.0 is really only acceptable for scenarios like the one I described, where a layer of network address translation external to the VM makes it impossible for the socket to bind to the appropriate external-facing address. That problem may sound like an edge case, but it's actually pretty common in public cloud environments, especially when multiple service providers are involved (e.g. AppHarbor + EC2 in the same AWS region).

Aaronontheweb commented 9 years ago

@rogeralsing does this make sense? Should we keep this issue open to discuss more?

HakanL commented 8 years ago

Would it make sense for public-hostname to accept a wildcard, so it can receive anything sent to it?

ForrestWang commented 8 years ago

@Aaronontheweb Regarding the EC2 Elastic IP: when I set the config as hostname = "0.0.0.0" and public-hostname = 98.123.12.141 (just an example IP), it doesn't accept requests from the remote peer. But it works if I set hostname to the public DNS name, i.e. hostname = "ec2-98-123-12-141.compute-1.amazonaws.com".

Aaronontheweb commented 8 years ago

@ForrestWang it'd be interesting to see what IP Helios resolves the public DNS name to. If the Elastic IP doesn't port-forward correctly, then that'd be an issue. Also: always include a log. One-sentence descriptions aren't going to reveal an issue. 99.99% of the time, reported issues like this are problems with a user's specific configuration or with a managed service on a public cloud not behaving as expected. Being able to see the error messages from Akka.Remote and Helios usually makes it pretty clear.
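A small diagnostic sketch for checking what a name resolves to. Run it on the instance itself and compare the printed addresses with the Elastic IP and with the instance's private IP. Inside a VPC with DNS hostnames enabled, AWS typically resolves an instance's public DNS name to its private address, so this check is worth doing before assuming a transport problem. The command-line usage shown is illustrative.

```csharp
using System;
using System.Net;

// Resolve a hostname and list every address the resolver returns (IPv4 and IPv6).
IPAddress[] Resolve(string host) => Dns.GetHostAddresses(host);

// Hypothetical usage: dotnet run -- ec2-98-123-12-141.compute-1.amazonaws.com
var name = args.Length > 0 ? args[0] : "localhost";
foreach (var address in Resolve(name))
    Console.WriteLine($"{name} -> {address} ({address.AddressFamily})");
```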

troshko111 commented 7 years ago

@Aaronontheweb Is there an equivalent setting for the port? E.g. Akka supports both bind-port and bind-hostname, but Akka.NET seems to support this only for the hostname (public-hostname), which makes it impossible to use remoting under, say, NAT where the source and target ports differ. Am I missing something, or is it an actual limitation?

Aaronontheweb commented 7 years ago

@TarasRoshko no, we don't have any port aliasing built into the underlying transport

troshko111 commented 7 years ago

Thanks for the confirmation

ForteUnited commented 7 years ago

I'm using Azure App Service, which is somewhat similar to AWS Elastic Beanstalk, and I'm having a really hard time getting the app to connect to a remote actor system. I have set up all the other Azure infrastructure for this task, namely a VNet Point-to-Site connection. However, no matter what entries I make in my app's configuration, I get the exception below:

Akka.Configuration.ConfigurationException: Unknown local address type at Akka.Remote.Transport.DotNetty.DotNettyTransport.d__21.MoveNext()

This is what my HOCON looks like in the App Service app:

akka {
    actor {
        provider = "Akka.Remote.RemoteActorRefProvider, Akka.Remote"
    }

    remote {
        helios.tcp {
            port = 8090
            public-hostname = "my-site-name-here.azurewebsites.net"
            hostname = 0.0.0.0
        }
    }
}

I have tried a variety of different things in the public-hostname field and have gotten nowhere. I believe the problem is with the hostname=0.0.0.0 part but I'm not sure what else to put there...

Any ideas on getting Azure App Service to connect to a remote actor system?

troshko111 commented 7 years ago

@ForteUnited AFAIK Azure App Service (formerly Web Sites) won't let you listen on custom ports other than 80 and 443. The last time I debugged this, they would just kill any attempt to even bind a socket.

It's not going to work with Akka; you need something different, like Cloud Services (old) or Service Fabric (new).

Found more details, see this.
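One quick way to check whether a sandbox permits listening on a given port is to try it directly from the app. Below is a minimal sketch using `TcpListener`. The helper `CanListen` is illustrative, and the default port 8090 is taken from the HOCON above.

```csharp
using System;
using System.Net;
using System.Net.Sockets;

// Tries to bind and listen on address:port and reports whether the OS (or a sandbox
// such as Azure App Service) allowed it. Port 0 asks the OS for any free port.
bool CanListen(IPAddress address, int port, out string detail)
{
    var listener = new TcpListener(address, port);
    try
    {
        listener.Start();
        detail = $"listening on {listener.LocalEndpoint}";
        return true;
    }
    catch (SocketException ex)
    {
        detail = $"bind failed: {ex.SocketErrorCode}";
        return false;
    }
    finally
    {
        listener.Stop(); // release the socket whether or not Start succeeded
    }
}

var portToTest = args.Length > 0 ? int.Parse(args[0]) : 8090;
CanListen(IPAddress.Any, portToTest, out var result);
Console.WriteLine(result);
```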

ForteUnited commented 7 years ago

@TarasRoshko thanks for the link. If you're correct and connecting to a remote actor system from Azure App Service is not possible, that is a really huge deal for me.

Did you see the part about Virtual Networks? I'm trying to connect my Azure App Service Web API to a VM on the VNet that hosts the actor system. I have already successfully connected the App Service app to the VNet.

"Virtual Networks

Azure Web Apps may set up their virtual networks, or VNets in order to facilitate connectivity between Azure and on-premise intranets. This special case of network connectivity is also handled differently in the sandbox. In particular, the aforementioned restrictions on private and local addresses are ignored if the target network interface belongs to the app."

Danthar commented 7 years ago

@ForteUnited If you set the hostname to 0.0.0.0, that means you want the socket to bind to all network interfaces. It's an OS-level thing, which means you will get an error, since you will also bind to the restricted network interface. Set the hostname to the IP address belonging to the internal VNet, and it will probably work.
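Instead of hard-coding the VNet address, you can look it up at startup. The minimal sketch below assumes the VNet uses the hypothetical range 10.0.0.0/24. `InSubnet` and `FindAddressInSubnet` are illustrative helpers, and the printed line is the value that would go into the `hostname` setting.

```csharp
#nullable enable
using System;
using System.Linq;
using System.Net;
using System.Net.NetworkInformation;
using System.Net.Sockets;

// True when 'address' lies inside network/prefixLength (IPv4 only).
bool InSubnet(IPAddress address, IPAddress network, int prefixLength)
{
    if (address.AddressFamily != AddressFamily.InterNetwork ||
        network.AddressFamily != AddressFamily.InterNetwork)
        return false;

    uint ToUInt(IPAddress a)
    {
        var b = a.GetAddressBytes(); // network byte order (big-endian)
        return ((uint)b[0] << 24) | ((uint)b[1] << 16) | ((uint)b[2] << 8) | b[3];
    }

    uint mask = prefixLength == 0 ? 0u : uint.MaxValue << (32 - prefixLength);
    return (ToUInt(address) & mask) == (ToUInt(network) & mask);
}

// First address on an "up" interface that falls inside the given subnet, or null.
IPAddress? FindAddressInSubnet(IPAddress network, int prefixLength) =>
    NetworkInterface.GetAllNetworkInterfaces()
        .Where(ni => ni.OperationalStatus == OperationalStatus.Up)
        .SelectMany(ni => ni.GetIPProperties().UnicastAddresses)
        .Select(ua => ua.Address)
        .FirstOrDefault(a => InSubnet(a, network, prefixLength));

// 10.0.0.0/24 is a hypothetical VNet address space; substitute your own.
var vnetAddress = FindAddressInSubnet(IPAddress.Parse("10.0.0.0"), 24);
Console.WriteLine(vnetAddress is null
    ? "no local address in 10.0.0.0/24"
    : $"hostname = \"{vnetAddress}\"");
```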

troshko111 commented 7 years ago

@ForteUnited I don't think the sandbox will block your outgoing communication; that should work. Regarding the VNet, it sounds like it won't lift the custom port binding restriction. AFAIK that Win32 API is simply blacklisted by the sandbox. It doesn't hurt to try, though.

ForteUnited commented 7 years ago

OK, so far no luck. I have everything set up correctly networking-wise, I'm sure of it. I can use Kudu to get a shell on the Azure App Service, and from there I can "tcpping" the IaaS VM on the VNet that runs the remote actor system. However, no matter what I put in the HOCON of my Web API in App Service, I can't get it to start and make a connection. This is unbelievable. So disappointing. I can't believe there aren't other articles or warnings that say in big red letters "Akka.Net isn't a good fit for Azure App Service (Platform as a Service) Web Apps"...

ForteUnited commented 7 years ago

All, after many hours of debugging and reviewing the Akka.NET code, I found the problem. It seems the default behavior in .NET is to translate an IPv4 address into an IPv6 address when opening a socket. Akka.NET has a configuration setting called enforce-ip-family specifically for this issue when running on Mono on Linux. Apparently it is also required in Azure App Service, because IPv6 is explicitly disabled in the App Service sandbox.
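The IPv4-to-IPv6 translation described above is the IPv4-mapped IPv6 form that dual-mode IPv6 sockets use for IPv4 peers. The minimal sketch below shows that mapping and the idea behind enforcing the address family. `CreateSocket` is a hypothetical helper and not Akka.NET's DotNetty transport code, and 10.0.0.4 is an example address. In HOCON the setting is `enforce-ip-family`. For the DotNetty transport named in the exception above, it should live under `akka.remote.dot-netty.tcp`, but check the reference configuration for your Akka.NET version.

```csharp
using System;
using System.Net;
using System.Net.Sockets;

// A dual-mode IPv6 socket represents an IPv4 peer as an IPv4-mapped IPv6 address.
var v4 = IPAddress.Parse("10.0.0.4");
var mapped = v4.MapToIPv6();
Console.WriteLine(mapped);                    // IPv4-mapped IPv6 form of 10.0.0.4
Console.WriteLine(mapped.IsIPv4MappedToIPv6); // True
Console.WriteLine(mapped.MapToIPv4());        // 10.0.0.4

// Hypothetical helper: when the family is enforced (or the OS lacks IPv6), open the
// socket in the configured address's own family instead of a dual-mode IPv6 socket,
// so an environment with IPv6 disabled never has to create an IPv6 socket.
Socket CreateSocket(IPAddress configured, bool enforceIpFamily)
{
    if (enforceIpFamily || !Socket.OSSupportsIPv6)
        return new Socket(configured.AddressFamily, SocketType.Stream, ProtocolType.Tcp);

    var socket = new Socket(AddressFamily.InterNetworkV6, SocketType.Stream, ProtocolType.Tcp);
    socket.DualMode = true; // accepts IPv6 and IPv4-mapped traffic on one socket
    return socket;
}

using var s = CreateSocket(v4, enforceIpFamily: true);
Console.WriteLine(s.AddressFamily); // InterNetwork
```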

Issue 2194 is closely related to my problem, though it reports a different exception message.

My stuff is now working in Azure App Service