akkadotnet / akka.net

Canonical actor model implementation for .NET with local + distributed actors in C# and F#.
http://getakka.net

Disassociation caused by a duplicated Akka.Remote TCP port #6100

Closed: kimbyungeun closed this issue 2 years ago

kimbyungeun commented 2 years ago

Version Information

Describe the bug

Application


(Old) Mono Client App 1 Log

(New) Mono Client App 2 Error Log

Environment


## ip_local_port_range
* Default Setting

$ cat /proc/sys/net/ipv4/ip_local_port_range
32768 60999
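For context: on Linux, a socket that binds to port 0 is given a port from `ip_local_port_range`. The minimal Python sketch below is not Akka.NET code, and its helper names (`parse_port_range`, `is_ephemeral`) are hypothetical. It parses the default range shown above and checks which ports from this thread fall inside it. Port 33919 does, which is consistent with it being OS-assigned. The cluster-system ports 4061, 4065 and 8931 do not.

```python
from typing import Tuple

# Contents of /proc/sys/net/ipv4/ip_local_port_range, copied from the output above.
RANGE_TEXT = "32768 60999"


def parse_port_range(text: str) -> Tuple[int, int]:
    """Parse the whitespace-separated 'low high' pair into an inclusive range."""
    low, high = (int(part) for part in text.split())
    return low, high


def is_ephemeral(port: int, port_range: Tuple[int, int]) -> bool:
    """True if the port lies inside the OS auto-assignment (ephemeral) range."""
    low, high = port_range
    return low <= port <= high


port_range = parse_port_range(RANGE_TEXT)
# Number of ports the kernel can hand out for port-0 binds: 60999 - 32768 + 1.
size = port_range[1] - port_range[0] + 1

for port in (33919, 4061, 4065, 8931):
    print(port, is_ephemeral(port, port_range))
```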

Aaronontheweb commented 2 years ago

Hi @kimbyungeun, are all of these processes running on the same machine? If you have two nodes bound to an identical IP + port combination, TCP generally doesn't support that: without special socket options, only one server socket can listen on a given IP and port. The socket.bind operation should have failed here and prevented the second ActorSystem from binding to the same port. This is especially true because you're using port 0 in your HOCON configuration, which tells the OS to assign a random but available high-numbered port.

So I'm not totally clear on what the issue is here. Given how the underlying network stack works, this configuration shouldn't be possible. How are both of these clients binding to the same port?
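The bind behavior described above can be reproduced outside Akka.NET. This is a minimal Python sketch that assumes a Linux host and plain TCP sockets with no SO_REUSEADDR or SO_REUSEPORT options. Binding to port 0 lets the OS pick a free port. A second attempt to bind the same IP + port is rejected with EADDRINUSE. Akka.Remote's own binding code is not shown here.

```python
import errno
import socket

# First listener: port 0 asks the OS for any free ephemeral port,
# analogous to configuring the remoting port as 0 in HOCON.
first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
first.bind(("127.0.0.1", 0))
first.listen()
assigned_port = first.getsockname()[1]
print("first listener got port", assigned_port)

# Second listener: try to bind the exact same IP + port combination.
second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    second.bind(("127.0.0.1", assigned_port))
    duplicate_bind_failed = False  # would mean two sockets share the port
except OSError as exc:
    # Expected on Linux: the port is already in use by `first`.
    duplicate_bind_failed = exc.errno == errno.EADDRINUSE
finally:
    second.close()
    first.close()

print("duplicate bind rejected:", duplicate_bind_failed)
```

If the second bind were ever to succeed in an environment like the one in this issue, the cause would lie in the platform's socket implementation, not in Akka.NET configuration.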

kimbyungeun commented 2 years ago

Hi @Aaronontheweb, thank you for your answer.

1. So all of these processes are running on the same machine?


2. How are both of these clients binding to the same port?

Arkatufus commented 2 years ago

Are all of the nodes running inside the same Docker image? How do you wire all these Docker images together?

Aaronontheweb commented 2 years ago

Ah, I missed the detail about Docker - so both of the clients are running in separate Docker containers?

kimbyungeun commented 2 years ago

@Aaronontheweb @Arkatufus Thank you for your answers.

Sorry, I left out some very important information. I use a Docker container like a virtual OS: I create a bash Docker container and run Node App1, Client App1, and Client App2 from inside it.

Docker container information


Docker container inspect information

Arkatufus commented 2 years ago

I don't think this is a port number collision. If we look at the failing log, the logged ActorSystem is listening on port 33919:

[2022/09/14 03:28:38.980] ... Remoting now listens on addresses: [akka.tcp://exeutor-system@100.100.100.234:33919] 

But then it could not communicate with a remote ActorSystem on port 4061:

[2022/09/14 03:28:39.190] ... Association with remote system akka.tcp://cluster-system@100.100.100.230:4061 has failed; address is now gated for 5000 ms. Reason is: [Akka.Remote.EndpointDisassociatedException: Disassociated

kimbyungeun commented 2 years ago

@Arkatufus

The error scenario I'm thinking of is below.

  1. Process 6683 is using port 33919.
  2. Process 22596 tries to bind the same port, 33919.
  3. As a result, process 6683 is disassociated from the remote system on port 4061.

1. Process 6683 is using port 33919.

[2022/09/14 03:03:33.290][  112][ INFO]akka.tcp://cluster-system@100.100.100.234:8931/remote/akka.tcp/cluster-system@100.100.100.230:4065/user/ApiMasterActor/singleton/InferenceManageActor/c40 [6683][mono-sgen] [2022/09/14 03:03:33.290][11218][ INFO]akka.tcp://exeutor-system@100.100.100.234:33919/user/Actor2 Start [Elapsed time]=0.011568s  

2. Process 22596 tries to bind the same port, 33919.

[2022/09/14 03:28:38.980][  119][ INFO]akka.tcp://cluster-system@100.100.100.234:8931/remote/akka.tcp/cluster-system@100.100.100.230:4065/user/ApiMasterActor/singleton/InferenceManageActor/c34 [22596][mono-sgen] [2022/09/14 03:28:38.980][   59][ INFO]remoting (akka://exeutor-system) Remoting started; listening on addresses : [akka.tcp://exeutor-system@100.100.100.234:33919]  

3. As a result, process 6683 is disassociated from the remote system on port 4061.

[2022/09/14 03:28:39.192][ 7063][ INFO]akka.tcp://cluster-system@100.100.100.234:8931/remote/akka.tcp/cluster-system@100.100.100.230:4065/user/ApiMasterActor/singleton/InferenceManageActor/c40 [6683][mono-sgen] [2022/09/14 03:28:39.190][11207][ WARN]akka.tcp://exeutor-system@100.100.100.234:33919/system/endpointManager/reliableEndpointWriter-akka.tcp%3A%2F%2Fcluster-system%40100.100.100.230%3A4061-1 Association with remote system akka.tcp://cluster-system@100.100.100.230:4061 has failed; address is now gated for 5000 ms. Reason is: [Akka.Remote.EndpointDisassociatedException: Disassociated 

Arkatufus commented 2 years ago

That is weird. Is it possible for you to upgrade to .NET Core? Is there a specific reason why you need to use .NET Framework, Mono, and Ubuntu 16.04?

kimbyungeun commented 2 years ago

It was developed 3 years ago and cannot be changed. Would upgrading to Akka.NET 1.4.41 solve the problem?

Arkatufus commented 2 years ago

No, it would not. The underlying Socket.Bind() code will be the same. At this point I'm more concerned about Mono's implementation of Socket.Bind() on Linux.

kimbyungeun commented 2 years ago

I'll try updating Mono from 5.20.1.19 to 6.12.0.

Aaronontheweb commented 2 years ago

@kimbyungeun is this still an issue after you upgraded?

kimbyungeun commented 2 years ago

@Aaronontheweb

kimbyungeun commented 2 years ago

No issues found after a kernel update. Your advice helped me solve the problem. @Aaronontheweb @Arkatufus, thank you.

We are going to migrate to .NET 6 and Ubuntu 20.04.