dotnet / dotNext

Next generation API for .NET
https://dotnet.github.io/dotNext/

[Feature request] Auto detect self #150

Closed · davhdavh closed 1 year ago

davhdavh commented 1 year ago

In the pre-populated config model, where all cluster nodes are known at startup, would it be possible to add a feature for the current node to auto-detect its own PublicEndPoint during the readiness phase? e.g. services.Configure<HttpClusterMemberConfiguration>(o => o.AutoDetectSelfFromConfigurationStorage = true);
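
For context, a minimal sketch of the status quo that this option would automate: every node currently needs its own value for PublicEndPoint, so configurations cannot be identical across the cluster. (AutoDetectSelfFromConfigurationStorage above is the proposed name, not an existing API; the property's exact type varies by dotNext version, so the Uri here is an assumption.)

```csharp
using DotNext.Net.Cluster.Consensus.Raft.Http;
using Microsoft.Extensions.DependencyInjection;

// Inside Program.cs / ConfigureServices. Hedged sketch: assumes the
// Uri-based addressing discussed later in this thread.
services.Configure<HttpClusterMemberConfiguration>(options =>
{
    // node1 gets this value; node2 would need https://node2:3262/raft, etc.
    options.PublicEndPoint = new Uri("https://node1:3262/raft");
});
```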

sakno commented 1 year ago

No, it's not possible. A host may have multiple network cards (physical and virtual), each with its own IP address. Which one should be advertised as the hostname? The local hostname is not an option either, because external DNS is unaware of it. The same applies to containers: in K8s, each pod has its own local network, and only K8s knows the external pod address used for communication between pods.
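
To make the ambiguity concrete, a host usually reports several candidate addresses, and none of them is obviously the one to advertise. A quick way to see this in .NET:

```csharp
using System;
using System.Net.NetworkInformation;

// Lists every address the host could, in principle, advertise.
// On a typical machine this prints loopback, physical NICs, and
// virtual adapters (Docker bridges, VPNs, etc.).
foreach (var nic in NetworkInterface.GetAllNetworkInterfaces())
{
    foreach (var addr in nic.GetIPProperties().UnicastAddresses)
        Console.WriteLine($"{nic.Name}: {addr.Address}");
}
```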

davhdavh commented 1 year ago

Isn't it as simple as this?

  1. Generate a unique value 'KEY' (e.g. Guid.NewGuid()) on the local machine.
  2. Ask every reachable node what its 'KEY' is.
  3. Whichever node replies with the local machine's 'KEY' must be self (see the sketch after this list).

Looking at the code, the biggest problems are that all of the "local" members are marked as readonly, and that IHostedService.StartAsync may block ASP.NET from accepting requests (vs. BackgroundService.ExecuteAsync, which starts a new task).
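
A minimal sketch of what steps 2–3 could look like. The /key probe path and the helper itself are hypothetical illustrations, not dotNext API:

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;

static async Task<Uri?> DetectSelfAsync(Guid localKey, IReadOnlyList<Uri> members)
{
    using var client = new HttpClient();
    foreach (var member in members)
    {
        try
        {
            // Hypothetical probe: each node serves its locally generated
            // KEY as plain text at a well-known relative path.
            var remoteKey = await client.GetStringAsync(new Uri(member, "key"));
            if (Guid.TryParse(remoteKey, out var parsed) && parsed == localKey)
                return member; // this advertised address routes back to us
        }
        catch (HttpRequestException)
        {
            // Member unreachable during startup; try the next one.
        }
    }
    return null; // self not found; the node would stay in standby
}
```
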
sakno commented 1 year ago

all of the "local" members are marked as readonly

Because any uncontrolled change of this property leads to undefined behavior of the node in the cluster. RaftCluster is not sealed, so I need to protect inheritors from this dangerous action.

IHostedService.StartAsync may block ASP.NET

This prevents the local Raft node from handling incoming Raft RPC messages before it is fully initialized.

sakno commented 1 year ago

Other potential issues that could be caused by the requested feature:

localhost as part of the cluster node address. For instance, we can have the following config:

https://localhost:3262/raft
https://node1:3262/raft
https://node2:3262/raft

All three nodes would recognize themselves as the first node in this list, obviously. But this is wrong and leads to undefined cluster behavior that a developer cannot easily diagnose.

Multiple ports. The local node doesn't know its own address, but must be able to detect it using some sort of PING message. Okay, but which port and path should it use to listen for the message? For instance:

https://node1:3262/raft
https://node2:3263/raft
https://node3:3264/raft
https://node4:3265/raft

To detect its own address, the node would have to open all four network ports. This introduces another class of issues. For example, a Dockerfile exposes the necessary ports explicitly; if a port is not exposed, the container cannot receive a message from an external party, and the node will not be able to detect its own address.

Given all that, I conclude that the requested feature requires too much effort and leads to more problems for very little benefit.

davhdavh commented 1 year ago

I think we are talking past each other...

RaftCluster<TMember> supports it; it just isn't implemented in any of the concrete classes, since both RaftCluster and RaftHttpCluster changed to use the EndPoint as the local member identification method, and both assume that the announcer can be called immediately.

But if RaftCluster<TMember>.StartAsync is called with a localMemberDetector that always returns false (or without the local member in the config), it starts the node in standby mode with all members in the "IsRemote" state and the readiness probe unset. Only later, when the node is actually available (in UnfreezeAsync), will it know which of the members is the localMemberId, a.k.a. 'self'.
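
For illustration, the kind of detector meant here. The delegate shape is an assumption for this sketch, not the exact dotNext signature:

```csharp
using System;
using System.Net;

// A detector that never matches: every configured member starts as remote,
// so the node comes up in standby mode until "self" is discovered later.
Func<EndPoint, bool> localMemberDetector = static _ => false;
```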

This prevents the local Raft node from handling incoming Raft RPC messages before it is fully initialized.

Isn't this the point of the readiness probe? Why must it block the rest of the server from starting before it can proceed?

localhost as a part of the cluster node address. For instance, we can have the following config:

This is not an issue, because of the unique value 'KEY'. It is needed for exactly this reason.

To do detection of self address, the node must open 4 network ports

Yes, I can see the problem in implementations that use serverFactory, but that is a property of RaftCluster. I just want the feature in RaftHttpCluster, to avoid having to maintain a unique configuration per cluster node.

sakno commented 1 year ago

Isn't this the point of the readiness probe?

It's for clients of the cluster. The node should not be able to process client requests (e.g., read requests) until the readiness probe is set. However, the node should be able to process Raft RPC messages, because it waits for a configuration proposal from the leader to unfreeze itself. The frozen state is not tightly coupled with the standby state: you can force the standby state through configuration and still get a signaled readiness probe.

This is not an issue because of the unique value 'KEY'

It is. The node itself will always be reachable through the localhost address and will respond with true. Thus, all N nodes recognize the same address as self. The KEY is generated locally and is unique, but each node receives its own KEY back through localhost, so the localhost entry always matches.

sakno commented 1 year ago

Additionally, the Uri of the local member contains a path component that must be known at startup time. Without that knowledge, it's not possible to set up the path handlers:

https://github.com/dotnet/dotNext/blob/56eae959183348cb2f390817abb0e08648a11653/src/cluster/DotNext.AspNetCore.Cluster/Net/Cluster/Consensus/Raft/Http/ConfigurationExtensions.cs#L201-L205

sakno commented 1 year ago

to avoid having to maintain a unique configuration per cluster node.

AFAIK, it is possible to determine the FQDN of the container from within the container (hostname -f). All you need is to set PublicEndPoint through an environment variable using this approach (ASP.NET Core can fetch configuration from multiple sources, including environment variables).
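
A hedged sketch of that approach in startup code. It assumes the configuration key mirrors the PublicEndPoint property name; verify the exact key and URL shape for your dotNext version:

```csharp
using System.Net;

var builder = WebApplication.CreateBuilder(args);

// Roughly the managed equivalent of `hostname -f`: resolve the
// local host's fully qualified name.
var fqdn = Dns.GetHostEntry(string.Empty).HostName;

// Feed it into configuration before the cluster is configured.
// Key name and port/path are assumptions for this sketch; the same
// value could instead be exported as an environment variable in the
// container entrypoint.
builder.Configuration["PublicEndPoint"] = $"https://{fqdn}:3262/raft";
```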