Client Node-Discovery and Auto-configuration (aka "autoconfig")

seancribbs commented 11 years ago

Several other clustered datastores (including Couchbase/Membase, Amazon DynamoDB, MongoDB with the "config server") have ways of discovering new nodes in the cluster. Since any Riak node can handle any request, we should be able to advertise the existence of other nodes to clients so that they can reconfigure automatically, spreading load more evenly among the Riak nodes. This can reduce operational burden, because applications will not need to be restarted to take advantage of a grown cluster. This will also pave the way for the future work of preflist-hinting or client-side routing.

Client-Server Interaction

The goal of this proposal is to provide as much backward and forward compatibility as possible. As such, existing request/response cycles are overloaded with new information; the "/" URL on HTTP and the RpbGetServerInfoResp message on PB will be modified for this purpose.

In addition to the one-roundtrip cycle, clients should be able to stream updates by providing an additional request parameter or limiting the acceptable media-types. This allows them to receive cluster changes as they occur and update accordingly, perhaps using a background thread or reactor loop. It also removes the need to send reconfiguration messages in-band with other requests.
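
As a rough sketch of the background-thread approach, assuming a hypothetical `NodePool` class and a plain list standing in for the streamed response (neither is part of any existing Riak client):

```python
import itertools
import threading


class NodePool:
    """Thread-safe list of endpoints that request code picks from round-robin."""

    def __init__(self, endpoints):
        self._lock = threading.Lock()
        self._endpoints = list(endpoints)
        self._counter = itertools.count()

    def replace(self, endpoints):
        # Called from the background thread whenever the cluster changes.
        with self._lock:
            self._endpoints = list(endpoints)

    def next_endpoint(self):
        # Called from request threads; never touches the network.
        with self._lock:
            return self._endpoints[next(self._counter) % len(self._endpoints)]


def follow_updates(pool, updates):
    # `updates` stands in for the streamed response: each item is the full
    # endpoint list as of one cluster change (an assumption of this sketch).
    for endpoints in updates:
        if endpoints:  # ignore an empty list rather than strand the client
            pool.replace(endpoints)


pool = NodePool(["10.0.1.150:8087"])
stream = [["10.0.1.150:8087", "10.0.1.151:8087"]]
worker = threading.Thread(target=follow_updates, args=(pool, stream), daemon=True)
worker.start()
worker.join()
picked = {pool.next_endpoint() for _ in range(4)}
```

Because the update arrives out-of-band, request threads only ever take a short lock to read the current list.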

Implementation Details

PB

The RpbGetServerInfoResp would be modified to add information about other nodes in the cluster, and RpbGetServerInfoReq would be made a full message:

// Added message
message RpbGetServerInfoReq {
    optional bool stream = 1 [default=false];
}

message RpbGetServerInfoResp {
    optional bytes node = 1;
    optional bytes server_version = 2;
    // Added field
    repeated RpbNode nodes = 3;
}

message RpbNode {
    // The node name, same as what would be in the original RpbGetServerInfoResp.
    required bytes name = 1;
    repeated RpbClientInterface interfaces = 2;
}

message RpbClientInterface {
    enum RpbProtocol {
        PB = 1;
        HTTP = 2;
        HTTPS = 3;
    }
    required RpbProtocol protocol = 1;
    // This can be a FQDN or a text representation of an IP address.
    required bytes host = 2;
    required uint32 port = 3;    
}
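
To illustrate how a client might consume these messages, here is a minimal Python sketch. Plain dataclasses stand in for the generated protobuf classes, and `endpoints_for` is a hypothetical helper, not an existing client API:

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Protocol(IntEnum):
    # Mirrors the proposed RpbProtocol enum.
    PB = 1
    HTTP = 2
    HTTPS = 3


@dataclass
class ClientInterface:
    protocol: Protocol
    host: bytes
    port: int


@dataclass
class Node:
    name: bytes
    interfaces: list = field(default_factory=list)


def endpoints_for(nodes, protocol):
    """Return (host, port) pairs for every interface speaking `protocol`."""
    return [
        (iface.host.decode("ascii"), iface.port)
        for node in nodes
        for iface in node.interfaces
        if iface.protocol == protocol
    ]


nodes = [
    Node(b"riak@riak-dc.bos1.basho.com", [
        ClientInterface(Protocol.HTTP, b"10.0.1.150", 8098),
        ClientInterface(Protocol.PB, b"10.0.1.150", 8087),
    ]),
    Node(b"riak@riak-dc2.bos1.basho.com", [
        ClientInterface(Protocol.PB, b"10.0.1.151", 8087),
    ]),
]
pb_endpoints = endpoints_for(nodes, Protocol.PB)
```

A PB-only client would keep just the `PB` entries and ignore the rest, which is why each node carries a list of interfaces rather than a single address.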

HTTP

The "root" resource riak_core_wm_urlmap would be modified to return additional information:

A link (in the HTML format) with relation "self" that identifies the URL of the current node and its name:
```
<a href="http://10.0.1.150:8098/" rel="self">riak@riak-dc.bos1.basho.com</a>
```
This could also be represented in a similar fashion in the Link response header. If the node also has PB configured, it should include similar links using the "alternate" relation.
Links for other known nodes in the cluster, listed in the same fashion as the current node but using the "alternate" relation.
If given the query string parameter stream=true, then updates to the configuration will be streamed back to the client.
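
A client could pick these links out of the HTML body with the standard library alone. The following is only a sketch: `NodeLinkParser` is a hypothetical name, and the second link is an assumed example of an "alternate" entry:

```python
from html.parser import HTMLParser


class NodeLinkParser(HTMLParser):
    """Collect (rel, href, node_name) for every <a> that carries a rel."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._current = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "rel" in attrs and "href" in attrs:
            self._current = [attrs["rel"], attrs["href"], ""]

    def handle_data(self, data):
        # Link text may arrive in several chunks, so accumulate it.
        if self._current is not None:
            self._current[2] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self.links.append(tuple(self._current))
            self._current = None


body = (
    '<a href="http://10.0.1.150:8098/" rel="self">riak@riak-dc.bos1.basho.com</a>'
    '<a href="http://10.0.1.151:8098/" rel="alternate">riak@riak-dc2.bos1.basho.com</a>'
)
parser = NodeLinkParser()
parser.feed(body)
parser.close()
```

After parsing, `parser.links` holds one tuple per node link, with the relation first so the client can tell the current node from the others.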

The JSON format of riak_core_wm_urlmap would change like so:

{
  "riak_kv_wm_buckets": "/riak",
  "riak_kv_wm_counter": "/buckets",
  "riak_kv_wm_index": "/buckets",
  "riak_kv_wm_keylist": "/buckets",
  "riak_kv_wm_link_walker": "/riak",
  "riak_kv_wm_mapred": "/mapred",
  "riak_kv_wm_object": "/riak",
  "riak_kv_wm_ping": "/ping",
  "riak_kv_wm_props": "/buckets",
  "riak_kv_wm_stats": "/stats",
  "riak_solr_indexer_wm": "/solr",
  "riak_solr_searcher_wm": "/solr",
  // Added below here
  "riak@riak-dc.bos1.basho.com":{
      "interfaces":[
          "http://10.0.1.150:8098/",
          "pbc://10.0.1.150:8087/"
      ]
  }, 
  "riak@riak-dc2.bos1.basho.com":{
      "interfaces":[
          "http://10.0.1.151:8098",
          "pbc://10.0.1.151:8087"
      ]
  }
  // and so on
}
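
Since resource entries and node entries share one object, a client has to tell them apart by value type. A minimal sketch of that, assuming a comment-free response body (JSON has no comment syntax) and a hypothetical `split_urlmap` helper:

```python
import json
from urllib.parse import urlsplit

# Trimmed, comment-free version of the example response.
body = """
{
  "riak_kv_wm_ping": "/ping",
  "riak_kv_wm_stats": "/stats",
  "riak@riak-dc.bos1.basho.com": {
    "interfaces": ["http://10.0.1.150:8098/", "pbc://10.0.1.150:8087/"]
  },
  "riak@riak-dc2.bos1.basho.com": {
    "interfaces": ["http://10.0.1.151:8098", "pbc://10.0.1.151:8087"]
  }
}
"""


def split_urlmap(doc):
    """Separate resource paths (string values) from node entries (objects)."""
    resources, nodes = {}, {}
    for key, value in doc.items():
        if isinstance(value, dict) and "interfaces" in value:
            nodes[key] = value["interfaces"]
        else:
            resources[key] = value
    return resources, nodes


resources, nodes = split_urlmap(json.loads(body))

# Keep only the Protocol Buffers endpoints as (host, port) pairs.
pb_endpoints = [
    (url.hostname, url.port)
    for interfaces in nodes.values()
    for url in map(urlsplit, interfaces)
    if url.scheme == "pbc"
]
```

Older clients that only look up known resource keys would simply never read the node entries.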

Server Implementation

A new dedicated process on each node, probably called riak_api_autoconfig, will be added to broadcast, cache, and serve listener information among the cluster members. When receiving an update from another node, the process will update its internal cache and notify client-socket processes if the state has changed. Client processes will filter the information according to the addresses that are reachable from the peer (CIDR).
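
The CIDR filtering step could look roughly like the following. This is a Python illustration only (the server itself is Erlang); `reachable_interfaces` and the idea of a per-peer list of allowed networks are assumptions, since the exact filtering rule is not settled here:

```python
import ipaddress


def reachable_interfaces(interfaces, allowed_cidrs):
    """Keep only (host, port) pairs whose IP falls inside an allowed network."""
    networks = [ipaddress.ip_network(cidr) for cidr in allowed_cidrs]
    kept = []
    for host, port in interfaces:
        try:
            addr = ipaddress.ip_address(host)
        except ValueError:
            # Hostnames cannot be matched against a CIDR without resolving
            # them first; this sketch simply leaves them out.
            continue
        if any(addr in net for net in networks):
            kept.append((host, port))
    return kept


interfaces = [
    ("10.0.1.150", 8087),
    ("192.168.5.2", 8087),
    ("riak.example.com", 8087),
]
visible = reachable_interfaces(interfaces, ["10.0.0.0/8"])
```

A peer connecting from the 10.0.0.0/8 network would then only be told about the first interface.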

Existing code from riak_repl and riak_core will be repurposed or refactored to support detecting and filtering interfaces. Additionally, base HTTP support will likely be moved into riak_api so that client-interface configuration can live in a single place.

TBD: Autoconfig information might also be affected by authorization, see #355.

Risks/Problems

A user or sysadmin can configure the client to connect to Riak through a load-balancing proxy (e.g. HAProxy), which in a sense makes this feature unneeded. The client should have an option to turn off or ignore autoconfig, and it should also be possible to disable it server-side.
Hostname/IP mappings should be identified at startup, or inet_gethost calls could needlessly dominate, assuming someone bound an interface to a hostname rather than IP address. This could get even stickier if administrators use round-robin DNS or VIP, in which case autoconfig should likely be disabled.
We should be careful about sending autoconfig information over the wire when a node is not starting up or cleanly shutting down. Network partitions between nodes could cause flappy and undesirable behavior in clients.

slfritchie commented 11 years ago

Thinking of @Vagabond's recent RFC & work, would this be information that would be exposed before or after authentication?

seancribbs commented 11 years ago

@slfritchie I think it should at least be filterable by which host patterns the user can connect from. This needs more thought!

bakins commented 11 years ago

I think having something akin to what Couchbase has with moxi would be helpful: a lightweight "proxy" that understands autoconfig, and possibly authentication as well. It could be used in place of HAProxy in most of my use cases, and I could continue to use "dumb" clients.

seancribbs commented 11 years ago

@bakins What client(s) are you currently using?

bakins commented 11 years ago

@seancribbs Ruby, Go, and Lua

seancribbs commented 11 years ago

@bakins Not sure we can manage writing our own proxy, but with the way this is designed, it should be very straightforward to write a daemon that reconfigures existing proxy software. Alternatively, a Chef/Puppet/etc run could be used to rewrite the configuration, which is what most people do nowadays. It is my intention that admins will be able to turn off the feature, and client participation is totally optional, even if the feature is enabled.
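
As a rough idea of what such a daemon would do on each update, here is a sketch that renders an HAProxy backend stanza from a node list. `render_haproxy_backend` is a hypothetical helper, the stanza layout is only one possible configuration, and the name sanitizing is a conservative assumption about what HAProxy accepts in server names:

```python
import re


def render_haproxy_backend(name, nodes):
    """Render a TCP backend with one `server` line per Riak node."""
    lines = [f"backend {name}", "    mode tcp", "    balance roundrobin"]
    for node_name, (host, port) in sorted(nodes.items()):
        # Replace characters such as "@" that may not be valid in a server name.
        server = re.sub(r"[^A-Za-z0-9_.-]", "_", node_name)
        lines.append(f"    server {server} {host}:{port} check")
    return "\n".join(lines) + "\n"


config = render_haproxy_backend("riak_pb", {
    "riak@riak-dc.bos1.basho.com": ("10.0.1.150", 8087),
    "riak@riak-dc2.bos1.basho.com": ("10.0.1.151", 8087),
})
```

The daemon would write the result into the proxy configuration and trigger a reload whenever the rendered text differs from the previous one.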

kuenishi commented 11 years ago

A small nitpick:

message RpbClientInterface {
    enum RpbProtocol {
        PB = 1;
        HTTP = 2;
        HTTPS = 3;
    }
    required RpbProtocol protocol = 1;
    // This can be a FQDN or a text representation of an IP address.
    required bytes host = 2;
    required uint32 port = 3;    
}

To prevent invalid values, wouldn't it be better to make the type of port uint16?

All in all, I really like this idea, because several users are suffering from network bottlenecks at the load balancer or from problems with automatic client-side connection failover.

seancribbs commented 11 years ago

I'd love to, but there's no uint16 type in protobuf.
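
Since the wire type cannot enforce the range, clients could check it themselves when decoding. A minimal sketch, where `checked_port` is a hypothetical helper and rejecting port 0 is an assumption:

```python
MAX_PORT = 0xFFFF  # largest value a 16-bit port number can hold


def checked_port(value):
    """Return `value` if it is a usable TCP port, else raise ValueError."""
    if not 0 < value <= MAX_PORT:
        raise ValueError(f"port out of range: {value}")
    return value


port = checked_port(8087)
```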


kuenishi commented 11 years ago

A quick Google search misled me into thinking that it has uint16. I should have referred to this: https://developers.google.com/protocol-buffers/docs/proto#scalar Anyway, thanks!

xpe commented 11 years ago

+1 I haven't dug into the technical detail above, but this part resonates with my infrastructure needs:

Since any Riak node can handle any request, we should be able to advertise the existence of other nodes to clients so that they can reconfigure automatically, spreading load more evenly among the Riak nodes. This can reduce operational burden, because applications will not need to be restarted to take advantage of a grown cluster.

seancribbs commented 11 years ago

Unfortunately, due to time constraints and other big features, this will not be in 2.0.

jaredmorrow commented 10 years ago

Bumped milestone to 2.1

martincox commented 5 years ago

We've talked about this at 365. Definitely need this.

basho / riak