Pinot server replacement in Cloud where we have no control over hostname

chenboat commented 5 years ago

Pinot server instance ids are formatted as Server_host_port. These ids are then used in places like table external_view and idealstate. When Pinot brokers construct the routing tables for a query, they use the above instance ids to reconstruct the host/port info to send the Pinot query via HTTP protocol.

Use cases for decoupling pinot server id and HOST/PORT config: We try to deploy Pinot here in Uber's own cloud. When we replace a Pinot server due to hardware failure or upgrade, we use the following process:

Use the same Pinot server id as the replaced server so that all Helix processes still work.
Change the Helix config about HOST/PORT to that of the new added server because we can not get the same hostname of the replaced one.

However, by the current impl (details in comments below), the server instance id will also change due to the changed hostname in Step 2. That implies the new machine can not be a replacement for the old instance even it has all the data copied to it.

We wonder how machine replacement is done in a cloud environment. One proposal we think of is to decouple the server instance id with the underlying communication protocol (e.g., HTTP used for now). In this design, the broker should NOT use the server instance id directly and instead using it to look up from Zookeeper to find out the actual server address. In fact right now these are stored in zk as HELIX_HOST and HELIX_PORT but they are apparently not used in routing table construction.

@fx19880617 @mcvsubbu

chenboat commented 5 years ago

In high level, we should decouple server_id with server_host (port) stored in Helix/zk.

(1) Pinot broker should use the server instance id to look up the server address from Helix/Zk instead of parsing the id string like the following:

https://github.com/apache/incubator-pinot/blob/6e209e4e0ef138f8c21d597aeb20d1993fd1af2b/pinot-core/src/main/java/org/apache/pinot/core/transport/Server.java#L49-L53

(2) Pinot server should decouple the configuration of server Id with server host/port. Right now they are coupled: https://github.com/apache/incubator-pinot/blob/3236e40c2f8d3d603c82e706db41bf2c0c2eaf68/pinot-server/src/main/java/org/apache/pinot/server/starter/helix/HelixServerStarter.java#L142-L151

chenboat commented 5 years ago

Currently when a Pinot server starts and joins the Helix cluster, it calls HelixManager's connect() method: https://github.com/apache/incubator-pinot/blob/3236e40c2f8d3d603c82e706db41bf2c0c2eaf68/pinot-server/src/main/java/org/apache/pinot/server/starter/helix/HelixServerStarter.java#L160

The above method in its turn call ParticipantManager.joinCluster()

https://github.com/apache/helix/blob/7ac3a6a1762bc74869a8f7ea0a9c7feed8b3ba0a/helix-core/src/main/java/org/apache/helix/manager/zk/ParticipantManager.java#L144-L156

Notice that in the above code snippet, Helix parses the instanceName (i.e.,) into HOST and PORT field and then stores in Zk. Basically Helix couples instanceName with HELIX_HOST and HELIX_PORT.

Helix further does not allow change of Helix Host and Port via zkHelixAdmin: https://github.com/apache/helix/blob/cfd270f2ab377126c36fd71151d414aee55baf66/helix-core/src/main/java/org/apache/helix/manager/zk/ZKHelixAdmin.java#L215-L221

One potential solution here is to obtain a HelixDataAccessor directly and overwrite the PORT and HOST name right after the above connect() in HelixServerStarter.

@icefury71 @faiyaz

icefury71 commented 5 years ago

Here's how we can plan our PRs (based on Ting's recommendation)

1) PR to decouple using instance name/ID for routing requests (basically don't assume instance name/ID is a valid discoverable host name. Instead use this iD to look up the real host name) 2) PR to change HelixServerStarter to accept a logical instance iD, host and port for starting a server.

@kishoreg @mayankshriv any thoughts on this before I proceed ?

kishoreg commented 5 years ago

Here's how we can plan our PRs (based on Ting's recommendation)

PR to decouple using instance name/ID for routing requests (basically don't assume instance name/ID is a valid discoverable host name. Instead use this iD to look up the real host name) This can be done asap.

PR to change HelixServerStarter to accept a logical instance iD, host and port for starting a server. host and port are optional, right?. If the instanceId already exists in Helix, we should change the host/port in the instanceconfig rt.

@kishoreg @mayankshriv any thoughts on this before I proceed ?

kishoreg commented 5 years ago

Unrelated to this PR, it will be great if we can support a mode where a node starts and grabs any available instanceId. This will be useful in cloud/kubernetes deployment where a new pinot server can provisioned and started on any host.

icefury71 commented 5 years ago

Unrelated to this PR, it will be great if we can support a mode where a node starts and grabs any available instanceId. This will be useful in cloud/kubernetes deployment where a new pinot server can provisioned and started on any host.

@kishoreg Yes, that can be an extension to this issue. We can have 2 ways of assigning instance ID: a) Explicitly in HelixServerStarter (this is very useful in our environment wherein the instanceId is centrally coordinated outside of Pinot and explicitly assigned) b) Auto assign via some discovery mechanism (either by Pinot controller or external component).

For now, are you ok with this approach ?

kishoreg commented 5 years ago

Looks good to me.

icefury71 commented 5 years ago

@kishoreg thinking more about PR (1):

A performance side effect of decoupling is that every single reference to instance name will result in a Zk call (naive implementation). I'm guessing this is why the instance name was constructed with this information in the first place (which is a nice hack).

For a clean design, how about we create a helper class which caches the map of instance name to <host,port> info ? We can implement a Helix spectator to keep this map updated.

kishoreg commented 5 years ago

Broker already listens to the changes in instance config and caches it. This should be fairly straightforward.

icefury71 commented 5 years ago

Created PR (1) : https://github.com/apache/incubator-pinot/pull/4778

apache / pinot

Pinot server replacement in Cloud where we have no control over hostname #4525