echarles opened 6 years ago
It is not currently possible to run multiple datanode daemons per node, because datanode daemons open the same port. It might be possible if we configure different ports per daemon. There might be other things that conflict.
If you just have multiple volumes per node, the helm chart option `dataNodeHostPath` supports multiple disks. From `values.yaml`:
```yaml
# A list of the local disk directories on cluster nodes that will contain the datanode
# blocks. These paths will be mounted to the datanode as K8s HostPath volumes.
# In a command line, the list should be enclosed in '{' and '}'.
# e.g. --set "dataNodeHostPath={/hdfs-data,/hdfs-data1}"
dataNodeHostPath:
  - /hdfs-data
```
Thanks for the hint on `dataNodeHostPath`.
About the port, I would expect an exposed Service to manage any potential conflict. When you expose a Service, each endpoint is seen as a socket (a combination of IP address and port number), so with multiple datanode pods on a host we would have different IP addresses, all with the same port number. There may be other conflicts (to be tested).
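To sketch that per-pod socket idea (all names here are illustrative; the charts do not currently define this), a headless Service would give every datanode pod its own IP and DNS record while they all keep the same port:

```yaml
# Hypothetical sketch: a headless Service for datanode pods.
# Each pod listens on the standard datanode port, but clients reach
# a pod through its own pod IP, so ports never collide on a host.
apiVersion: v1
kind: Service
metadata:
  name: hdfs-datanode    # assumed name, not from the chart
spec:
  clusterIP: None        # headless: one DNS A record per pod
  selector:
    app: hdfs-datanode
  ports:
    - name: data
      port: 50010        # default data-transfer port in Hadoop 2.x
```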
Although labels can be used to exclude hosts, I still see a potential advantage in having multiple pods on the same host in the case of unbalanced clusters (in terms of node CPU and memory).
I'd like to give a `PodCIDRToPodMapping` a try, and just wonder if you see any other technical blocking points with this. Another advantage of this approach would be to use a StorageClass (https://kubernetes.io/docs/concepts/storage/storage-classes/) responsible for mounting the volumes on the pod. The current approach, with the chart predefining a fixed number of paths, sounds less dynamic.
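For reference, the StorageClass route would typically pair a StatefulSet with `volumeClaimTemplates` so each datanode pod gets its own dynamically provisioned volume; a fragment of what that could look like (class name and size are purely illustrative):

```yaml
# Hypothetical sketch: per-pod dynamic volumes via a StorageClass,
# instead of the chart's fixed dataNodeHostPath list.
volumeClaimTemplates:
  - metadata:
      name: hdfs-data
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: local-ssd   # assumed class name
      resources:
        requests:
          storage: 500Gi            # example size
```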
Excellent questions.
The datanode DaemonSet currently uses `hostNetwork`, so the pods do not get virtual pod IP addresses; they are associated with the physical IP addresses of cluster nodes. Hence the port collision concern. Also, AFAIK, a DaemonSet does not define any Services on its own. We'll have to somehow add Services if we want them.
Why do we use `hostNetwork`? Two reasons:
1. HDFS namenode/datanode code is written such that it assumes hostname-to-IP translation, and the reverse, to be possible for those daemons. The pod network, in particular kube-dns, does not support this. This can be addressed by adding Services.
2. Secure HDFS using Kerberos assumes each daemon instance has a separate Kerberos principal account, and the principal includes the hostname as part of the account name. If we somehow use a `StatefulSet` for datanodes, this might be addressed.
So `hostNetwork` is the easiest way to meet these requirements. There might be other constraints as well.
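Simplified and from memory (see the chart for the real template), the relevant part of the current DaemonSet pod spec looks roughly like:

```yaml
# Sketch of the hostNetwork setup described above.
spec:
  hostNetwork: true                    # datanode binds on the node's own IP
  dnsPolicy: ClusterFirstWithHostNet   # keep cluster DNS despite hostNetwork
  containers:
    - name: datanode
      ports:
        - containerPort: 50010         # a second datanode on the same node
                                       # would fight over this node port
```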
@kimoonkim @echarles Wish I had seen this issue earlier. I was facing these same issues: I did not want to use `hostNetwork` and wanted to scale datanodes independently of nodes. I made the setup work using a `StatefulSet` for datanodes, so the DNS lookup/reverse lookup works.
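For context, a StatefulSet paired with a headless Service gives each datanode a stable DNS name with a matching reverse record, which is what lets the hostname checks mentioned above pass. A sketch, with all names and the image purely illustrative:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: hdfs-datanode
spec:
  serviceName: hdfs-datanode   # must match a headless Service
  replicas: 4                  # scales independently of node count
  selector:
    matchLabels:
      app: hdfs-datanode
  template:
    metadata:
      labels:
        app: hdfs-datanode
    spec:
      containers:
        - name: datanode
          image: uhopper/hadoop-datanode:2.7.2   # example image
# Each pod gets a stable name such as
#   hdfs-datanode-0.hdfs-datanode.default.svc.cluster.local
# resolvable in both directions.
```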
@tirumaraiselvan I am using StatefulSets in this repo for now to scale the HDFS datanodes up/down independently of the nodes. It works pretty well, but I have paused the check on the locality level. I will get back to this soon to ensure performance.
Did you test the locality on your side with the steps described here?
@echarles No, not yet. I am guessing this is just an optimization, i.e. the client first tries to use data from the node it is currently on. How much does this affect performance in a typical case, where most of the data is scattered?
`PodCIDRToNodeMapping` requires one datanode per K8s node (this is enforced by the helm charts, which deploy a DaemonSet for the datanodes).
Now I would like to host multiple HDFS datanodes on one K8s node (let's say I have capacity on those nodes, with separate volumes for each of those datanodes).
What about implementing, alongside the current `PodCIDRToNodeMapping`, a new Java class `PodCIDRToPodMapping`? I didn't try to see whether it's feasible, but is there anything that would make this impossible? Maybe this topic has already been discussed?
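I haven't written it, but the core of such a mapping seems mechanical: resolve a pod IP to the node whose `podCIDR` contains it, and emit a topology path derived from that node. A self-contained sketch of just that lookup logic (this is not the real Hadoop `DNSToSwitchMapping` plugin; the class name, node names, and CIDRs are all made up):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PodCidrMappingSketch {

    // Convert a dotted-quad IPv4 address to an unsigned 32-bit value.
    static long toLong(String ip) {
        long v = 0;
        for (String octet : ip.split("\\.")) {
            v = (v << 8) | Integer.parseInt(octet);
        }
        return v;
    }

    // True if the IPv4 address falls inside the CIDR block.
    static boolean inCidr(String ip, String cidr) {
        String[] parts = cidr.split("/");
        int bits = Integer.parseInt(parts[1]);
        long mask = bits == 0 ? 0 : (0xFFFFFFFFL << (32 - bits)) & 0xFFFFFFFFL;
        return (toLong(ip) & mask) == (toLong(parts[0]) & mask);
    }

    // Map a pod IP to a rack path named after the node owning its podCIDR.
    // A real plugin would fetch node podCIDRs from the K8s API instead.
    static String resolve(String podIp, Map<String, String> nodeCidrs) {
        for (Map.Entry<String, String> e : nodeCidrs.entrySet()) {
            if (inCidr(podIp, e.getValue())) {
                return "/" + e.getKey();
            }
        }
        return "/default-rack";
    }

    public static void main(String[] args) {
        Map<String, String> cidrs = new LinkedHashMap<>();
        cidrs.put("node-a", "10.244.0.0/24");
        cidrs.put("node-b", "10.244.1.0/24");
        System.out.println(resolve("10.244.1.7", cidrs));   // -> /node-b
        System.out.println(resolve("10.244.0.12", cidrs));  // -> /node-a
    }
}
```

The datanode-per-pod part (pod IP instead of node IP as the lookup key) is the only real difference from what `PodCIDRToNodeMapping` already does; the hard part is presumably wiring it into the namenode's topology plugin configuration.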