Sqooba / hdfs

A native Go client for HDFS
MIT License

Nameservice support #11

Open killerwhile opened 6 years ago

killerwhile commented 6 years ago

HDFS HA uses what it calls a nameservice, i.e. a logical name that resolves to multiple Namenodes through the configuration. Today the library supports HA only when the Namenodes are provided manually, but this could be handled directly from the configuration.

killerwhile commented 6 years ago

A bit more detail: currently, the code looks at configuration properties starting with dfs.namenode.rpc-address. This works in many cases, but when multiple Hadoop clusters are reachable from a single gateway, the configuration may contain several dfs.namenode.rpc-address entries pointing to different clusters, and the current version then fails to consistently pick the right cluster.
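To illustrate the ambiguity, here is a minimal Go sketch (not the library's actual code) of a plain prefix scan over a configuration map. The function name, the nameservices hdfs1/hdfs2 and the hostnames are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// namenodesByPrefix mimics the naive strategy: collect every value whose key
// starts with "dfs.namenode.rpc-address", whichever cluster the key belongs to.
func namenodesByPrefix(conf map[string]string) []string {
	var addrs []string
	for k, v := range conf {
		if strings.HasPrefix(k, "dfs.namenode.rpc-address") {
			addrs = append(addrs, v)
		}
	}
	sort.Strings(addrs) // map iteration order is random; sort for stable output
	return addrs
}

func main() {
	// Hypothetical gateway configuration that knows about two clusters.
	conf := map[string]string{
		"dfs.namenode.rpc-address.hdfs1.nn1": "nn1.cluster1.example:8020",
		"dfs.namenode.rpc-address.hdfs1.nn2": "nn2.cluster1.example:8020",
		"dfs.namenode.rpc-address.hdfs2.nn1": "nn1.cluster2.example:8020",
	}
	// All three addresses come back, mixing Namenodes of both clusters.
	fmt.Println(namenodesByPrefix(conf))
}
```

Nothing in the key prefix alone tells the client which of these addresses belong to the cluster named in fs.defaultFS.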

In the case of Namenode HA, a nameservice is defined for the cluster, for instance hdfs1, and fs.defaultFS then points to this nameservice, e.g. hdfs://hdfs1. To resolve the nameservice, the HDFS Java client:

1. Looks at dfs.nameservices to check that hdfs1 is a valid nameservice.
2. Looks at dfs.ha.namenodes.hdfs1 to retrieve the Namenode IDs, often nn1 and nn2.
3. Finally gets each Namenode RPC address from dfs.namenode.rpc-address.hdfs1.nn1 (and likewise for nn2).

One shortcut is possible here: starting from the nameservice, matching only properties prefixed with dfs.namenode.rpc-address.${nameservice} would already produce more portable results.
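The lookup chain and the shortcut could be sketched in Go roughly as follows. This is a hypothetical illustration over a plain map[string]string, not the library's API or the Java client's code; the function names, the example hostnames, and the choice of the URL host as the nameservice are assumptions.

```go
package main

import (
	"fmt"
	"net/url"
	"sort"
	"strings"
)

// splitList splits a comma-separated Hadoop property value, dropping blanks.
func splitList(v string) []string {
	var out []string
	for _, s := range strings.Split(v, ",") {
		if s = strings.TrimSpace(s); s != "" {
			out = append(out, s)
		}
	}
	return out
}

// resolveNameservice (hypothetical) follows the chain fs.defaultFS ->
// dfs.nameservices -> dfs.ha.namenodes.<ns> -> dfs.namenode.rpc-address.<ns>.<id>.
func resolveNameservice(conf map[string]string) ([]string, error) {
	u, err := url.Parse(conf["fs.defaultFS"])
	if err != nil {
		return nil, err
	}
	ns := u.Hostname() // "hdfs://hdfs1" -> "hdfs1"
	known := false
	for _, s := range splitList(conf["dfs.nameservices"]) {
		if s == ns {
			known = true
			break
		}
	}
	if !known {
		return nil, fmt.Errorf("%q is not listed in dfs.nameservices", ns)
	}
	ids := splitList(conf["dfs.ha.namenodes."+ns]) // e.g. "nn1,nn2"
	if len(ids) == 0 {
		return nil, fmt.Errorf("dfs.ha.namenodes.%s is empty", ns)
	}
	addrs := make([]string, 0, len(ids))
	for _, id := range ids {
		key := "dfs.namenode.rpc-address." + ns + "." + id
		a, ok := conf[key]
		if !ok {
			return nil, fmt.Errorf("missing %s", key)
		}
		addrs = append(addrs, a)
	}
	return addrs, nil
}

// rpcAddressesByNameservice (hypothetical) is the shortcut: keep only keys under
// dfs.namenode.rpc-address.<ns>. The trailing dot stops "hdfs1" matching "hdfs10".
func rpcAddressesByNameservice(conf map[string]string, ns string) []string {
	prefix := "dfs.namenode.rpc-address." + ns + "."
	var addrs []string
	for k, v := range conf {
		if strings.HasPrefix(k, prefix) {
			addrs = append(addrs, v)
		}
	}
	sort.Strings(addrs) // map iteration order is random
	return addrs
}

func main() {
	conf := map[string]string{
		"fs.defaultFS":                       "hdfs://hdfs1",
		"dfs.nameservices":                   "hdfs1,hdfs2",
		"dfs.ha.namenodes.hdfs1":             "nn1,nn2",
		"dfs.namenode.rpc-address.hdfs1.nn1": "nn1.cluster1.example:8020",
		"dfs.namenode.rpc-address.hdfs1.nn2": "nn2.cluster1.example:8020",
		"dfs.namenode.rpc-address.hdfs2.nn1": "nn1.cluster2.example:8020",
	}
	addrs, err := resolveNameservice(conf)
	if err != nil {
		panic(err)
	}
	fmt.Println(addrs)
	fmt.Println(rpcAddressesByNameservice(conf, "hdfs1"))
}
```

Both functions ignore the hdfs2 entry. The full chain additionally validates the nameservice and respects the Namenode order from dfs.ha.namenodes, while the shortcut only needs the nameservice name.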