Open killerwhile opened 6 years ago
A few more details here:
As of today, the code looks at configuration properties starting with `dfs.namenode.rpc-address`. This works in many cases, but where multiple Hadoop clusters are reachable from a single gateway, the configuration may contain several `dfs.namenode.rpc-address` entries pointing to different clusters, so the current version cannot consistently pick the right cluster.
With Namenode HA, a nameservice is defined for the cluster, for instance `hdfs1`. `fs.defaultFS` then points to this nameservice, such as `hdfs://hdfs1`. To resolve this nameservice, the HDFS Java client does the following: it looks at `dfs.nameservices` to ensure this is a valid nameservice, then at `dfs.ha.namenodes.${nameservice}` (e.g. `dfs.ha.namenodes.daplab2` for a nameservice named `daplab2`) to retrieve the Namenode service IDs, often `nn1` and `nn2`, and finally gets each Namenode RPC address from a key such as `dfs.namenode.rpc-address.daplab2.nn1`.
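As a rough illustration of that lookup chain, here is a minimal Python sketch. It is not the HDFS Java client's code nor this library's API: the function name `resolve_namenodes`, the plain-dict configuration, and the `example.com` hostnames are all assumptions made up for the example.

```python
from urllib.parse import urlparse


def resolve_namenodes(conf):
    """Return the Namenode RPC addresses for fs.defaultFS (hypothetical helper)."""
    # "hdfs://hdfs1" -> "hdfs1"; "hdfs://host:8020" -> "host:8020"
    authority = urlparse(conf["fs.defaultFS"]).netloc

    # Step 1: is the authority a declared nameservice?
    declared = [s.strip() for s in conf.get("dfs.nameservices", "").split(",")]
    if authority not in declared:
        # Not a logical name: assume it is already a plain host:port.
        return [authority]

    # Step 2: list the Namenode service IDs of that nameservice.
    ids = [i.strip() for i in conf["dfs.ha.namenodes." + authority].split(",")]

    # Step 3: read one RPC address per service ID.
    return [conf["dfs.namenode.rpc-address.%s.%s" % (authority, nn)] for nn in ids]


# Sample configuration with two clusters visible from one gateway (made-up hosts).
conf = {
    "fs.defaultFS": "hdfs://hdfs1",
    "dfs.nameservices": "hdfs1,hdfs2",
    "dfs.ha.namenodes.hdfs1": "nn1,nn2",
    "dfs.ha.namenodes.hdfs2": "nn1,nn2",
    "dfs.namenode.rpc-address.hdfs1.nn1": "a1.example.com:8020",
    "dfs.namenode.rpc-address.hdfs1.nn2": "a2.example.com:8020",
    "dfs.namenode.rpc-address.hdfs2.nn1": "b1.example.com:8020",
    "dfs.namenode.rpc-address.hdfs2.nn2": "b2.example.com:8020",
}

print(resolve_namenodes(conf))
```

With this sample configuration, only the two `hdfs1` addresses are returned, even though the `hdfs2` entries also start with `dfs.namenode.rpc-address`.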
One shortcut could be taken here: starting from the nameservice, matching `dfs.namenode.rpc-address.${nameservice}` would already produce more portable results.
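The shortcut could look roughly like the following Python sketch. The function name `namenodes_by_prefix`, the dict-based configuration, and the hostnames are hypothetical. Note that the sketch appends a trailing dot to the prefix, an assumption added here so that a nameservice called `hdfs1` does not also match one called `hdfs10`.

```python
def namenodes_by_prefix(conf, nameservice):
    """Collect RPC addresses whose key starts with the nameservice prefix."""
    # Trailing "." keeps "hdfs1" from matching keys of "hdfs10".
    prefix = "dfs.namenode.rpc-address." + nameservice + "."
    # Sort by key so the result order is deterministic.
    return [conf[key] for key in sorted(conf) if key.startswith(prefix)]


# Made-up sample configuration with two similarly named nameservices.
sample = {
    "dfs.namenode.rpc-address.hdfs1.nn1": "a1.example.com:8020",
    "dfs.namenode.rpc-address.hdfs1.nn2": "a2.example.com:8020",
    "dfs.namenode.rpc-address.hdfs10.nn1": "c1.example.com:8020",
}

print(namenodes_by_prefix(sample, "hdfs1"))
```

This skips the `dfs.nameservices` and `dfs.ha.namenodes.*` checks entirely, which is what makes it a shortcut rather than a full reimplementation of the Java client's resolution.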
HDFS HA uses what it calls a nameservice, i.e. a logical name that resolves to multiple Namenodes in the configuration. As of today, the library supports HA by having the Namenodes provided manually, but this could be handled directly from the configuration.