
Configure Accumulo clients with HDFS root #783

Open mikewalch opened 5 years ago

mikewalch commented 5 years ago

When doing bulk imports, Accumulo clients need to retrieve a Hadoop configuration object with fs.defaultFS set. This is currently done by putting $HADOOP_HOME/etc/hadoop on the Java CLASSPATH. It would be nice if you could specify the following in accumulo-client.properties and avoid special configuration of your classpath.

dfs.root = hdfs://localhost:8020

Accumulo could still look for the Hadoop config dir on the classpath and build a Hadoop Configuration object from it. It could then override fs.defaultFS if the dfs.root property is set when the client is created.
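
A rough sketch of what that could look like when the client builds its Hadoop Configuration (this is not existing Accumulo API; the dfs.root property name comes from the proposal above and the helper class is hypothetical):

```java
import org.apache.hadoop.conf.Configuration;

// Hypothetical helper: load the Hadoop config from whatever hadoop conf dir
// is on the classpath, then let an explicit dfs.root client property
// override fs.defaultFS.
public class ClientHadoopConf {
  public static Configuration build(String dfsRoot) {
    Configuration conf = new Configuration(); // picks up core-site.xml etc. from the classpath, if present
    if (dfsRoot != null && !dfsRoot.isEmpty()) {
      conf.set("fs.defaultFS", dfsRoot); // e.g. hdfs://localhost:8020
    }
    return conf;
  }
}
```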

ctubbsii commented 5 years ago

fs.defaultFS is only used for qualifying unqualified paths (paths without a filesystem specified). It is probably bad practice to rely on this mechanism when writing Hadoop client code (it's too dependent on whatever Hadoop configuration is on the class path, or whichever Hadoop Configuration object was used to create the Path object).

Instead of providing our own qualification mechanism (which I think we should not do), we should encourage users to qualify their paths with a Hadoop filesystem instead, when we receive it in our API. If they choose not to do so, and instead rely on the Hadoop behavior of automatically qualifying it with fs.defaultFS, then we should not override that behavior with our own. It is not our responsibility to override the natural behavior of the Hadoop client library we're using.