avast / hdfs-shell

HDFS Shell is an HDFS manipulation tool for working with functions integrated in Hadoop DFS
Apache License 2.0

hdfs-shell does not read the configuration files defined in HADOOP_CONF_DIR #19

Closed jmercier-lbi closed 5 years ago

jmercier-lbi commented 5 years ago

Hello,

Since I use a custom FileSystem with Kerberos, I have custom hdfs-site.xml and core-site.xml files. The vanilla hdfs-shell does not work for me because it only uses the default configuration files found on the classpath; for example, core-default.xml is taken from the hadoop-common jar.

(screenshot: application state while debugging)

The environment variable HADOOP_CONF_DIR is set to /usr/hdp/3.1.0.0-78/hadoop/conf/, which contains the required configuration files.

If I explicitly add the following after line 184:

configuration.addResource(new Path("file:///usr/hdp/3.1.0.0-78/hadoop/conf/hdfs-site.xml"));
configuration.addResource(new Path("file:///usr/hdp/3.1.0.0-78/hadoop/conf/core-site.xml"));

These two lines solve my problem, which shows that HADOOP_CONF_DIR is not used to discover the resource files.
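For reference, a minimal sketch of how those two addResource() calls could be driven by HADOOP_CONF_DIR instead of hard-coded paths (the configuration variable is the existing org.apache.hadoop.conf.Configuration instance from the hdfs-shell source; the env-var lookup is my assumption, not the current behaviour):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

// Sketch: load the site files from HADOOP_CONF_DIR when it is set,
// instead of hard-coding the /usr/hdp/... paths above.
String confDir = System.getenv("HADOOP_CONF_DIR");
if (confDir != null && !confDir.isEmpty()) {
    configuration.addResource(new Path("file://" + confDir, "core-site.xml"));
    configuration.addResource(new Path("file://" + confDir, "hdfs-site.xml"));
}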

Moreover, I traced the process of reading the configuration files and put a breakpoint where the file is parsed. When the resource variable equals core-site.xml, both the `url` and `doc` variables are null, because it fails to find the resource.

So should I build with:

thanks

Best regards

Vity01 commented 5 years ago

I don't understand your problem. The HADOOP_CONF_DIR env variable is used here https://github.com/avast/hdfs-shell/blob/master/deploy/bin/hdfs-shell.sh to add the additional configuration files to the classpath; org.apache.hadoop.conf.Configuration will then find and read them. If you want to use them in IntelliJ, you have to add them to the classpath, e.g. this way: compile files("c:\\App\\hadoop\\etc\\hadoop")
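(To illustrate that mechanism, here is a minimal standalone sketch; the class name ConfCheck is made up, and fs.defaultFS is just an example property read from core-site.xml:)

import org.apache.hadoop.conf.Configuration;

public class ConfCheck {
    public static void main(String[] args) {
        // A plain Configuration loads core-default.xml and core-site.xml
        // from the classpath by itself, with no addResource() calls, so a
        // conf dir added to the classpath is picked up automatically.
        Configuration conf = new Configuration();
        System.out.println(conf.get("fs.defaultFS"));
    }
}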

jmercier-lbi commented 5 years ago

Thanks @Vity01 for your help

I think I did not install hdfs-shell the way you expected.

My process:

$ gradle installDist
$ cp -r  build/install/hdfs-shell/*  ~/.local/
$ PATH="$HOME/.local/bin:$PATH"
$ hdfs-shell
$ cat ~/.local/bin/hdfs-shell
#!/usr/bin/env sh

#
# Copyright 2015 the original author or authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

##############################################################################
##
##  hdfs-shell start up script for UN*X
##
##############################################################################

# Attempt to set APP_HOME
# Resolve links: $0 may be a link
PRG="$0"
# Need this for relative symlinks.
while [ -h "$PRG" ] ; do
    ls=`ls -ld "$PRG"`
    link=`expr "$ls" : '.*-> \(.*\)$'`
    if expr "$link" : '/.*' > /dev/null; then
        PRG="$link"
    else
        PRG=`dirname "$PRG"`"/$link"
    fi
done
SAVED="`pwd`"
cd "`dirname \"$PRG\"`/.." >/dev/null
APP_HOME="`pwd -P`"
cd "$SAVED" >/dev/null

APP_NAME="hdfs-shell"
APP_BASE_NAME=`basename "$0"`

# Add default JVM options here. You can also use JAVA_OPTS and HDFS_SHELL_OPTS to pass JVM options to this script.
DEFAULT_JVM_OPTS=""

# Use the maximum available, or set MAX_FD != -1 to use that value.
MAX_FD="maximum"

warn () {
    echo "$*"
}

die () {
    echo
    echo "$*"
    echo
    exit 1
}

# OS specific support (must be 'true' or 'false').
cygwin=false
msys=false
darwin=false
nonstop=false
case "`uname`" in
  CYGWIN* )
    cygwin=true
    ;;
  Darwin* )
    darwin=true
    ;;
  MINGW* )
    msys=true
    ;;
  NONSTOP* )
    nonstop=true
    ;;
esac

CLASSPATH=$APP_HOME/lib/hdfs-shell-1.0.7.jar:$APP_HOME/lib/hadoop-nfs-2.7.1.jar:$APP_HOME/lib/hadoop-nfs-connector-3.0.1.jar:$APP_HOME/lib/spring-boot-starter-1.4.3.RELEASE.jar:$APP_HOME/lib/spring-shell-1.2.0.RELEASE.jar:$APP_HOME/lib/junixsocket-native-common-2.0.4.jar:$APP_HOME/lib/junixsocket-common-2.0.4.jar:$APP_HOME/lib/commons-lang3-3.3.2.jar:$APP_HOME/lib/hadoop-client-2.6.0.jar:$APP_HOME/lib/ranger-plugins-audit-1.0.0.jar:$APP_HOME/lib/spring-boot-autoconfigure-1.4.2.RELEASE.jar:$APP_HOME/lib/spring-boot-1.4.2.RELEASE.jar:$APP_HOME/lib/spring-boot-starter-logging-1.4.2.RELEASE.jar:$APP_HOME/lib/spring-context-support-4.3.4.RELEASE.jar:$APP_HOME/lib/spring-context-4.3.4.RELEASE.jar:$APP_HOME/lib/spring-aop-4.3.4.RELEASE.jar:$APP_HOME/lib/spring-beans-4.3.4.RELEASE.jar:$APP_HOME/lib/spring-expression-4.3.4.RELEASE.jar:$APP_HOME/lib/spring-core-4.3.4.RELEASE.jar:$APP_HOME/lib/snakeyaml-1.17.jar:$APP_HOME/lib/hadoop-hdfs-2.6.0.jar:$APP_HOME/lib/hadoop-mapreduce-client-app-2.6.0.jar:$APP_HOME/lib/hadoop-mapreduce-client-jobclient-2.6.0.jar:$APP_HOME/lib/hadoop-mapreduce-client-shuffle-2.6.0.jar:$APP_HOME/lib/hadoop-mapreduce-client-common-2.6.0.jar:$APP_HOME/lib/hadoop-mapreduce-client-core-2.6.0.jar:$APP_HOME/lib/hadoop-yarn-client-2.6.0.jar:$APP_HOME/lib/hadoop-yarn-server-common-2.6.0.jar:$APP_HOME/lib/hadoop-yarn-common-2.6.0.jar:$APP_HOME/lib/hadoop-yarn-api-2.6.0.jar:$APP_HOME/lib/ranger-plugins-cred-1.0.0.jar:$APP_HOME/lib/hadoop-common-2.7.1.jar:$APP_HOME/lib/htrace-core-3.0.4.jar:$APP_HOME/lib/hadoop-auth-2.7.1.jar:$APP_HOME/lib/curator-recipes-2.7.1.jar:$APP_HOME/lib/curator-framework-2.7.1.jar:$APP_HOME/lib/curator-client-2.7.1.jar:$APP_HOME/lib/guava-17.0.jar:$APP_HOME/lib/kafka_2.11-1.0.0.jar:$APP_HOME/lib/solr-solrj-5.5.3.jar:$APP_HOME/lib/zkclient-0.10.jar:$APP_HOME/lib/zookeeper-3.4.10.jar:$APP_HOME/lib/jline-2.12.jar:$APP_HOME/lib/commons-io-2.4.jar:$APP_HOME/lib/native-lib-loader-2.0.2.jar:$APP_HOME/lib/slf4j-log4j12-1.7.21.jar:$APP_HOME/lib/log4j-1.2.17.jar:$APP_HOME/lib/hadoop-annotations-2.7.1.jar:$APP_HOME/lib/httpmime-4.5.2.jar:$APP_HOME/lib/httpclient-4.5.2.jar:$APP_HOME/lib/commons-httpclient-3.1.jar:$APP_HOME/lib/commons-configuration-1.6.jar:$APP_HOME/lib/commons-digester-2.1.jar:$APP_HOME/lib/commons-beanutils-core-1.8.0.jar:$APP_HOME/lib/commons-beanutils-1.9.3.jar:$APP_HOME/lib/commons-logging-1.2.jar:$APP_HOME/lib/eclipselink-2.5.2.jar:$APP_HOME/lib/javax.persistence-2.1.0.jar:$APP_HOME/lib/jcl-over-slf4j-1.7.21.jar:$APP_HOME/lib/jul-to-slf4j-1.7.21.jar:$APP_HOME/lib/jetty-util-6.1.26.jar:$APP_HOME/lib/commons-cli-1.2.jar:$APP_HOME/lib/commons-codec-1.10.jar:$APP_HOME/lib/commons-lang-2.6.jar:$APP_HOME/lib/protobuf-java-2.5.0.jar:$APP_HOME/lib/avro-1.7.4.jar:$APP_HOME/lib/jackson-jaxrs-1.9.13.jar:$APP_HOME/lib/jackson-xc-1.9.13.jar:$APP_HOME/lib/jackson-mapper-asl-1.9.13.jar:$APP_HOME/lib/jackson-core-asl-1.9.13.jar:$APP_HOME/lib/xmlenc-0.52.jar:$APP_HOME/lib/netty-3.10.5.Final.jar:$APP_HOME/lib/xercesImpl-2.9.1.jar:$APP_HOME/lib/kafka-clients-1.0.0.jar:$APP_HOME/lib/metrics-core-2.2.0.jar:$APP_HOME/lib/apacheds-kerberos-codec-2.0.0-M15.jar:$APP_HOME/lib/apacheds-i18n-2.0.0-M15.jar:$APP_HOME/lib/api-asn1-api-1.0.0-M20.jar:$APP_HOME/lib/api-util-1.0.0-M20.jar:$APP_HOME/lib/slf4j-api-1.7.21.jar:$APP_HOME/lib/commons-math3-3.1.1.jar:$APP_HOME/lib/commons-net-3.1.jar:$APP_HOME/lib/commons-collections-3.2.2.jar:$APP_HOME/lib/servlet-api-2.5.jar:$APP_HOME/lib/jersey-client-1.9.jar:$APP_HOME/lib/jersey-core-1.9.jar:$APP_HOME/lib/gson-2.7.jar:$APP_HOME/lib/jsr305-3.0.0.jar:$APP_HOME/lib/htrace-core-3.1.0-incubating.jar:$APP_HOME/lib/commons-compress-1.4.1.jar:$APP_HOME/lib/commonj.sdo-2.1.1.jar:$APP_HOME/lib/jackson-databind-2.8.4.jar:$APP_HOME/lib/jopt-simple-5.0.4.jar:$APP_HOME/lib/scala-library-2.11.11.jar:$APP_HOME/lib/httpcore-4.4.5.jar:$APP_HOME/lib/stax2-api-3.1.4.jar:$APP_HOME/lib/woodstox-core-asl-4.4.1.jar:$APP_HOME/lib/noggit-0.6.jar:$APP_HOME/lib/leveldbjni-all-1.8.jar:$APP_HOME/lib/jaxb-api-2.2.2.jar:$APP_HOME/lib/paranamer-2.3.jar:$APP_HOME/lib/snappy-java-1.1.4.jar:$APP_HOME/lib/xz-1.0.jar:$APP_HOME/lib/jackson-annotations-2.8.4.jar:$APP_HOME/lib/jackson-core-2.8.4.jar:$APP_HOME/lib/stax-api-1.0-2.jar:$APP_HOME/lib/activation-1.1.jar

# Determine the Java command to use to start the JVM.
if [ -n "$JAVA_HOME" ] ; then
    if [ -x "$JAVA_HOME/jre/sh/java" ] ; then
        # IBM's JDK on AIX uses strange locations for the executables
        JAVACMD="$JAVA_HOME/jre/sh/java"
    else
        JAVACMD="$JAVA_HOME/bin/java"
    fi
    if [ ! -x "$JAVACMD" ] ; then
        die "ERROR: JAVA_HOME is set to an invalid directory: $JAVA_HOME

Please set the JAVA_HOME variable in your environment to match the
location of your Java installation."
    fi
else
    JAVACMD="java"
    which java >/dev/null 2>&1 || die "ERROR: JAVA_HOME is not set and no 'java' command could be found in your PATH.

Please set the JAVA_HOME variable in your environment to match the
location of your Java installation."
fi

# Increase the maximum file descriptors if we can.
if [ "$cygwin" = "false" -a "$darwin" = "false" -a "$nonstop" = "false" ] ; then
    MAX_FD_LIMIT=`ulimit -H -n`
    if [ $? -eq 0 ] ; then
        if [ "$MAX_FD" = "maximum" -o "$MAX_FD" = "max" ] ; then
            MAX_FD="$MAX_FD_LIMIT"
        fi
        ulimit -n $MAX_FD
        if [ $? -ne 0 ] ; then
            warn "Could not set maximum file descriptor limit: $MAX_FD"
        fi
    else
        warn "Could not query maximum file descriptor limit: $MAX_FD_LIMIT"
    fi
fi

# For Darwin, add options to specify how the application appears in the dock
if $darwin; then
    GRADLE_OPTS="$GRADLE_OPTS \"-Xdock:name=$APP_NAME\" \"-Xdock:icon=$APP_HOME/media/gradle.icns\""
fi

# For Cygwin, switch paths to Windows format before running java
if $cygwin ; then
    APP_HOME=`cygpath --path --mixed "$APP_HOME"`
    CLASSPATH=`cygpath --path --mixed "$CLASSPATH"`
    JAVACMD=`cygpath --unix "$JAVACMD"`

    # We build the pattern for arguments to be converted via cygpath
    ROOTDIRSRAW=`find -L / -maxdepth 1 -mindepth 1 -type d 2>/dev/null`
    SEP=""
    for dir in $ROOTDIRSRAW ; do
        ROOTDIRS="$ROOTDIRS$SEP$dir"
        SEP="|"
    done
    OURCYGPATTERN="(^($ROOTDIRS))"
    # Add a user-defined pattern to the cygpath arguments
    if [ "$GRADLE_CYGPATTERN" != "" ] ; then
        OURCYGPATTERN="$OURCYGPATTERN|($GRADLE_CYGPATTERN)"
    fi
    # Now convert the arguments - kludge to limit ourselves to /bin/sh
    i=0
    for arg in "$@" ; do
        CHECK=`echo "$arg"|egrep -c "$OURCYGPATTERN" -`
        CHECK2=`echo "$arg"|egrep -c "^-"`                                 ### Determine if an option

        if [ $CHECK -ne 0 ] && [ $CHECK2 -eq 0 ] ; then                    ### Added a condition
            eval `echo args$i`=`cygpath --path --ignore --mixed "$arg"`
        else
            eval `echo args$i`="\"$arg\""
        fi
        i=$((i+1))
    done
    case $i in
        (0) set -- ;;
        (1) set -- "$args0" ;;
        (2) set -- "$args0" "$args1" ;;
        (3) set -- "$args0" "$args1" "$args2" ;;
        (4) set -- "$args0" "$args1" "$args2" "$args3" ;;
        (5) set -- "$args0" "$args1" "$args2" "$args3" "$args4" ;;
        (6) set -- "$args0" "$args1" "$args2" "$args3" "$args4" "$args5" ;;
        (7) set -- "$args0" "$args1" "$args2" "$args3" "$args4" "$args5" "$args6" ;;
        (8) set -- "$args0" "$args1" "$args2" "$args3" "$args4" "$args5" "$args6" "$args7" ;;
        (9) set -- "$args0" "$args1" "$args2" "$args3" "$args4" "$args5" "$args6" "$args7" "$args8" ;;
    esac
fi

# Escape application args
save () {
    for i do printf %s\\n "$i" | sed "s/'/'\\\\''/g;1s/^/'/;\$s/\$/' \\\\/" ; done
    echo " "
}
APP_ARGS=$(save "$@")

# Collect all arguments for the java command, following the shell quoting and substitution rules
eval set -- $DEFAULT_JVM_OPTS $JAVA_OPTS $HDFS_SHELL_OPTS -classpath "\"$CLASSPATH\"" com.avast.server.hdfsshell.MainApp "$APP_ARGS"

# by default we should be in the correct project dir, but when run from Finder on Mac, the cwd is wrong
if [ "$(uname)" = "Darwin" ] && [ "$HOME" = "$PWD" ]; then
  cd "$(dirname "$0")"
fi

exec "$JAVACMD" "$@"

So we should not use the script generated by Gradle; instead, we should use the one stored in the deploy dir, which works:

$ cp deploy/bin/hdfs-shell.sh ~/.local/bin/hdfs-shell

Suggestion:

Maybe overriding the Gradle tasks could prevent such a case:

task installShellScript(type: Copy) {
    mustRunAfter installDist // copy the deploy scripts after installDist has populated the install tree
    from 'deploy/bin'
    into 'build/install/hdfs-shell/bin'
}

task installAll {
    dependsOn installShellScript, installDist
}
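With tasks like these (the names installShellScript and installAll are only this suggestion, not existing tasks), running gradle installAll would also place the scripts from deploy/bin into the install tree's bin directory alongside the generated ones.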

Thanks a lot

Vity01 commented 5 years ago

You should use the bundled version https://github.com/avast/hdfs-shell/releases as described in the readme.md file. It's not necessary to make your own build; the bundle already contains the scripts. Thanks for the suggestion.