crs4 / pydoop

A Python MapReduce and HDFS API for Hadoop
Apache License 2.0

problem in connecting to hdfs #367

Open ebrahim-abbasi opened 4 years ago

ebrahim-abbasi commented 4 years ago

Hi, I am using Pydoop in combination with pyftpdlib to provide an FTP server for HDFS. I followed the installation instructions to set up Hadoop. I have a Hadoop client connecting to a remote HDFS, and from Pydoop I am connecting to the Hadoop client. Running hadoop classpath --glob from the command line works fine, but when pydoop/hadoop_utils.py executes the line

cp = subprocess.check_output("hadoop classpath --glob", shell=True, universal_newlines=True).strip()

I am getting this error:

subprocess.CalledProcessError: Command 'hadoop classpath --glob' returned non-zero exit status 127.
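
Since check_output is called with shell=True, Python hands the command to /bin/sh, so as a quick check the same invocation can be tried by hand:

/bin/sh -c "hadoop classpath --glob"
echo $?    # exit status 127 means the shell could not find the hadoop command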

Could you please let me know how I can fix this issue? Best

simleo commented 4 years ago

You usually get exit status 127 when bash does not find the command you're trying to run. Make sure the hadoop executable is in the PATH. See Environment Setup in the docs.
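
As a quick check, command -v prints where the shell resolves hadoop, or nothing if it can't:

command -v hadoop || echo "hadoop not found in PATH"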

ebrahim-abbasi commented 4 years ago

@simleo Thanks for your reply. Here is the content of my /etc/environment:

PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/home/abbasi/software/hadoop-2.10.0:/home/abbasi/software/hadoop-2.10.0/bin:/home/abbasi/software/hadoop-2.10.0/sbin" HADOOP_HOME="/home/abbasi/software/hadoop-2.10.0" HADOOP_INSTALL="/home/abbasi/software/hadoop-2.10.0" HADOOP_MAPRED_HOME="/home/abbasi/software/hadoop-2.10.0" HADOOP_COMMON_HOME="/home/abbasi/software/hadoop-2.10.0" HADOOP_HDFS_HOME="/home/abbasi/software/hadoop-2.10.0" YARN_HOME="/home/abbasi/software/hadoop-2.10.0" HADOOP_COMMON_LIB_NATIVE_DIR="/home/abbasi/software/hadoop-2.10.0/lib/native" HADOOP_OPTS="-Djava.library.path=/home/abbasi/software/hadoop-2.10.0/lib/native" HADOOP_CONF_DIR="/home/abbasi/software/hadoop-2.10.0/etc/hadoop" CLASSPATH="/home/abbasi/software/hadoop-2.10.0/etc/hadoop:/home/abbasi/software/hadoop-2.10.0/share/hadoop/common/lib/:/home/abbasi/software/hadoop-2.10.0/share/hadoop/common/:/home/abbasi/software/hadoop-2.10.0/share/hadoop/hdfs:/home/abbasi/software/hadoop-2.10.0/share/hadoop/hdfs/lib/:/home/abbasi/software/hadoop-2.10.0/share/hadoop/hdfs/:/home/abbasi/software/hadoop-2.10.0/share/hadoop/yarn:/home/abbasi/software/hadoop-2.10.0/share/hadoop/yarn/lib/:/home/abbasi/software/hadoop-2.10.0/share/hadoop/yarn/:/home/abbasi/software/hadoop-2.10.0/share/hadoop/mapreduce/lib/:/home/abbasi/software/hadoop-2.10.0/share/hadoop/mapreduce/:/home/abbasi/software/hadoop-2.10.0/contrib/capacity-scheduler/*.jar"

Should I add something more?

ilveroluca commented 4 years ago

You need to make sure the hadoop executable is in one of the directories listed in the PATH environment variable. If it isn't, fix the PATH list.
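
For example, to list the PATH one directory per line and to confirm the executable actually exists (path assumed from your /etc/environment above):

echo "$PATH" | tr ':' '\n'
ls -l /home/abbasi/software/hadoop-2.10.0/bin/hadoop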

ebrahim-abbasi commented 4 years ago

Is it enough to put the jar files returned by hadoop classpath --glob in the PATH variable?

My current PATH variable is:

/home/abbasi/pyftpdlib_venv/bin:/home/abbasi/software/hadoop-2.10.0/etc/hadoop:/home/abbasi/software/hadoop-2.10.0:/home/abbasi/software/hadoop-2.10.0/bin:/home/abbasi/software/hadoop-2.10.0/sbin:/usr/lib/jvm/java-8-openjdk-amd64/bin:/home/abbasi/.sdkman/candidates/scala/current/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin:

simleo commented 4 years ago

Don't worry about adding the jar files; Pydoop handles that automatically. It only needs the hadoop command to be in the PATH. Try something like:

export PATH="/home/abbasi/software/hadoop-2.10.0/bin:/home/abbasi/software/hadoop-2.10.0/sbin:${PATH}"

And please double check that the hadoop executable script is indeed in /home/abbasi/software/hadoop-2.10.0/bin.
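
Once the PATH is fixed, a quick sanity check along these lines should succeed:

command -v hadoop
hadoop version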