Status: closed by snalanagula 6 years ago
Hi, and thanks for reporting this.
Pydoop uses dynamic extension modules, so it's not importable from a zip archive. It should be importable from an egg (also supported by Spark), but this leads to the same error as above. I have just opened issue #276 for this and hope to get to it soon. In the meantime, since you most likely don't need properties anyway, you should be able to work around the problem as follows:
Patch pydoop/__init__.py so it does not break when properties are not found:
--- a/pydoop/__init__.py
+++ b/pydoop/__init__.py
@@ -179,9 +179,7 @@ def read_properties(fname):
         with open(fname) as f:
             parser.readfp(AddSectionWrapper(f))
     except IOError as e:
-        if e.errno != errno.ENOENT:
-            raise
-        return None  # compile time, prop file is not there
+        return {}
     return dict(parser.items(AddSectionWrapper.SEC_NAME))
Build the egg:
git clone --branch 1.2.0 https://github.com/crs4/pydoop
cd pydoop
export HADOOP_HOME=/your/hadoop/home
export JAVA_HOME=/your/java/home
python setup.py build
python setup.py bdist_egg
You should end up with a pydoop-1.2.0-py2.7.egg (or similar) under dist/. Try passing this file to sc.addPyFile instead of the zip one.
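As background to the zip limitation mentioned above: Python's zip importer loads only pure-Python modules, so an archive that contains compiled extension modules cannot be imported directly. The following is a minimal sketch for spotting such members in an archive. It is not part of Pydoop; the function name `extension_modules_in_archive` is made up for illustration, and it assumes Python 3 (the thread itself uses Python 2.7).

```python
import zipfile
from importlib.machinery import EXTENSION_SUFFIXES


def extension_modules_in_archive(archive_path):
    """List archive members that look like compiled extension modules."""
    # EXTENSION_SUFFIXES covers the running interpreter; ".so" and ".pyd"
    # are added so that archives built on another platform are caught too.
    suffixes = tuple(set(EXTENSION_SUFFIXES) | {".so", ".pyd"})
    with zipfile.ZipFile(archive_path) as archive:
        return sorted(
            name for name in archive.namelist() if name.endswith(suffixes)
        )
```

A non-empty result for an archive such as pydoop.zip would indicate that it has to be unpacked to a real directory before its native modules can be loaded.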
Hi Simone,
Thanks for the reply and the workaround. I followed the steps provided and the import issue is resolved, but when I try to perform operations with hdfs I run into the following error:
>>> hdfs.ls('/insight_labs/rdf')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "build/bdist.linux-x86_64/egg/pydoop/hdfs/__init__.py", line 312, in ls
  File "build/bdist.linux-x86_64/egg/pydoop/hdfs/__init__.py", line 291, in lsl
  File "build/bdist.linux-x86_64/egg/pydoop/hdfs/fs.py", line 150, in __init__
  File "build/bdist.linux-x86_64/egg/pydoop/hdfs/fs.py", line 64, in _get_connection_info
  File "build/bdist.linux-x86_64/egg/pydoop/hdfs/core/__init__.py", line 55, in core_hdfs_fs
RuntimeError: module not initialized, check that Pydoop is correctly installed
I have looked at the other issues, but no one seems to have built Pydoop this way. Could you please help me resolve this?
[ex63@xxxxx pydoop]$ python -V
Python 2.7.12 :: Continuum Analytics, Inc.
It does print the Hadoop version and the Hadoop classpath:
>>> import pydoop
>>> pydoop.hadoop_version()
'2.7.3.2.5.5.0-157'
>>> os.environ['JAVA_HOME']
'/usr/java/jdk1.8.0_65'
>>> pydoop.hadoop_classpath()
'/usr/hdp/2.5.5.0-157/hadoop/hadoop-auth-2.7.3.2.5.5.0-157.jar:/usr/hdp/2.5.5.0-157/hadoop/hadoop-common.jar:/usr/hdp/2.5.5.0-157/hadoop/hadoop-common-2.7.3.2.5.5.0-157-tests.jar:/usr/hdp/2.5.5.0-157/hadoop/hadoop-aws-2.7.3.2.5.5.0-157.jar:/usr/hdp/2.5.5.0-157/hadoop/hadoop-azure-datalake-2.7.3.2.5.5.0-157.jar:/usr/hdp/2.5.5.0-157/hadoop/hadoop-nfs-2.7.3.2.5.5.0-157.jar:/usr/hdp/2.5.5.0-157/hadoop/hadoop-annotations-2.7.3.2.5.5.0-157.jar:/usr/hdp/2.5.5.0-157/hadoop/hadoop-azure-datalake.jar:/usr/hdp/2.5.5.0-157/hadoop/hadoop-aws.jar:/usr/hdp/2.5.5.0-157/hadoop/hadoop-auth.jar:/usr/hdp/2.5.5.0-157/hadoop/hadoop-common-tests.jar:/usr/hdp/2.5.5.0-157/hadoop/hadoop-azure.jar:/usr/hdp/2.5.5.0-157/hadoop/hadoop-annotations.jar:/usr/hdp/2.5.5.0-157/hadoop/hadoop-azure-2.7.3.2.5.5.0-157.jar:/usr/hdp/2.5.5.0-157/hadoop/hadoop-nfs.jar:/usr/hdp/2.5.5.0-157/hadoop/hadoop-common-2.7.3.2.5.5.0-157.jar:/usr/hdp/2.5.5.0-157/hadoop/lib/commons-cli-1.2.jar:/usr/hdp/2.5.5.0-157/hadoop/lib/jackson-core-2.2.3.jar:/usr/hdp/2.5.5.0-157/hadoop/lib/jettison-1.1.jar:/usr/hdp/2.5.5.0-157/hadoop/lib/avro-1.7.4.jar:/usr/hdp/2.5.5.0-157/hadoop/lib/jets3t-0.9.0.jar:/usr/hdp/2.5.5.0-157/hadoop/lib/hamcrest-core-1.3.jar:/usr/hdp/2.5.5.0-157/hadoop/lib/jersey-json-1.9.jar:/usr/hdp/2.5.5.0-157/hadoop/lib/protobuf-java-2.5.0.jar:/usr/hdp/2.5.5.0-157/hadoop/lib/guava-11.0.2.jar:/usr/hdp/2.5.5.0-157/hadoop/lib/commons-beanutils-core-1.8.0.jar:/usr/hdp/2.5.5.0-157/hadoop/lib/jetty-6.1.26.hwx.jar:/usr/hdp/2.5.5.0-157/hadoop/lib/azure-storage-4.2.0.jar:/usr/hdp/2.5.5.0-157/hadoop/lib/jsr305-3.0.0.jar:/usr/hdp/2.5.5.0-157/hadoop/lib/commons-lang3-3.4.jar:/usr/hdp/2.5.5.0-157/hadoop/lib/jcip-annotations-1.0.jar:/usr/hdp/2.5.5.0-157/hadoop/lib/jaxb-impl-2.2.3-1.jar:/usr/hdp/2.5.5.0-157/hadoop/lib/commons-io-2.4.jar:/usr/hdp/2.5.5.0-157/hadoop/lib/commons-compress-1.4.1.jar:/usr/hdp/2.5.5.0-157/hadoop/lib/snappy-java-1.0.4.1.jar:/usr/hdp/2.5.5.0-157/hadoop/lib/azure-keyvault-core-0.8.0.jar:/usr/hdp/2.5.5.0-157
/hadoop/lib/apacheds-kerberos-codec-2.0.0-M15.jar:/usr/hdp/2.5.5.0-157/hadoop/lib/nimbus-jose-jwt-3.9.jar:/usr/hdp/2.5.5.0-157/hadoop/lib/aws-java-sdk-kms-1.10.6.jar:/usr/hdp/2.5.5.0-157/hadoop/lib/jsp-api-2.1.jar:/usr/hdp/2.5.5.0-157/hadoop/lib/commons-logging-1.1.3.jar:/usr/hdp/2.5.5.0-157/hadoop/lib/junit-4.11.jar:/usr/hdp/2.5.5.0-157/hadoop/lib/commons-configuration-1.6.jar:/usr/hdp/2.5.5.0-157/hadoop/lib/api-asn1-api-1.0.0-M20.jar:/usr/hdp/2.5.5.0-157/hadoop/lib/slf4j-log4j12-1.7.10.jar:/usr/hdp/2.5.5.0-157/hadoop/lib/ranger-plugin-classloader-0.6.0.2.5.5.0-157.jar:/usr/hdp/2.5.5.0-157/hadoop/lib/curator-client-2.7.1.jar:/usr/hdp/2.5.5.0-157/hadoop/lib/paranamer-2.3.jar:/usr/hdp/2.5.5.0-157/hadoop/lib/curator-framework-2.7.1.jar:/usr/hdp/2.5.5.0-157/hadoop/lib/ojdbc6.jar:/usr/hdp/2.5.5.0-157/hadoop/lib/ranger-hdfs-plugin-shim-0.6.0.2.5.5.0-157.jar:/usr/hdp/2.5.5.0-157/hadoop/lib/commons-lang-2.6.jar:/usr/hdp/2.5.5.0-157/hadoop/lib/jersey-core-1.9.jar:/usr/hdp/2.5.5.0-157/hadoop/lib/jetty-util-6.1.26.hwx.jar:/usr/hdp/2.5.5.0-157/hadoop/lib/httpcore-4.4.4.jar:/usr/hdp/2.5.5.0-157/hadoop/lib/commons-digester-1.8.jar:/usr/hdp/2.5.5.0-157/hadoop/lib/curator-recipes-2.7.1.jar:/usr/hdp/2.5.5.0-157/hadoop/lib/java-xmlbuilder-0.4.jar:/usr/hdp/2.5.5.0-157/hadoop/lib/commons-math3-3.1.1.jar:/usr/hdp/2.5.5.0-157/hadoop/lib/activation-1.1.jar:/usr/hdp/2.5.5.0-157/hadoop/lib/netty-3.6.2.Final.jar:/usr/hdp/2.5.5.0-157/hadoop/lib/xmlenc-0.52.jar:/usr/hdp/2.5.5.0-157/hadoop/lib/stax-api-1.0-2.jar:/usr/hdp/2.5.5.0-157/hadoop/lib/zookeeper-3.4.6.2.5.5.0-157.jar:/usr/hdp/2.5.5.0-157/hadoop/lib/asm-3.2.jar:/usr/hdp/2.5.5.0-157/hadoop/lib/jackson-databind-2.2.3.jar:/usr/hdp/2.5.5.0-157/hadoop/lib/api-util-1.0.0-M20.jar:/usr/hdp/2.5.5.0-157/hadoop/lib/jsch-0.1.54.jar:/usr/hdp/2.5.5.0-157/hadoop/lib/commons-collections-3.2.2.jar:/usr/hdp/2.5.5.0-157/hadoop/lib/jackson-xc-1.9.13.jar:/usr/hdp/2.5.5.0-157/hadoop/lib/servlet-api-2.5.jar:/usr/hdp/2.5.5.0-157/hadoop/lib/json-smart-1.1.1.jar
:/usr/hdp/2.5.5.0-157/hadoop/lib/commons-codec-1.4.jar:/usr/hdp/2.5.5.0-157/hadoop/lib/jackson-annotations-2.2.3.jar:/usr/hdp/2.5.5.0-157/hadoop/lib/log4j-1.2.17.jar:/usr/hdp/2.5.5.0-157/hadoop/lib/slf4j-api-1.7.10.jar:/usr/hdp/2.5.5.0-157/hadoop/lib/jersey-server-1.9.jar:/usr/hdp/2.5.5.0-157/hadoop/lib/httpclient-4.5.2.jar:/usr/hdp/2.5.5.0-157/hadoop/lib/aws-java-sdk-s3-1.10.6.jar:/usr/hdp/2.5.5.0-157/hadoop/lib/apacheds-i18n-2.0.0-M15.jar:/usr/hdp/2.5.5.0-157/hadoop/lib/commons-beanutils-1.7.0.jar:/usr/hdp/2.5.5.0-157/hadoop/lib/xz-1.0.jar:/usr/hdp/2.5.5.0-157/hadoop/lib/commons-net-3.1.jar:/usr/hdp/2.5.5.0-157/hadoop/lib/jackson-core-asl-1.9.13.jar:/usr/hdp/2.5.5.0-157/hadoop/lib/ranger-yarn-plugin-shim-0.6.0.2.5.5.0-157.jar:/usr/hdp/2.5.5.0-157/hadoop/lib/jaxb-api-2.2.2.jar:/usr/hdp/2.5.5.0-157/hadoop/lib/gson-2.2.4.jar:/usr/hdp/2.5.5.0-157/hadoop/lib/mockito-all-1.8.5.jar:/usr/hdp/2.5.5.0-157/hadoop/lib/aws-java-sdk-core-1.10.6.jar:/usr/hdp/2.5.5.0-157/hadoop/lib/htrace-core-3.1.0-incubating.jar:/usr/hdp/2.5.5.0-157/hadoop/lib/jackson-jaxrs-1.9.13.jar:/usr/hdp/2.5.5.0-157/hadoop/lib/jackson-mapper-asl-1.9.13.jar:/usr/hdp/2.5.5.0-157/hadoop/lib/joda-time-2.8.1.jar:/usr/hdp/2.5.5.0-157/hadoop-hdfs/hadoop-hdfs-nfs-2.7.3.2.5.5.0-157.jar:/usr/hdp/2.5.5.0-157/hadoop-hdfs/hadoop-hdfs-tests.jar:/usr/hdp/2.5.5.0-157/hadoop-hdfs/hadoop-hdfs.jar:/usr/hdp/2.5.5.0-157/hadoop-hdfs/hadoop-hdfs-2.7.3.2.5.5.0-157-tests.jar:/usr/hdp/2.5.5.0-157/hadoop-hdfs/hadoop-hdfs-2.7.3.2.5.5.0-157.jar:/usr/hdp/2.5.5.0-157/hadoop-hdfs/hadoop-hdfs-nfs.jar:/usr/hdp/2.5.5.0-157/hadoop-hdfs/lib/commons-cli-1.2.jar:/usr/hdp/2.5.5.0-157/hadoop-hdfs/lib/protobuf-java-2.5.0.jar:/usr/hdp/2.5.5.0-157/hadoop-hdfs/lib/guava-11.0.2.jar:/usr/hdp/2.5.5.0-157/hadoop-hdfs/lib/jetty-6.1.26.hwx.jar:/usr/hdp/2.5.5.0-157/hadoop-hdfs/lib/leveldbjni-all-1.8.jar:/usr/hdp/2.5.5.0-157/hadoop-hdfs/lib/jsr305-3.0.0.jar:/usr/hdp/2.5.5.0-157/hadoop-hdfs/lib/xercesImpl-2.9.1.jar:/usr/hdp/2.5.5.0-157/hadoop-hdfs/lib/okhttp
-2.4.0.jar:/usr/hdp/2.5.5.0-157/hadoop-hdfs/lib/commons-io-2.4.jar:/usr/hdp/2.5.5.0-157/hadoop-hdfs/lib/commons-logging-1.1.3.jar:/usr/hdp/2.5.5.0-157/hadoop-hdfs/lib/commons-lang-2.6.jar:/usr/hdp/2.5.5.0-157/hadoop-hdfs/lib/commons-daemon-1.0.13.jar:/usr/hdp/2.5.5.0-157/hadoop-hdfs/lib/jersey-core-1.9.jar:/usr/hdp/2.5.5.0-157/hadoop-hdfs/lib/jetty-util-6.1.26.hwx.jar:/usr/hdp/2.5.5.0-157/hadoop-hdfs/lib/netty-3.6.2.Final.jar:/usr/hdp/2.5.5.0-157/hadoop-hdfs/lib/xmlenc-0.52.jar:/usr/hdp/2.5.5.0-157/hadoop-hdfs/lib/asm-3.2.jar:/usr/hdp/2.5.5.0-157/hadoop-hdfs/lib/netty-all-4.0.23.Final.jar:/usr/hdp/2.5.5.0-157/hadoop-hdfs/lib/servlet-api-2.5.jar:/usr/hdp/2.5.5.0-157/hadoop-hdfs/lib/commons-codec-1.4.jar:/usr/hdp/2.5.5.0-157/hadoop-hdfs/lib/log4j-1.2.17.jar:/usr/hdp/2.5.5.0-157/hadoop-hdfs/lib/jersey-server-1.9.jar:/usr/hdp/2.5.5.0-157/hadoop-hdfs/lib/xml-apis-1.3.04.jar:/usr/hdp/2.5.5.0-157/hadoop-hdfs/lib/jackson-core-asl-1.9.13.jar:/usr/hdp/2.5.5.0-157/hadoop-hdfs/lib/htrace-core-3.1.0-incubating.jar:/usr/hdp/2.5.5.0-157/hadoop-hdfs/lib/jackson-mapper-asl-1.9.13.jar:/usr/hdp/2.5.5.0-157/hadoop-hdfs/lib/okio-1.4.0.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/hadoop-yarn-server-timeline-pluginstorage.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/hadoop-yarn-applications-distributedshell-2.7.3.2.5.5.0-157.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/hadoop-yarn-server-tests.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/hadoop-yarn-common.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/hadoop-yarn-server-common-2.7.3.2.5.5.0-157.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/hadoop-yarn-applications-unmanaged-am-launcher-2.7.3.2.5.5.0-157.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/hadoop-yarn-server-resourcemanager.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/hadoop-yarn-client-2.7.3.2.5.5.0-157.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/hadoop-yarn-server-applicationhistoryservice.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/hadoop-yarn-client.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/hadoop-yarn-server-sharedcachemanager-2.7.3.2.5.5.0-157.jar:/usr/hdp/2
.5.5.0-157/hadoop-yarn/hadoop-yarn-api.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/hadoop-yarn-server-applicationhistoryservice-2.7.3.2.5.5.0-157.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/hadoop-yarn-common-2.7.3.2.5.5.0-157.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/hadoop-yarn-server-timeline-pluginstorage-2.7.3.2.5.5.0-157.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/hadoop-yarn-server-web-proxy.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/hadoop-yarn-applications-unmanaged-am-launcher.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/hadoop-yarn-server-sharedcachemanager.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/hadoop-yarn-server-common.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/hadoop-yarn-registry.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/hadoop-yarn-server-tests-2.7.3.2.5.5.0-157.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/hadoop-yarn-registry-2.7.3.2.5.5.0-157.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/hadoop-yarn-applications-distributedshell.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/hadoop-yarn-server-nodemanager-2.7.3.2.5.5.0-157.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/hadoop-yarn-server-nodemanager.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/hadoop-yarn-api-2.7.3.2.5.5.0-157.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/hadoop-yarn-server-web-proxy-2.7.3.2.5.5.0-157.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/hadoop-yarn-server-resourcemanager-2.7.3.2.5.5.0-157.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/lib/commons-cli-1.2.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/lib/jackson-core-2.2.3.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/lib/jettison-1.1.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/lib/avro-1.7.4.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/lib/jets3t-0.9.0.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/lib/jersey-json-1.9.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/lib/protobuf-java-2.5.0.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/lib/guava-11.0.2.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/lib/jersey-client-1.9.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/lib/commons-beanutils-core-1.8.0.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/lib/jetty-6.1.26.hwx.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/lib/azure-storage-4.2.0.jar:/usr/hdp/
2.5.5.0-157/hadoop-yarn/lib/leveldbjni-all-1.8.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/lib/jsr305-3.0.0.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/lib/zookeeper-3.4.6.2.5.5.0-157-tests.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/lib/commons-lang3-3.4.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/lib/jcip-annotations-1.0.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/lib/jaxb-impl-2.2.3-1.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/lib/commons-io-2.4.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/lib/commons-compress-1.4.1.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/lib/snappy-java-1.0.4.1.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/lib/azure-keyvault-core-0.8.0.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/lib/apacheds-kerberos-codec-2.0.0-M15.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/lib/nimbus-jose-jwt-3.9.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/lib/metrics-core-3.0.1.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/lib/jsp-api-2.1.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/lib/commons-logging-1.1.3.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/lib/commons-configuration-1.6.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/lib/api-asn1-api-1.0.0-M20.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/lib/curator-client-2.7.1.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/lib/paranamer-2.3.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/lib/curator-framework-2.7.1.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/lib/aopalliance-1.0.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/lib/commons-lang-2.6.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/lib/jersey-core-1.9.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/lib/jetty-util-6.1.26.hwx.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/lib/httpcore-4.4.4.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/lib/commons-digester-1.8.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/lib/curator-recipes-2.7.1.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/lib/java-xmlbuilder-0.4.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/lib/commons-math3-3.1.1.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/lib/activation-1.1.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/lib/netty-3.6.2.Final.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/lib/objenesis-2.1.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/lib/xmlenc-0.52.jar:/usr/h
dp/2.5.5.0-157/hadoop-yarn/lib/stax-api-1.0-2.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/lib/zookeeper-3.4.6.2.5.5.0-157.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/lib/fst-2.24.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/lib/asm-3.2.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/lib/jackson-databind-2.2.3.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/lib/api-util-1.0.0-M20.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/lib/jsch-0.1.54.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/lib/commons-collections-3.2.2.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/lib/guice-servlet-3.0.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/lib/jackson-xc-1.9.13.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/lib/servlet-api-2.5.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/lib/json-smart-1.1.1.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/lib/commons-codec-1.4.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/lib/jackson-annotations-2.2.3.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/lib/log4j-1.2.17.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/lib/jersey-server-1.9.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/lib/httpclient-4.5.2.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/lib/guice-3.0.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/lib/apacheds-i18n-2.0.0-M15.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/lib/commons-beanutils-1.7.0.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/lib/javax.inject-1.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/lib/xz-1.0.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/lib/commons-net-3.1.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/lib/jackson-core-asl-1.9.13.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/lib/jersey-guice-1.9.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/lib/jaxb-api-2.2.2.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/lib/gson-2.2.4.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/lib/htrace-core-3.1.0-incubating.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/lib/javassist-3.18.1-GA.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/lib/jackson-jaxrs-1.9.13.jar:/usr/hdp/2.5.5.0-157/hadoop-yarn/lib/jackson-mapper-asl-1.9.13.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/commons-cli-1.2.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/hadoop-auth-2.7.3.2.5.5.0-157.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/jettiso
n-1.1.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/avro-1.7.4.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/jets3t-0.9.0.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/hadoop-mapreduce-client-core-2.7.3.2.5.5.0-157.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/hamcrest-core-1.3.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/hadoop-ant.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/jersey-json-1.9.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/hadoop-mapreduce-client-common.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/protobuf-java-2.5.0.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/hadoop-extras.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/hadoop-gridmix.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/guava-11.0.2.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/hadoop-mapreduce-client-hs.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/hadoop-openstack-2.7.3.2.5.5.0-157.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/commons-beanutils-core-1.8.0.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/jetty-6.1.26.hwx.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/jsr305-3.0.0.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/azure-data-lake-store-sdk-2.1.4.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/commons-lang3-3.4.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/hadoop-streaming-2.7.3.2.5.5.0-157.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/hadoop-sls.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/jcip-annotations-1.0.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-2.7.3.2.5.5.0-157-tests.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/okhttp-2.4.0.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/hadoop-mapreduce-client-app-2.7.3.2.5.5.0-157.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/hadoop-openstack.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/jaxb-impl-2.2.3-1.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/hadoop-mapreduce-client-hs-plugins.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/commons-io-2.4.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/commons-compress-1.4.1.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/hadoop-datajoin.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/snap
py-java-1.0.4.1.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/hadoop-rumen.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/azure-keyvault-core-0.8.0.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/apacheds-kerberos-codec-2.0.0-M15.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/nimbus-jose-jwt-3.9.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/metrics-core-3.0.1.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/jsp-api-2.1.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/hadoop-rumen-2.7.3.2.5.5.0-157.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/hadoop-mapreduce-client-app.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/commons-logging-1.1.3.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/junit-4.11.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/hadoop-mapreduce-client-jobclient.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/hadoop-mapreduce-client-shuffle-2.7.3.2.5.5.0-157.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/hadoop-mapreduce-client-hs-plugins-2.7.3.2.5.5.0-157.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/commons-configuration-1.6.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/hadoop-auth.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/api-asn1-api-1.0.0-M20.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/hadoop-streaming.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/hadoop-mapreduce-client-core.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/curator-client-2.7.1.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/paranamer-2.3.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/hadoop-mapreduce-client-common-2.7.3.2.5.5.0-157.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/curator-framework-2.7.1.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/hadoop-gridmix-2.7.3.2.5.5.0-157.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/hadoop-distcp-2.7.3.2.5.5.0-157.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/commons-lang-2.6.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/jersey-core-1.9.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/hadoop-extras-2.7.3.2.5.5.0-157.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/jetty-util-6.1.26.hwx.jar:/u
sr/hdp/2.5.5.0-157/hadoop-mapreduce/httpcore-4.4.4.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/commons-digester-1.8.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/curator-recipes-2.7.1.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/java-xmlbuilder-0.4.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/commons-math3-3.1.1.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/hadoop-sls-2.7.3.2.5.5.0-157.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/activation-1.1.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/netty-3.6.2.Final.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/xmlenc-0.52.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/stax-api-1.0-2.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/zookeeper-3.4.6.2.5.5.0-157.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/asm-3.2.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/hadoop-mapreduce-examples-2.7.3.2.5.5.0-157.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/api-util-1.0.0-M20.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/hadoop-archives-2.7.3.2.5.5.0-157.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/jsch-0.1.54.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/commons-collections-3.2.2.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/hadoop-datajoin-2.7.3.2.5.5.0-157.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/hadoop-mapreduce-client-shuffle.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/hadoop-distcp.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-2.7.3.2.5.5.0-157.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/hadoop-ant-2.7.3.2.5.5.0-157.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/jackson-xc-1.9.13.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/servlet-api-2.5.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/json-smart-1.1.1.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/commons-codec-1.4.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/hadoop-archives.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/log4j-1.2.17.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/jersey-server-1.9.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/httpclient-4.5.2.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/apacheds-i18n-2.0.0-M15.jar:/usr/hdp/2.5.5.0-157/had
oop-mapreduce/hadoop-mapreduce-client-hs-2.7.3.2.5.5.0-157.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/commons-beanutils-1.7.0.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/xz-1.0.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/commons-net-3.1.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/jackson-core-asl-1.9.13.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/jaxb-api-2.2.2.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/commons-httpclient-3.1.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/gson-2.2.4.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/mockito-all-1.8.5.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/htrace-core-3.1.0-incubating.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/jackson-jaxrs-1.9.13.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/jackson-mapper-asl-1.9.13.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/okio-1.4.0.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/hadoop-mapreduce-examples.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/lib/avro-1.7.4.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/lib/hamcrest-core-1.3.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/lib/protobuf-java-2.5.0.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/lib/leveldbjni-all-1.8.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/lib/commons-io-2.4.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/lib/commons-compress-1.4.1.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/lib/snappy-java-1.0.4.1.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/lib/junit-4.11.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/lib/paranamer-2.3.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/lib/aopalliance-1.0.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/lib/jersey-core-1.9.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/lib/netty-3.6.2.Final.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/lib/asm-3.2.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/lib/guice-servlet-3.0.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/lib/log4j-1.2.17.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/lib/jersey-server-1.9.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/lib/guice-3.0.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/lib/javax.inject-1.jar:/usr/hdp/2.5.5.0-157/hadoop-m
apreduce/lib/xz-1.0.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/lib/jackson-core-asl-1.9.13.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/lib/jersey-guice-1.9.jar:/usr/hdp/2.5.5.0-157/hadoop-mapreduce/lib/jackson-mapper-asl-1.9.13.jar:/usr/hdp/2.5.5.0-157/hadoop/hadoop/lib/native:/usr/hdp/2.5.5.0-157/hadoop/etc/hadoop'
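When debugging an error like the one above, it can help to confirm whether the package was loaded from an archive (egg or zip) or from a regular directory. The following is a minimal sketch of such a check, not something Pydoop provides; the helper name `archive_component` is hypothetical. It relies on the fact that a module imported from an archive has a `__file__` whose path passes through a regular file.

```python
import os


def archive_component(module_file):
    """Return the archive file that a module path passes through, or None.

    A module imported from an archive has a __file__ such as
    /path/pkg.egg/pkg/__init__.py, where /path/pkg.egg is a regular file.
    """
    path = os.path.abspath(module_file)
    while True:
        parent = os.path.dirname(path)
        if parent == path:  # reached the filesystem root
            return None
        if os.path.isfile(parent):
            return parent
        path = parent
```

For example, `archive_component(pydoop.__file__)` would return the egg path if pydoop was imported from an egg, and `None` if it was imported from an unpacked directory.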
Hi,
It looks like it's not at all straightforward to make a Python package that includes native extensions importable from an egg or other archive. However, I've just tried your pyspark sample code and it works for me if I pass the unzipped installation dir to addFile with recursive set to True. For instance:
cd /tmp
pip install pydoop -t .
And in the pyspark code:
from pyspark import SparkContext, SparkConf

SparkContext.setSystemProperty('spark.executor.memory', '4g')
conf = SparkConf().setAppName("pydoop test")
sc = SparkContext(conf=conf)
# Ship the unpacked package directory to the workers
sc.addFile("/tmp/pydoop", recursive=True)
rdd = sc.parallelize([
    (12, 34, 56, 67),
    (34, 56, 87, 354),
    (345, 74, 33, 77),
    (453, 56, 73, 56)
], 2)

def func(rec):
    import sys
    from pyspark import SparkFiles
    # addFile does not extend sys.path, so do it here
    sys.path.insert(0, SparkFiles.get("pydoop"))
    from pydoop import hdfs
    hdfs.dump("hello", "/user/root/temp_{}.txt".format(rec[0]))

rdd.map(func).take(10)
Note that you need to manually alter sys.path, since addFile does not take care of that. I'm using pyspark 2.2.1.
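Because `func` runs once per record, the `sys.path.insert` call in the example adds the same entry again for each record handled by a worker process. That is harmless for small jobs, but a guard keeps `sys.path` from growing. A minimal sketch follows; `ensure_on_sys_path` is a hypothetical helper name, not part of PySpark or Pydoop.

```python
import sys


def ensure_on_sys_path(directory):
    """Put directory at the front of sys.path unless it is already listed.

    Returns True if the entry was added, False if it was already present.
    """
    if directory in sys.path:
        return False
    sys.path.insert(0, directory)
    return True
```

Inside `func`, the call `sys.path.insert(0, SparkFiles.get("pydoop"))` could then be replaced with `ensure_on_sys_path(SparkFiles.get("pydoop"))`, provided the helper is defined where the workers can see it.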
Thanks Simone. The sc.addFile option with recursive works; I have tested this code in my local VM. Unfortunately, the cluster where I need this runs Spark 1.6.3, whose addFile method does not have the recursive parameter.
Regards, Srinivas
Hi,
I believe you can still make it work with the older Spark version. Build pydoop.zip as in the original post and add it with sc.addFile("/your/path/to/pydoop.zip"). Then you can unpack it on the fly in the worker's code with something like this:
def func(rec):
    import sys
    import zipfile
    import tempfile
    from pyspark import SparkFiles
    # Locate the shipped archive on the worker
    zip_fn = SparkFiles.get("pydoop.zip")
    # Unpack it into a fresh temporary directory and make it importable
    d = tempfile.mkdtemp()
    with zipfile.ZipFile(zip_fn, 'r') as zipf:
        zipf.extractall(d)
    sys.path.insert(0, d)
    from pydoop import hdfs
    hdfs.dump("hello", "/user/root/temp_{}.txt".format(rec[0]))
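The snippet above creates a new temporary directory and extracts the archive again for every record. One possible refinement is to remember the extraction directory so that the work is done once per archive. The following is a minimal sketch of that idea using only the standard library; `extract_once` and `_EXTRACTED` are hypothetical names, and how long the cache survives on a worker depends on how Spark serializes and reuses the function, which is not verified here.

```python
import os
import tempfile
import zipfile

# Maps an absolute archive path to the directory it was extracted into.
_EXTRACTED = {}


def extract_once(zip_fn):
    """Unpack zip_fn into a temp dir on first use; reuse that dir afterwards."""
    zip_fn = os.path.abspath(zip_fn)
    if zip_fn not in _EXTRACTED:
        target = tempfile.mkdtemp()
        with zipfile.ZipFile(zip_fn, "r") as zipf:
            zipf.extractall(target)
        _EXTRACTED[zip_fn] = target
    return _EXTRACTED[zip_fn]
```

In `func`, the `mkdtemp` and `extractall` steps would then collapse to `d = extract_once(SparkFiles.get("pydoop.zip"))`, followed by the same `sys.path` update as before.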
Hi,
Thanks for this solution, it works perfectly. The package on PyPI seems to be a very old version (py3compat and other recent fixes are not there). Is there any plan to upload the latest package to PyPI?
I tried cloning pydoop from git and building the package with python setup.py bdist --format=zip, which created pydoop-2.0a0.linux-x86_64.zip. After unzipping it, I see that the pydoop folder is under the path opt/anaconda2/lib/python2.7/site-packages/. To make this available within map, I have to manually zip the pydoop folder from that path. Is there an alternative to python setup.py for creating a pydoop.zip that is importable after unzipping?
Regards, Srinivas
I'm going to make an alpha release pretty soon. You should be able to avoid the zip-unzip round trip by simply zipping the contents of build/lib/pydoop after running python setup.py bdist.
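The zipping step can also be scripted. The following is a minimal sketch using `shutil.make_archive` from the standard library; `zip_package` is a hypothetical helper name. It assumes the archive should contain the package directory as its top-level entry, so that it is importable once unpacked and added to `sys.path`. The actual directory name under build/ may differ on your system.

```python
import shutil


def zip_package(build_lib_dir, package_name, archive_base):
    """Zip build_lib_dir/package_name with package_name/ as the top-level entry.

    archive_base is the output path without the ".zip" extension.
    Returns the path of the archive that was written.
    """
    return shutil.make_archive(
        archive_base, "zip", root_dir=build_lib_dir, base_dir=package_name
    )
```

For instance, `zip_package("build/lib", "pydoop", "/tmp/pydoop")` would write /tmp/pydoop.zip, assuming build/lib/pydoop exists after the build.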
I have a requirement to write to HDFS inside map, so I am shipping the pydoop.zip dependency module to all worker nodes using sc.addPyFile, but when I try importing pydoop.hdfs I get the error below.
Steps followed to create pydoop.zip
Sample PySpark code in which I am trying to use pydoop inside map.
Please help me to resolve this.