crs4 / pydoop

A Python MapReduce and HDFS API for Hadoop
Apache License 2.0

ERROR: walk (test_local_fs.TestLocalFS) #368

Open · pihglez opened this issue 4 years ago

pihglez commented 4 years ago

I ran a test with the docker-hadoop big-data-europe image (local, single node) on a test file (Little Women) in pure Java and everything went OK. I tried the same after installing pydoop and the machine seems to go into an infinite loop: the job gets submitted, but then almost nothing happens.
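For reference, wordcount.py is essentially the standard word count written in the pydoop script style. This is only a sketch of what I'm running, assuming the usual (key, value, writer) signatures that pydoop script expects, with writer.emit(key, value) producing a pair; the combiner function is the one named by -c combiner in the command below:

# wordcount.py -- minimal sketch, pydoop script style (assumed, the
# actual script is not reproduced here)
def mapper(_, value, writer):
    # emit (word, 1) for every whitespace-separated token in the line
    for word in value.split():
        writer.emit(word, 1)

def combiner(word, icounts, writer):
    # local pre-aggregation; same logic as the reducer
    writer.emit(word, sum(map(int, icounts)))

def reducer(word, icounts, writer):
    # values arrive as strings, hence the int() conversion
    writer.emit(word, sum(map(int, icounts)))

The system hangs here: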

$ pydoop script -c combiner wordcount.py input ejercicio_mapreduce_pyout2
2020-04-06 12:13:05,487 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2020-04-06 12:13:06,834 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2020-04-06 12:13:07,224 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
/opt/hadoop-3.2.1/libexec/hadoop-functions.sh: line 2401: HADOOP_IT.CRS4.PYDOOP.MAPREDUCE.PIPES.SUBMITTER_USER: bad substitution
/opt/hadoop-3.2.1/libexec/hadoop-functions.sh: line 2366: HADOOP_IT.CRS4.PYDOOP.MAPREDUCE.PIPES.SUBMITTER_USER: bad substitution
/opt/hadoop-3.2.1/libexec/hadoop-functions.sh: line 2461: HADOOP_IT.CRS4.PYDOOP.MAPREDUCE.PIPES.SUBMITTER_OPTS: bad substitution
2020-04-06 12:13:08,698 INFO client.RMProxy: Connecting to ResourceManager at resourcemanager/172.18.0.2:8032
2020-04-06 12:13:08,883 INFO client.AHSProxy: Connecting to Application History server at historyserver/172.18.0.6:10200
2020-04-06 12:13:09,091 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/root/.staging/job_1586170683245_0001
2020-04-06 12:13:09,192 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2020-04-06 12:13:09,274 WARN mapreduce.JobResourceUploader: No job jar file set.  User classes may not be found. See Job or Job#setJar(String).
2020-04-06 12:13:09,333 INFO input.FileInputFormat: Total input files to process : 1
2020-04-06 12:13:09,371 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2020-04-06 12:13:09,401 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2020-04-06 12:13:09,417 INFO mapreduce.JobSubmitter: number of splits:1
2020-04-06 12:13:09,557 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2020-04-06 12:13:09,585 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1586170683245_0001
2020-04-06 12:13:09,585 INFO mapreduce.JobSubmitter: Executing with tokens: []
2020-04-06 12:13:09,717 INFO mapred.YARNRunner: Job jar is not present. Not adding any jar to the list of resources.
2020-04-06 12:13:09,774 INFO conf.Configuration: resource-types.xml not found
2020-04-06 12:13:09,775 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2020-04-06 12:13:10,145 INFO impl.YarnClientImpl: Submitted application application_1586170683245_0001
2020-04-06 12:13:10,181 INFO mapreduce.Job: The url to track the job: http://resourcemanager:8088/proxy/application_1586170683245_0001/
2020-04-06 12:13:10,181 INFO mapreduce.Job: Running job: job_1586170683245_0001
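As a side note, the "bad substitution" messages appear to come from Bash itself: hadoop-functions.sh builds per-command environment variable names from the submitter class (it.crs4.pydoop.mapreduce.pipes.Submitter), and the resulting names contain dots, which are not legal in shell variable names. The same failure can be reproduced directly in a shell (my own reproduction, not part of the job output above):

$ echo ${HADOOP_IT.CRS4.PYDOOP.MAPREDUCE.PIPES.SUBMITTER_OPTS}
bash: ${HADOOP_IT.CRS4.PYDOOP.MAPREDUCE.PIPES.SUBMITTER_OPTS}: bad substitution

I don't know whether these warnings are related to the hang, but they looked suspicious.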

I decided to run the pydoop test suite; here is the final part of the output:

======================================================================
ERROR: walk (test_local_fs.TestLocalFS)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "hdfs/common_hdfs_tests.py", line 529, in walk
    list(self.fs.walk(b_top))
  File "/usr/local/lib/python2.7/dist-packages/pydoop/hdfs/fs.py", line 633, in walk
    if top['kind'] == 'directory':
TypeError: string indices must be integers, not str

======================================================================
ERROR: walk (test_hdfs_fs.TestHDFS)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "hdfs/common_hdfs_tests.py", line 529, in walk
    list(self.fs.walk(b_top))
  File "/usr/local/lib/python2.7/dist-packages/pydoop/hdfs/fs.py", line 633, in walk
    if top['kind'] == 'directory':
TypeError: string indices must be integers, not str

----------------------------------------------------------------------
Ran 158 tests in 56.665s

FAILED (errors=2)
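For what it's worth, that TypeError message ("string indices must be integers, not str") is Python 2's wording for indexing a str with a str key; note the tests ran under Python 2.7, per the dist-packages path in the traceback. So by fs.py line 633, top is apparently a plain string rather than the path-info dict that walk expects. A minimal reproduction of the failure mode (my own snippet, not pydoop code):

$ python2.7
>>> top = '/user/root/somefile'  # plain str instead of a path-info dict
>>> top['kind']                  # what fs.py line 633 effectively does
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: string indices must be integers, not str

The test passes b_top, presumably a bytes path, and under Python 2 bytes is just str, so this could be a Python 2/3 compatibility issue.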

Any suggestions?