crs4 / pydoop

A Python MapReduce and HDFS API for Hadoop
Apache License 2.0

Pydoop submit script fails #366

Closed orwa-te closed 4 years ago

orwa-te commented 4 years ago

I tried to run the word count example linked here https://crs4.github.io/pydoop/tutorial/pydoop_script.html using pydoop script script.py hdfs_input hdfs_output and it worked fine: I could see the results in HDFS. However, when I try to run the full-featured version of the program with "pydoop submit", linked here https://crs4.github.io/pydoop/tutorial/mapred_api.html#api-tutorial, using pydoop submit --upload-file-to-cache wc.py wc input output, it runs for a very long time without returning any response or result. The MapReduce job looks stuck, and I always get something like this in the terminal:

2020-02-08 18:21:05,580 INFO mapreduce.Job: Job job_1581178676163_0001 running in uber mode : false
2020-02-08 18:21:05,583 INFO mapreduce.Job:  map 0% reduce 0%
2020-02-08 18:31:34,480 INFO mapreduce.Job: Task Id : attempt_1581178676163_0001_m_000000_0, Status : FAILED
AttemptID:attempt_1581178676163_0001_m_000000_0 Timed out after 600 secs
^C[hdadmin@datanode3 pydoop]$

The MapReduce job fails when using "pydoop submit"! What could cause the problem, and how can I solve it?
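For reference, the full-featured version I am submitting is essentially the word count program from the MapReduce API tutorial. A minimal sketch of that kind of wc.py, following the tutorial's pydoop.mapreduce API (this is a sketch, not my exact file):

import pydoop.mapreduce.api as api
import pydoop.mapreduce.pipes as pipes


class Mapper(api.Mapper):

    def map(self, context):
        # one input record per line; emit (word, 1) for each word
        for word in context.value.split():
            context.emit(word, 1)


class Reducer(api.Reducer):

    def reduce(self, context):
        # sum the counts collected for each word
        context.emit(context.key, sum(context.values))


def __main__():
    # entry point invoked by the pipes runner when the job is submitted
    pipes.run_task(pipes.Factory(Mapper, reducer_class=Reducer))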

simleo commented 4 years ago

To see what went wrong you have to check the individual task logs. You can access them via the Hadoop web UI.
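If the web UI is awkward to reach, and assuming YARN log aggregation is enabled on the cluster, the task logs can usually also be pulled from the command line with something like:

yarn logs -applicationId application_1581178676163_0001

using the application ID corresponding to the failed job.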

orwa-te commented 4 years ago

After trying multiple times, the console gives me these messages:

2020-02-10 23:22:03,628 INFO mapreduce.Job:  map 0% reduce 0%
2020-02-10 23:32:34,268 INFO mapreduce.Job: Task Id : attempt_1581369620079_0001_m_000000_0, Status : FAILED
AttemptID:attempt_1581369620079_0001_m_000000_0 Timed out after 600 secs
[2020-02-10 23:32:33.784]Sent signal OUTPUT_THREAD_DUMP (SIGQUIT) to pid 24623 as user hdadmin for container container_1581369620079_0001_01_000002, result=success
[2020-02-10 23:32:33.792]Container killed by the ApplicationMaster.
[2020-02-10 23:32:33.811]Container killed on request. Exit code is 143
[2020-02-10 23:32:33.812]Container exited with a non-zero exit code 143.

I opened "sys logs" from web UI and could not find any error or even warning messages, but "stderr" data is like this:

Feb 10, 2020 11:22:01 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
INFO: Registering org.apache.hadoop.mapreduce.v2.app.webapp.JAXBContextResolver as a provider class
Feb 10, 2020 11:22:01 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
INFO: Registering org.apache.hadoop.yarn.webapp.GenericExceptionHandler as a provider class
Feb 10, 2020 11:22:01 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
INFO: Registering org.apache.hadoop.mapreduce.v2.app.webapp.AMWebServices as a root resource class
Feb 10, 2020 11:22:01 PM com.sun.jersey.server.impl.application.WebApplicationImpl _initiate
INFO: Initiating Jersey application, version 'Jersey: 1.19 02/11/2015 03:25 AM'
Feb 10, 2020 11:22:01 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider
INFO: Binding org.apache.hadoop.mapreduce.v2.app.webapp.JAXBContextResolver to GuiceManagedComponentProvider with the scope "Singleton"
Feb 10, 2020 11:22:01 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider
INFO: Binding org.apache.hadoop.yarn.webapp.GenericExceptionHandler to GuiceManagedComponentProvider with the scope "Singleton"
Feb 10, 2020 11:22:02 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider
INFO: Binding org.apache.hadoop.mapreduce.v2.app.webapp.AMWebServices to GuiceManagedComponentProvider with the scope "PerRequest"

I searched for the message "Container exited with a non-zero exit code 143" and found that it may be related to the garbage collector or other memory allocation issues. If that is the case, how does the default pydoop script version work with no problems?

simleo commented 4 years ago

I see. Try tweaking the memory settings and good luck :)

orwa-te commented 4 years ago

I am running Hadoop on a single machine, in a VM with 10 GB RAM and 2 processing cores, running CentOS 7. What is wrong with the following configuration settings? Here are the properties with their values (memory in MB; a sketch of the corresponding XML follows the list):

yarn-site.xml

yarn.scheduler.minimum-allocation-mb -> 512
yarn.scheduler.minimum-allocation-vcores -> 1
yarn.scheduler.maximum-allocation-vcores -> 2
yarn.nodemanager.resource.memory-mb -> 8192
yarn.nodemanager.resource.cpu-vcores -> 2

mapred-site.xml

mapreduce.map.memory.mb -> 3072
mapreduce.reduce.memory.mb -> 3072
mapreduce.map.java.opts -> -Xmx2048m
mapreduce.reduce.java.opts -> -Xmx2048m
yarn.nodemanager.vmem-pmem-ratio -> 2.1
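For reference, a sketch of how a couple of the mapred-site.xml entries above would be written using the standard Hadoop property syntax (values as listed; the remaining properties follow the same pattern, and the heap options include the leading -Xmx flag):

<configuration>
  <!-- container size requested for each map task, in MB -->
  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>3072</value>
  </property>
  <!-- JVM heap for the map task, kept below the container size -->
  <property>
    <name>mapreduce.map.java.opts</name>
    <value>-Xmx2048m</value>
  </property>
</configuration>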

simleo commented 4 years ago

That depends on many factors, including the Hadoop version you're running. You can try asking on the Hadoop mailing lists. In the Docker images we use for testing, the configuration is rather minimal. If you want, you can check it out here.