tylere opened this issue 13 years ago
Can you post the stderr of the task? You can find it by going to http://localhost:50030/jobdetails.jsp?jobid=job_201111181442_0013, clicking on failed tasks, going to a specific task, then stderr. I suspect it just had a problem launching Python: when the cluster tries to run python, it probably isn't using your virtualenv. This should go away in the pseudo-distributed case if you install hadoopy outside the virtualenv, or if you use launch_frozen (as you did). The benefit of launch_frozen is that it doesn't use any Python on the cluster at all (in your case, the base Python packages, which don't appear to include hadoopy); instead it brings everything along in a single package.
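For reference, the difference between the two launch styles can be sketched like this (the HDFS paths and script name are made-up examples; `launch`/`launch_frozen` per the hadoopy API, check your version):

```python
def run_wordcount(use_frozen=True):
    """Sketch of the two hadoopy launch styles (paths are hypothetical)."""
    import hadoopy  # hadoopy must be importable on the submitting machine
    hdfs_in, hdfs_out = 'playground/wc-input', 'playground/wc-output'
    if use_frozen:
        # launch_frozen freezes wc.py and its imports into a single
        # self-contained package, so the cluster needs no Python packages.
        hadoopy.launch_frozen(hdfs_in, hdfs_out, 'wc.py')
    else:
        # launch runs wc.py with the cluster nodes' own python, which
        # must already have hadoopy (and any other imports) installed.
        hadoopy.launch(hdfs_in, hdfs_out, 'wc.py')
```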
What you described makes sense... the stderr log indicates that hadoopy cannot be found. However, I prefer not to install Python packages system-wide. Is there a way to make hadoopy.launch() use the Python environment it was invoked from?
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:572)
at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:137)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:479)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:417)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
at org.apache.hadoop.mapred.Child.main(Child.java:264)
Traceback (most recent call last):
File "wc.py", line 22, in
I'll look into it later, possibly by adding an optional parameter that switches into a virtualenv at startup. If you're interested in looking into it yourself, check out the code for hadoopy.launch; it should be fairly straightforward.
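As a stopgap until such an option exists, a launcher wrapper could grab the invoking interpreter itself. This is a hypothetical hook, not an existing hadoopy.launch() parameter:

```python
import sys

def venv_python_cmd():
    """Inside a virtualenv, sys.executable points at the venv's own
    python binary; a launcher could ship this path along to the
    streaming job instead of relying on a bare 'python'. Hypothetical
    hook, not a current hadoopy.launch() option."""
    return sys.executable
```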
hadoopy test notes
Environment
Start with a 'clean' system::
Start the services::
Check on the services::
Copy some data to HDFS::
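For a hadoopy job, input is usually written to HDFS as TypedBytes. A rough sketch, assuming `hadoopy.writetb` (check your version); path and records are made-up:

```python
def copy_sample_input(hdfs_path='playground/wc-input'):
    """Sketch: write a few (key, value) records to HDFS as TypedBytes,
    the format hadoopy jobs read natively. Path and records are
    examples only."""
    import hadoopy  # requires a reachable Hadoop installation
    hadoopy.writetb(hdfs_path, enumerate(['hello world', 'hello hadoopy']))
```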
Try running an example job::
Make a Python virtual environment::
Test out hadoopy
Setup a working directory::
Download the wc.py example::
Command line test (without args)::
Command line test (map/sort/reduce)::
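The map/sort/reduce pipeline can also be simulated locally in pure Python; the mapper and reducer below are a typical hadoopy-style word count and may differ from the actual wc.py:

```python
def mapper(key, value):
    # Emit (word, 1) for every word in the input line.
    for word in value.split():
        yield word, 1

def reducer(key, values):
    # Sum the counts for one word.
    yield key, sum(values)

def simulate(lines):
    """Run mapper -> sort -> reducer in-process, mimicking
    `python wc.py map < input | sort | python wc.py reduce`."""
    from itertools import groupby
    from operator import itemgetter
    pairs = sorted(kv for i, line in enumerate(lines)
                   for kv in mapper(i, line))
    return [out for word, group in groupby(pairs, itemgetter(0))
            for out in reducer(word, (v for _, v in group))]
```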
Download some test files::
Copy local files to HDFS::
List them out::
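The two steps above can be done from Python as well, assuming hadoopy's `put`/`ls` HDFS helpers (check your version); paths here are examples:

```python
def put_and_list(local_paths, hdfs_dir='playground/texts/'):
    """Sketch: copy local files to HDFS, then list the directory back.
    Assumes hadoopy.put / hadoopy.ls; paths are made-up examples."""
    import hadoopy  # requires a reachable Hadoop installation
    for path in local_paths:
        hadoopy.put(path, hdfs_dir)
    return hadoopy.ls(hdfs_dir)
```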
Output using hadoopy.launch()
By contrast, using hadoopy.launch_frozen() works.
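A minimal sketch of the working path, assuming `launch_frozen` and `readtb` from the hadoopy API (paths are made-up examples):

```python
def run_frozen(hdfs_in='playground/wc-input',
               hdfs_out='playground/wc-output'):
    """Sketch: launch_frozen ships a frozen wc.py to the cluster (no
    Python packages needed on the nodes), then readtb pulls the
    (word, count) pairs back. Paths are examples only."""
    import hadoopy  # requires a reachable Hadoop installation
    hadoopy.launch_frozen(hdfs_in, hdfs_out, 'wc.py')
    return dict(hadoopy.readtb(hdfs_out))
```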