crs4 / pydoop

A Python MapReduce and HDFS API for Hadoop
Apache License 2.0

Local mode install #337

Closed simleo closed 5 years ago

simleo commented 5 years ago

Fixes #329.

This is a partial revert of #194 that removes the no-local-mode constraint at build time only.

mapred pipes does not support local mode. Indeed, the NullPointerException mentioned in #181 is still unhandled in recent Hadoop versions. However, checking for local mode at build time is neither necessary (we only need YARN to be configured at run time) nor useful (the configuration might change to local mode after the build).
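
As a rough illustration of what a run-time check could look like (a minimal sketch, not what this PR implements): the snippet below reads `mapreduce.framework.name` from `mapred-site.xml` and refuses to proceed if it resolves to `local`. The function names `get_framework_name` and `check_not_local` are hypothetical, and the sketch assumes that the conf dir is given by `HADOOP_CONF_DIR` and that only `mapred-site.xml` needs to be inspected, whereas a real Hadoop configuration can also be overridden from other resources or on the command line.

```python
import os
import xml.etree.ElementTree as ET

# Hadoop's default value for mapreduce.framework.name
DEFAULT_FRAMEWORK = "local"


def get_framework_name(conf_dir):
    """Return mapreduce.framework.name as set in conf_dir/mapred-site.xml.

    Falls back to the default if the directory, the file, or the
    property is missing. Only this one file is inspected (assumption).
    """
    if not conf_dir:
        return DEFAULT_FRAMEWORK
    path = os.path.join(conf_dir, "mapred-site.xml")
    if not os.path.isfile(path):
        return DEFAULT_FRAMEWORK
    root = ET.parse(path).getroot()
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == "mapreduce.framework.name":
            return prop.findtext("value", DEFAULT_FRAMEWORK).strip()
    return DEFAULT_FRAMEWORK


def check_not_local(conf_dir=None):
    """Hypothetical run-time guard: fail early instead of hitting the NPE."""
    if conf_dir is None:
        conf_dir = os.environ.get("HADOOP_CONF_DIR", "")
    name = get_framework_name(conf_dir)
    if name == "local":
        raise RuntimeError(
            "Hadoop is configured for local mode, "
            "which mapred pipes does not support"
        )
    return name
```

Calling `check_not_local()` right before job submission would catch a configuration that switched to local mode after the build, which a build-time check cannot do.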

This PR includes a new Docker setup to quickly check what happens in local mode:

docker run --rm -it crs4/pydoop-client bash -l
cd int_test/mapred_submitter
mapred pipes -program ${PWD}/mr/map_reduce_java_rw.py -input ${PWD}/input/map_reduce -output /tmp/junk
...
2019-01-21 11:06:24,141 WARN mapred.LocalJobRunner: job_local939509594_0001
java.lang.Exception: java.lang.NullPointerException
    at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:492)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:552)
Caused by: java.lang.NullPointerException
    at org.apache.hadoop.mapred.pipes.Application.<init>(Application.java:105)
...

Note that the exception is thrown within the pipes package. This means that, while we don't control mapred pipes, we might be able to do something about it in our own submitter, so we still don't have a final answer to #330.

A nice side effect of this change is that it is now much easier to create a Pydoop-enabled Docker image, since a working Hadoop conf dir can be provided at run time (e.g., as a volume) rather than at build time.