crs4 / pydoop

A Python MapReduce and HDFS API for Hadoop
Apache License 2.0

Docker target #296

Closed by elzaggo 6 years ago

elzaggo commented 6 years ago

Added support for running development Docker images on machines that require explicit Hadoop configuration.

It appears that sometimes (well, at least on my laptop) the JVM used in crs4/ansible-hadoop is not able to gather information on the container capabilities and thus misconfigures Hadoop components, typically the NodeManager. This PR allows Hadoop configuration properties to be set explicitly when launching the crs4/pydoop Docker image.

In principle, most of this could be pushed to crs4/ansible-hadoop with the new configuration parameters passed via environment variables.
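To make the environment-variable idea concrete, here is a minimal sketch of how a container entrypoint could turn one variable into a Hadoop-style configuration document. The variable name `HADOOP_EXTRA_PROPS`, its `name=value,name=value` format, and the helper names `parse_props` and `to_hadoop_xml` are all assumptions made for illustration; none of them come from this PR or from crs4/ansible-hadoop.

```python
import os
import xml.etree.ElementTree as ET


def parse_props(spec):
    """Parse 'name=value,name=value' into a dict, skipping empty items."""
    props = {}
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        name, sep, value = item.partition("=")
        if not sep or not name.strip():
            raise ValueError("expected name=value, got %r" % item)
        props[name.strip()] = value.strip()
    return props


def to_hadoop_xml(props):
    """Render properties as a Hadoop-style <configuration> document."""
    root = ET.Element("configuration")
    for name, value in sorted(props.items()):
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # HADOOP_EXTRA_PROPS is a hypothetical variable name (an assumption).
    spec = os.environ.get(
        "HADOOP_EXTRA_PROPS",
        "yarn.nodemanager.resource.memory-mb=4096",
    )
    print(to_hadoop_xml(parse_props(spec)))
```

Under these assumptions, the image could be started with something like `-e HADOOP_EXTRA_PROPS=...` and the entrypoint would write the resulting XML into the relevant site file before starting the daemons. How the output is merged with existing configuration is left open here.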

simleo commented 6 years ago

Proposed changes in elzaggo#1.

One problem is that the Makefile target keeps going even when a step fails (I got an import error because I didn't have lxml installed). In the future we might move this to a bash script that uses set -e, especially since the Makefile is largely unused nowadays.
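As an illustration of the fail-fast behaviour described above, here is a rough Python analogue of what `set -e` gives a bash script: each step runs in order and the first non-zero exit status aborts the rest. This is only a sketch; the `run_steps` helper and the placeholder commands are hypothetical and are not the actual build steps of this repository.

```python
import subprocess
import sys


def run_steps(steps):
    """Run each command in order; stop at the first failure."""
    for cmd in steps:
        # check=True raises CalledProcessError on a non-zero exit status,
        # so later steps never run after a failure.
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder steps (assumptions), standing in for real build commands.
    steps = [
        [sys.executable, "-c", "print('step 1 ok')"],
        [sys.executable, "-c", "import sys; sys.exit(3)"],
        [sys.executable, "-c", "print('never reached')"],
    ]
    try:
        run_steps(steps)
    except subprocess.CalledProcessError as exc:
        print("stopped: exit status", exc.returncode)
```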