damballa / parkour

Hadoop MapReduce in idiomatic Clojure.
Apache License 2.0

hadoop.tmp.dir override in config/local-mr! #14

Closed mjwillson closed 9 years ago

mjwillson commented 9 years ago

I was wondering why config properties like hadoop.tmp.dir are overridden in config/local-mr! here?

https://github.com/damballa/parkour/blob/master/src/clojure/parkour/conf.clj#L188

From what I can see, these are already the default values (at least in Hadoop 2.4.0), but hard-coding them here means that I can't override hadoop.tmp.dir in my local site config.

llasram commented 9 years ago

The local-mr! function is for modifying a configuration to use the local in-process MapReduce implementation for REPL-testing "mixed-mode" jobs. The goal is to take a configuration which describes how to use a remote cluster (HDFS and MapReduce framework/jobtracker) then replace just enough of the MR portion to successfully run jobs which (a) can access HDFS, but (b) run locally in-process. Does that clarify things, or am I missing cases where there needs to be additional non-default configuration to successfully run jobs within a REPL process?
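The mixed-mode setup described above might look roughly like the following at a REPL. This is a minimal sketch, not taken from the Parkour documentation: it assumes that `conf/configuration` returns a mutable Hadoop `Configuration` and that a property set after `local-mr!` takes precedence over the value `local-mr!` hard-codes. The path `/data/tmp/hadoop` is hypothetical.

```clojure
(require '[parkour.conf :as conf])

;; Start from whatever configuration is on the classpath (in mixed mode this
;; would describe the remote cluster), then switch the MapReduce portion to
;; the local in-process runner. `doto` is used so the sketch does not depend
;; on what local-mr! returns.
(def mixed-conf
  (doto (conf/configuration)
    (conf/local-mr!)))

;; Assumption: a value set after local-mr! replaces the hard-coded one.
;; "/data/tmp/hadoop" is a hypothetical path used only for illustration.
(.set mixed-conf "hadoop.tmp.dir" "/data/tmp/hadoop")

;; Read the property back through the plain Hadoop Configuration API.
(.get mixed-conf "hadoop.tmp.dir")
```

Re-applying the property after the call is one possible workaround for the concern raised in this issue; whether it is the intended usage is not confirmed in this thread.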

mjwillson commented 9 years ago

Ah, OK. I think I got the wrong end of the stick: I assumed that local-mr! was required to run a job in local mode in general, but it's just for this special mixed mode. I guess local mode is the default if I just use (parkour.conf/configuration).

Cheers for the clarification anyway. Perhaps it's another one of those little things which people who know about Hadoop are assumed to know, but which aren't otherwise obvious :)
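For the plain local-mode case mentioned in the last comment, a quick way to check what a default configuration resolves to is to read the relevant properties back. This is a hedged sketch that assumes `conf/configuration` returns a Hadoop `Configuration`, so that the standard `.get` methods apply. In stock Hadoop 2.x, `mapreduce.framework.name` defaults to `local`; the actual values depend on which site files are on the classpath.

```clojure
(require '[parkour.conf :as conf])

;; A plain configuration: Hadoop defaults plus any site files on the classpath.
;; local-mr! is not called, so nothing is hard-coded over the site config.
(def plain-conf (conf/configuration))

;; Which MapReduce runner is selected; the second argument is only a fallback
;; used when the property is not set at all.
(.get plain-conf "mapreduce.framework.name" "local")

;; The temp directory as resolved from defaults and site config.
(.get plain-conf "hadoop.tmp.dir")
```

Because `local-mr!` is not involved here, a `hadoop.tmp.dir` override in a local site config should be picked up as usual.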