Netflix / PigPen

Map-Reduce for Clojure
Apache License 2.0

Update to hadoop 2.6.0 and parquet 1.7.0 #163

Closed by mbossenbroek 8 years ago

mbossenbroek commented 9 years ago

cc @fs111 @pkozikow

I attempted to update the Hadoop libraries to v2, but the Cascading integration tests now fail with the exception below. The Pig integration tests fail with a similar exception.

Do you know which values I need to set for the properties named in the error (mapreduce.framework.name and the corresponding server addresses)? I'm out until next Tuesday, so if you don't figure it out by then, I'll give it another shot.

ERROR in (cascading-pigpen-functional-map-test-test-map) (BaseFlow.java:918)
Uncaught exception, not in assertion.
expected: nil
  actual: cascading.flow.FlowException: unhandled exception
 at cascading.flow.BaseFlow.complete (BaseFlow.java:918)
    sun.reflect.NativeMethodAccessorImpl.invoke0 (NativeMethodAccessorImpl.java:-2)
    sun.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:57)
    sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
    java.lang.reflect.Method.invoke (Method.java:606)
    clojure.lang.Reflector.invokeMatchingMethod (Reflector.java:93)
    clojure.lang.Reflector.invokeNoArgInstanceMember (Reflector.java:313)
    pigpen.cascading.functional_test$run_flow.invoke (functional_test.clj:33)
    pigpen.cascading.functional_test$run_flowGT_output.invoke (functional_test.clj:39)
    pigpen.cascading.functional_test$fn$reify__11868.dump (functional_test.clj:57)
    pigpen.functional.map_test$test_map.invoke (map_test.clj:31)
    pigpen.cascading.functional_test/fn (functional_test.clj:45)
    clojure.test$test_var$fn7187.invoke (test.clj:704)
    clojure.test$test_var.invoke (test.clj:704)
    pigpen.cascading.functional_test$cascading_pigpen_functional_map_test_test_map.invoke (functional_test.clj:45)
    pigpen.cascading.functional_test$eval12324.invoke (NO_SOURCE_FILE:1)
    clojure.lang.Compiler.eval (Compiler.java:6703)
    clojure.lang.Compiler.eval (Compiler.java:6666)
    clojure.core$eval.invoke (core.clj:2927)
    clojure.main$repl$read_eval_print6625$fn6628.invoke (main.clj:239)
    clojure.main$repl$read_eval_print6625.invoke (main.clj:239)
    clojure.main$repl$fn6634.invoke (main.clj:257)
    clojure.main$repl.doInvoke (main.clj:257)
    clojure.lang.RestFn.invoke (RestFn.java:1096)
    clojure.tools.nrepl.middleware.interruptible_eval$evaluate$fn1024.invoke (interruptible_eval.clj:56)
    clojure.lang.AFn.applyToHelper (AFn.java:152)
    clojure.lang.AFn.applyTo (AFn.java:144)
    clojure.core$apply.invoke (core.clj:624)
    clojure.core$with_bindingsSTAR.doInvoke (core.clj:1862)
    clojure.lang.RestFn.invoke (RestFn.java:425)
    clojure.tools.nrepl.middleware.interruptible_eval$evaluate.invoke (interruptible_eval.clj:41)
    clojure.tools.nrepl.middleware.interruptible_eval$interruptible_eval$fn1065$fn__1068.invoke (interruptible_eval.clj:171)
    clojure.core$comp$fn4192.invoke (core.clj:2402)
    clojure.tools.nrepl.middleware.interruptible_eval$run_next$fn1058.invoke (interruptible_eval.clj:138)
    clojure.lang.AFn.run (AFn.java:22)
    java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1145)
    java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:615)
    java.lang.Thread.run (Thread.java:744)
Caused by: java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
 at org.apache.hadoop.mapreduce.Cluster.initialize (Cluster.java:120)
    org.apache.hadoop.mapreduce.Cluster.<init> (Cluster.java:82)
    org.apache.hadoop.mapreduce.Cluster.<init> (Cluster.java:75)
    org.apache.hadoop.mapred.JobClient.init (JobClient.java:470)
    org.apache.hadoop.mapred.JobClient.<init> (JobClient.java:449)
    cascading.flow.hadoop.planner.HadoopFlowStepJob.internalNonBlockingStart (HadoopFlowStepJob.java:107)
    cascading.flow.planner.FlowStepJob.blockOnJob (FlowStepJob.java:236)
    cascading.flow.planner.FlowStepJob.start (FlowStepJob.java:162)
    cascading.flow.planner.FlowStepJob.call (FlowStepJob.java:124)
    cascading.flow.planner.FlowStepJob.call (FlowStepJob.java:43)
    java.util.concurrent.FutureTask.run (FutureTask.java:262)
    java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1145)
    java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:615)
    java.lang.Thread.run (Thread.java:744)

cloudbees-pull-request-builder commented 9 years ago

NetflixOSS » PigPen » PigPen-pull-requests #11 FAILURE: Looks like there's a problem with this pull request.

fs111 commented 9 years ago

You have to add this to the config:

jobConf.set( "mapreduce.framework.name", "local" );

In the Cascading case, it would be even easier to use our TestPlatforms, which take care of all that setup for you:

https://github.com/Cascading/cascading/blob/2.7/cascading-hadoop2-mr1/src/test/java/cascading/platform/hadoop2/Hadoop2MR1Platform.java
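
For reference, the same key can also be supplied through a Hadoop configuration file instead of a call on the JobConf. The fragment below is only a sketch: it assumes a file named mapred-site.xml is visible on the test classpath, which Hadoop's JobConf reads as a default resource. Nothing in this thread confirms that PigPen's test setup works this way, or that this alone resolves the exception above.

```xml
<?xml version="1.0"?>
<!-- mapred-site.xml: assumed to sit on the test classpath (an assumption, not verified against PigPen) -->
<configuration>
  <property>
    <!-- Same key and value as the jobConf.set(...) call above -->
    <name>mapreduce.framework.name</name>
    <value>local</value>
  </property>
</configuration>
```

The value "local" runs jobs in-process, which is what an integration test without a cluster needs. The other documented values for this key are "classic" and "yarn".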

mbossenbroek commented 9 years ago

cc @pkozikow

Sorry, I'm not terribly familiar with Cascading - where do I add that property?

I tried passing it to the properties of the HadoopFlowConnector, but I get the same error:

(HadoopFlowConnector. {"mapreduce.framework.name" "local"})

I'm also not sure where the TestPlatforms would fit in. Currently we create a FlowDef, pass it to connect on the HadoopFlowConnector, and then call complete on the resulting Flow. Where does the platform fit into that sequence?

mbossenbroek commented 8 years ago

Closing for now. Please feel free to reopen this if there's any desire to update these dependencies.