damballa / parkour

Hadoop MapReduce in idiomatic Clojure.
Apache License 2.0

Seeing data-reader mapping error on clojure 1.6 #11

Closed: ghost closed this issue 9 years ago

ghost commented 9 years ago

Hey,

I'm trying to get parkour running on our internal data system, and I'm getting an error that I can't really diagnose (it was supposedly fixed in Clojure 1.5 as CLJ-1034). Any ideas?

Here is the error:

clojure.lang.ExceptionInfo: Conflicting data-reader mapping {:url #<URL jar:file:/prod-analytics-0.1.0-SNAPSHOT-standalone.jar!/data_readers.clj>, :conflict hadoop.conf/configuration, :mappings {parkour/dval #'parkour.io.dval/dval-reader, parkour/dcpath #'parkour.io.dval/dcpath-reader, java.net/uri #'parkour.fs/uri, hadoop.mapreduce/job #'parkour.mapreduce/job, hadoop.fs/path #'parkour.fs/path, hadoop.conf/configuration #'parkour.conf/configuration}}
        at clojure.core$ex_info.invoke(core.clj:4227)
        at clojure.core$load_data_reader_file$fn__6356.invoke(core.clj:6671)
        at clojure.core.protocols$fn__5871.invoke(protocols.clj:76)
        at clojure.core.protocols$fn__5828$G__5823__5841.invoke(protocols.clj:13)
        at clojure.core$reduce.invoke(core.clj:6030)
        at clojure.core$load_data_reader_file.invoke(core.clj:6664)
        at clojure.core.protocols$fn__5883.invoke(protocols.clj:128)
        at clojure.core.protocols$fn__5854$G__5849__5863.invoke(protocols.clj:19)
        at clojure.core.protocols$seq_reduce.invoke(protocols.clj:31)
        at clojure.core.protocols$fn__5877.invoke(protocols.clj:48)
        at clojure.core.protocols$fn__5828$G__5823__5841.invoke(protocols.clj:13)
        at clojure.core$reduce.invoke(core.clj:6030)
        at clojure.core$load_data_readers$fn__6360.invoke(core.clj:6683)
        at clojure.lang.AFn.applyToHelper(AFn.java:161)
        at clojure.lang.AFn.applyTo(AFn.java:151)
        at clojure.lang.Var.alterRoot(Var.java:336)
        at clojure.core$alter_var_root.doInvoke(core.clj:4839)
        at clojure.lang.RestFn.invoke(RestFn.java:425)
        at clojure.core$load_data_readers.invoke(core.clj:6680)
        at clojure.core$fn__6363.invoke(core.clj:6686)
        at clojure.core__init.load(Unknown Source)
        at clojure.core__init.<clinit>(Unknown Source)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:247)
        at clojure.lang.RT.loadClassForName(RT.java:2056)
        at clojure.lang.RT.load(RT.java:419)
        at clojure.lang.RT.load(RT.java:400)
        at clojure.lang.RT.doInit(RT.java:436)
        at clojure.lang.RT.<clinit>(RT.java:318)
        at clojure.lang.Namespace.<init>(Namespace.java:34)
        at clojure.lang.Namespace.findOrCreate(Namespace.java:176)
        at clojure.lang.Var.internPrivate(Var.java:163)
        at prod_analytics.core.<clinit>(Unknown Source)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:247)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:201)
Exception in thread "main" java.lang.ExceptionInInitializerError
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:247)
        at clojure.lang.RT.loadClassForName(RT.java:2056)
        at clojure.lang.RT.load(RT.java:419)
        at clojure.lang.RT.load(RT.java:400)
        at clojure.lang.RT.doInit(RT.java:436)
        at clojure.lang.RT.<clinit>(RT.java:318)
        at clojure.lang.Namespace.<init>(Namespace.java:34)
        at clojure.lang.Namespace.findOrCreate(Namespace.java:176)
        at clojure.lang.Var.internPrivate(Var.java:163)
        at prod_analytics.core.<clinit>(Unknown Source)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:247)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:201)

and here is my project.clj:

(defproject prod-analytics "0.1.0-SNAPSHOT"
  :url ""
  :license {:name "Eclipse Public License"
            :url "http://www.eclipse.org/legal/epl-v10.html"}
  :dependencies [[org.clojure/clojure "1.6.0"]
                 [clojure-csv/clojure-csv "2.0.1"]
                 [org.clojure/algo.generic "0.1.2"]
                 [com.damballa/parkour "0.6.1"]
                 [org.apache.avro/avro "1.7.5"]
                 [org.apache.avro/avro-mapred "1.7.5"
                  :classifier "hadoop2"]
                 [org.codehaus.jsr166-mirror/jsr166y "1.7.0"]
                 ]
  :global-vars {*warn-on-reflection* true}
  :exclusions [org.apache.hadoop/hadoop-core
               org.apache.hadoop/hadoop-common
               org.apache.hadoop/hadoop-hdfs
               org.slf4j/slf4j-api org.slf4j/slf4j-log4j12 log4j
               org.apache.avro/avro
               org.apache.avro/avro-mapred
               org.apache.avro/avro-ipc]

  :repositories [["conjars" "http://conjars.org/repo"]
                ["cloudera" "https://repository.cloudera.com/content/repositories/releases"]]

  :main prod-analytics.core
  :profiles {:provided
             {:dependencies
              [[org.apache.hadoop/hadoop-client "2.0.0-mr1-cdh4.2.0"]
               [org.apache.hadoop/hadoop-core "2.0.0-mr1-cdh4.2.0"] 
               [org.apache.hadoop/hadoop-common "2.0.0-cdh4.2.0"]
               [org.slf4j/slf4j-api "1.6.1"]
               [org.slf4j/slf4j-log4j12 "1.6.1"]
               [log4j/log4j "1.2.17"]]}
             :aot {:aot :all, :compile-path "target/aot/classes"}
             :uberjar [:aot]
             :jobjar [:aot]})

llasram commented 9 years ago

Hmm. This is not a problem I've seen recently myself... What version of Leiningen are you using, and in what context are you getting this exception?

ghost commented 9 years ago

Leiningen 2.4.1, and I get this error when I 'hadoop jar' the standalone uberjar.

llasram commented 9 years ago

Potential failure of imagination, but I'm just not seeing how it's possible to get that exception with Clojure 1.6, at least without having an actual different data-reader var for hadoop.conf/configuration. Maybe verify that the JAR's clojure/core.clj file is in fact for 1.6?
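
One way to run that check without starting a JVM is to read the version resource straight out of the uberjar. The sketch below assumes the Clojure classes bundled in the JAR ship a clojure/version.properties entry containing a line such as version=1.6.0, and that unzip is installed; the helper name clojure_version_from_props is made up for illustration.

```shell
# Extract the version string from the text of clojure/version.properties,
# read from standard input. Assumes a line of the form "version=1.6.0".
clojure_version_from_props() {
  sed -n 's/^version=//p'
}

# Usage (needs the built uberjar, so it is left commented out):
#   unzip -p target/prod-analytics-0.1.0-SNAPSHOT-standalone.jar \
#     clojure/version.properties | clojure_version_from_props
```

This inspects only what is packed inside the JAR; it says nothing about other copies of Clojure that may sit ahead of it on the runtime classpath.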

ghost commented 9 years ago

I hear you, I'm having a similar struggle. Starting "lein repl" from the same directory echoes the Clojure version. In this case, this is what I see:

nREPL server started on port 53875 on host 127.0.0.1 - nrepl://127.0.0.1:53875
REPL-y 0.3.5, nREPL 0.2.6
Clojure 1.6.0

llasram commented 9 years ago

I tried to reproduce this a few ways, and the only way I could do it was by sneaking a version of Clojure 1.5.x onto the classpath. Could you try the following in your environment with your JAR?

lein do clean, uberjar
zip -d target/prod-analytics-0.1.0-SNAPSHOT-standalone.jar data_readers.clj META-INF/MANIFEST.MF
hadoop jar target/prod-analytics-0.1.0-SNAPSHOT-standalone.jar clojure.main -e '(prn *clojure-version*)'

ghost commented 9 years ago

That is pretty amazing, I have to admit. Doing this, I find that I'm somehow getting not 1.5 but what looks like 1.4. I have no idea how this could even be there.

$ hadoop jar prod-analytics-0.1.0-SNAPSHOT-standalone.jar  clojure.main -e '(prn *clojure-version*)'
{:major 1, :minor 4, :incremental 0, :qualifier nil}

llasram commented 9 years ago

I'd check your HADOOP_CLASSPATH environment variable, the contents of your configuration hadoop-env.sh, and the output of running hadoop classpath. At least one of those should reveal the offending JAR, and thus hopefully where it came from.
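
A rough way to scan those sources for a stray Clojure JAR is to split each classpath on colons and look for entries that mention Clojure by name. This is only a sketch: find_clojure_entries is a hypothetical helper, and a plain name match will miss wildcard entries (such as a lib/* directory) as well as any JAR that bundles Clojure under another name.

```shell
# Print the classpath entries whose path mentions "clojure" (case-insensitive).
# Takes a colon-separated classpath string as its first argument.
find_clojure_entries() {
  # "|| true" keeps the function from failing when nothing matches.
  printf '%s\n' "$1" | tr ':' '\n' | grep -i 'clojure' || true
}

# Usage (needs a Hadoop installation, so it is left commented out):
#   find_clojure_entries "$HADOOP_CLASSPATH"
#   find_clojure_entries "$(hadoop classpath)"
```

If this prints nothing, the next step would be to list the directories behind any wildcard entries by hand.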

Due to similar issues (albeit not yet with Clojure itself), I generally avoid the hadoop jar command and instead use hadoop classpath to build a java command line that places my own (uber) JAR first. E.g., java -cp "example-standalone.jar:$(hadoop classpath)" clojure.main -m example.core. You may also need or want to set your distribution-specific java.library.path or other native library path properties.
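
The command-line approach above can be wrapped in a small helper that assembles the java invocation with the uberjar first and, optionally, a native library directory. This is a sketch under assumptions: hadoop_java_cmd is a hypothetical name, the native directory is whatever your distribution uses, and the function only prints the command for inspection rather than running it.

```shell
# Print a java command line that puts our uberjar ahead of Hadoop's classpath.
# Arguments: JAR, HADOOP_CP, MAIN_NS, and an optional native library directory.
hadoop_java_cmd() {
  jar=$1
  hadoop_cp=$2
  main_ns=$3
  native_dir=${4:-}
  opts=""
  # Only add java.library.path when a native directory was supplied.
  if [ -n "$native_dir" ]; then
    opts=" -Djava.library.path=$native_dir"
  fi
  printf 'java%s -cp %s:%s clojure.main -m %s\n' \
    "$opts" "$jar" "$hadoop_cp" "$main_ns"
}

# Usage (needs a Hadoop installation, so it is left commented out):
#   hadoop_java_cmd example-standalone.jar "$(hadoop classpath)" example.core
```

When running the printed command for real, keep the -cp value quoted so the shell does not expand any wildcard entries in the Hadoop classpath.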

ghost commented 9 years ago

Thanks for the advice. I'm really excited to use parkour and you helped out a great deal here!

EricCat commented 8 years ago

@cpb83 Did you solve this issue? Thanks.