HCADatalab / powderkeg

Live-coding the cluster!
Eclipse Public License 1.0
159 stars 23 forks source link

pair RDD can't be joined? #24

Closed clojurians-org closed 7 years ago

clojurians-org commented 7 years ago

i try to join two pair rdd together. it failed!

(.join (keg/rdd {:a 1 :b 2})  (keg/rdd {:a "aa" :c "cc"}))

it seems the following rdd just become org.apache.spark.api.java.JavaRDD with Tuple2. it need to be wrapped with JavaPairRDD/fromJavaRDD method to be realized.

(into [] (.join (JavaPairRDD/fromJavaRDD  (keg/rdd {:a 1 :b 2}))  (JavaPairRDD/fromJavaRDD (keg/rdd {:a "aa" :c "cc"}))))
cgrand commented 7 years ago

You should look into keg/join which happens to use this static method. Moreover when you want the pair RDD to be partitioned, you should use keg/by-key instead of keg/rdd. -- On Clojure http://clj-me.cgrand.net/ Clojure Programming http://clojurebook.com Training, Consulting & Contracting http://lambdanext.eu/

clojurians-org commented 7 years ago

i use the keg/join, but it lead to google Optional exception.

(into [] (keg/join (keg/rdd {:a [1 11] :b 2}) :or "x" (keg/rdd {:a "aa" :c "cc"} ) :or "y"))

org.apache.spark.api.java.Optional cannot be cast to
com.google.common.base.Optional

cgrand commented 7 years ago

Which spark version are you using?

Le mer. 22 mars 2017 à 14:38, larluo notifications@github.com a écrit :

i use the keg/join, but it lead to google Optional exception.

(into [] (keg/join (keg/rdd {:a [1 11] :b 2}) :or "x"(keg/rdd {:a "aa" :c "cc"} ) :or "y"))

org.apache.spark.api.java.Optional cannot be cast to com.google.common.base.Optional

— You are receiving this because you commented.

Reply to this email directly, view it on GitHub https://github.com/HCADatalab/powderkeg/issues/24#issuecomment-288400477, or mute the thread https://github.com/notifications/unsubscribe-auth/AAC3sR4sZD5uRIk-J7mxdLFTfjfJ7OX6ks5roSRbgaJpZM4MkzQw .

-- On Clojure http://clj-me.cgrand.net/ Clojure Programming http://clojurebook.com Training, Consulting & Contracting http://lambdanext.eu/

clojurians-org commented 7 years ago
(defproject etl-spark "0.1.0-SNAPSHOT"
  :description "FIXME: write description"
  :url "http://example.com/FIXME"
  :license {:name "Eclipse Public License"
            :url "http://www.eclipse.org/legal/epl-v10.html"}
  :profiles {:provided {:dependencies [[org.apache.spark/spark-core_2.11 "2.1.0"]
                                       [org.apache.spark/spark-sql_2.11 "2.1.0"]
                                       [org.apache.spark/spark-streaming_2.11 "2.1.0"]]}}
  :aot :all
  :dependencies [[org.clojure/clojure "1.8.0"]
                 [hcadatalab/powderkeg "0.5.0"]
                 [clj-time "0.12.2"]])
cgrand commented 7 years ago

I checked the docs and indeed the switch in the Optional is a breaking change that hasn't been accounted for. I will fix next week if no one fixes it before.

Le mer. 22 mars 2017 à 14:43, larluo notifications@github.com a écrit :

(defproject etl-spark "0.1.0-SNAPSHOT" :description "FIXME: write description" :url "http://example.com/FIXME" :license {:name "Eclipse Public License" :url "http://www.eclipse.org/legal/epl-v10.html"} :profiles {:provided {:dependencies [[org.apache.spark/spark-core_2.11 "2.1.0"] [org.apache.spark/spark-sql_2.11 "2.1.0"] [org.apache.spark/spark-streaming_2.11 "2.1.0"]]}} :aot :all :dependencies [[org.clojure/clojure "1.8.0"] [hcadatalab/powderkeg "0.5.0"] [clj-time "0.12.2"]])

— You are receiving this because you commented.

Reply to this email directly, view it on GitHub https://github.com/HCADatalab/powderkeg/issues/24#issuecomment-288401811, or mute the thread https://github.com/notifications/unsubscribe-auth/AAC3sdJG1pdi2oqskHbo-ulMvP8emioLks5roSV8gaJpZM4MkzQw .

-- On Clojure http://clj-me.cgrand.net/ Clojure Programming http://clojurebook.com Training, Consulting & Contracting http://lambdanext.eu/