Better document and tool for spark-submit

ghost commented 7 years ago

This occurs when running the example, after connecting to the cluster.

repl output: https://gist.github.com/brianmingus/cdba100687cd14b17f59a70763cbb2fd

stack trace: https://gist.github.com/brianmingus/081d4d8b4d5017e0327da9c226f5e8de

ghost commented 7 years ago

This is because, as the documentation notes, when running in an application you have to set keg/*sc* with a SparkContext. However, the documentation should give an example that actually works.

(ns some.ns
  (:require [powderkeg.core :as keg]))

(binding [keg/*sc* (keg/connect! "local[*]")]
  (into [] ; no collect, plain Clojure
    (keg/rdd ["This is a first line"  ; here we provide data from a Clojure collection
              "Testing spark"
              "and powderkeg"
              "Happy hacking!"]
      (filter #(.contains % "spark")))))

cgrand commented 7 years ago

Which command line did you use?

ghost commented 7 years ago

This is when using CIDER in Emacs. The instructions are correct, but they would be better if you pasted in my example code here.

On Dec 8, 2016 12:54 PM, "Christophe Grand" notifications@github.com wrote:

Which command line did you use?


cgrand commented 7 years ago

I don't see a meaningful difference between the current example (raw require outside of ns) and your example (require inside the ns form). I'm not a CIDER user; could you elaborate? Thanks.

Btw, since I misunderstood your problem, I made these changes: https://github.com/HCADatalab/powderkeg/commit/6abf666a3f8bc5486e953b3bde11b01d82a7b459

cgrand commented 7 years ago

Maybe I misunderstood the context: you spark-submitted a REPL and then tried to run the example. In the end, all you needed to do was:


```clojure
(ns some.ns
  (:require [powderkeg.core :as keg]))

;;;;;;;;;;;;;;
;; no keg/connect! because spark-submit
;;;;;;;;;;;;;;

(into [] ; no collect, plain Clojure
  (keg/rdd ["This is a first line" ; here we provide data from a Clojure collection
            "Testing spark"
            "and powderkeg"
            "Happy hacking!"]
    (filter #(.contains % "spark"))))
```

Am I correct?

ghost commented 7 years ago

Apologies, I fixed the code above. Could you just paste that into the docs?

(Regarding your recent addition: that is incorrect. You have to use binding.)

ghost commented 7 years ago

It occurs to me that using binding breaks repl-driven development. Instead:

(alter-var-root #'keg/*sc* (fn [_] (keg/connect! "local[*]"))) ; seems like this should be wrapped up in a fn

(into [] ; no collect, plain Clojure
  (keg/rdd ["This is a first line"  ; here we provide data from a Clojure collection
            "Testing spark"
            "and powderkeg"
            "Happy hacking!"]
    (filter #(.contains % "spark"))))

cgrand commented 7 years ago

(keg/connect! "local[*]") already does the alter-var-root.

So following the example from the README.md should have worked (modulo substituting "local[*]" for "spark://macbook-pro.home:7077"). Why/how didn't it work for you?

Dynamic binding of *sc* is on hold at the moment: there are few use cases for multiple Spark connections at a time (and Spark tries to dissuade you). Hence the documentation and code changes I made while misunderstanding your issue.
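
For readers following the binding vs. alter-var-root discussion, here is a minimal sketch in plain Clojure of how the two differ. It does not use powderkeg or Spark at all: the var `*conn*`, the function `connect!`, and the map it stores are hypothetical stand-ins chosen for illustration, not powderkeg's actual implementation.

```clojure
(ns example.dynvar-sketch)

;; Hypothetical stand-in for a connection var such as keg/*sc*.
(def ^:dynamic *conn* nil)

;; Hypothetical connect function: replaces the var's root value
;; and returns the new value (alter-var-root returns it).
(defn connect! [master]
  (alter-var-root #'*conn* (constantly {:master master})))

;; `binding` is thread-local and only visible inside its body.
(def inside-binding
  (binding [*conn* {:master "scoped"}]
    (:master *conn*)))

;; Once the binding form has returned, the root value is unchanged.
(def after-binding *conn*)

;; Setting the root value makes it visible to every later top-level form,
;; which is what REPL-driven development needs.
(connect! "local[*]")
(def after-connect (:master *conn*))
```

With a root value in place, forms evaluated one at a time at the REPL all see the same connection, whereas a `binding` has to wrap every form that needs it.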

ghost commented 7 years ago

Indeed, it is working. Not sure what happened before.

HCADatalab / powderkeg

Better document and tool for spark-submit #6