amperity / sparkplug

Clojure API bindings for Apache Spark

Require the Clojure namespace upon deserializing a SerializableFn #5

Closed: brandonvin closed this 5 years ago

brandonvin commented 5 years ago

Some experiments showed that passing a function that closes over any Var results in an attempt to use an unbound Var when an executor tries to execute the task.
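For illustration, a minimal sketch of the failure mode. The `my.app` namespace and the `spark/map` call are stand-ins here; sparkplug's actual transformation API may be named differently:

```clojure
(ns my.app
  (:require [sparkplug.core :as spark]))

(def multiplier 10)

(defn scale
  [rdd]
  ;; The anonymous fn closes over the Var #'my.app/multiplier. When the
  ;; task deserializes on an executor whose JVM has never loaded my.app,
  ;; the Var object exists but is unbound, and invoking the fn throws.
  (spark/map #(* multiplier %) rdd))
```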

The root cause is that functions passed to Spark are not serialized with Kryo and the custom registrator; they are serialized with the regular Java Serializable mechanism instead (see https://issues.apache.org/jira/browse/SPARK-12414). Hence, none of the logic in sparkplug.kryo for serializing and deserializing Vars applies to functions.
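To make that concrete, here is a sketch of a round trip through plain Java serialization, which is roughly how Spark ships closures. It uses only java.io classes, so none of the Var handling registered in sparkplug.kryo ever runs:

```clojure
(import '(java.io ByteArrayInputStream ByteArrayOutputStream
                  ObjectInputStream ObjectOutputStream))

(defn java-round-trip
  "Serialize and deserialize obj via java.io.Serializable, the path
   Spark uses for closures, bypassing Kryo entirely."
  [obj]
  (let [baos (ByteArrayOutputStream.)]
    (with-open [oos (ObjectOutputStream. baos)]
      (.writeObject oos obj))
    (with-open [ois (ObjectInputStream.
                      (ByteArrayInputStream. (.toByteArray baos)))]
      (.readObject ois))))
```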

The proposed workaround (maybe a solution?) is to store the function's declaring namespace alongside the serialized function, and to require that namespace when the function is deserialized.
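A sketch of that idea, assuming the namespace can be recovered from the compiled function's class name; the helper names below are illustrative, not necessarily what this PR adds to SerializableFn:

```clojure
(require '[clojure.string :as str])

(defn fn-namespace
  "Derive the declaring namespace symbol from a function's class name.
   Clojure compiles my.app/do-stuff to a class named my.app$do_stuff."
  [f]
  (-> (.getName (class f))
      (str/split #"\$")
      (first)
      (clojure.lang.Compiler/demunge)
      (symbol)))

(def ^:private require-lock (Object.))

(defn ensure-loaded!
  "Deserialization hook: require the stored namespace so that any Vars
   the function closes over are bound before the executor invokes it."
  [ns-sym]
  (locking require-lock   ; require is not thread-safe across task threads
    (require ns-sym)))
```

On the write side, the namespace string would be captured (e.g. via something like `fn-namespace`) and stored as a field next to the serialized function, with `ensure-loaded!` called from the wrapper's readObject hook before the function is returned.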

codecov-io commented 5 years ago

Codecov Report

Merging #5 into master will not change coverage. The diff coverage is n/a.


@@           Coverage Diff           @@
##           master       #5   +/-   ##
=======================================
  Coverage   37.14%   37.14%           
=======================================
  Files          10       10           
  Lines         953      953           
  Branches       24       24           
=======================================
  Hits          354      354           
  Misses        575      575           
  Partials       24       24
