NetSys / spark-monotasks

Fast, predictable data analytics based on (and API-compatible with) Apache Spark
Apache License 2.0

Support checkpoint RDDs #9

Open kayousterhout opened 9 years ago

kayousterhout commented 9 years ago

Support for checkpointing was removed with the monotasks change. If an RDD is checkpointed, the resulting job fails with an exception that looks like:

java.lang.Error: org.apache.spark.SparkException: Missing parent partition information for partition 0 of dependency org.apache.spark.OneToOneDependency@643812a8 (should have been set in DAGScheduler)
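A minimal reproduction sketch, using only the standard Spark RDD checkpointing API (`SparkContext.setCheckpointDir`, `RDD.checkpoint`). The application name, the `local[2]` master, the checkpoint directory, and the toy data set are all assumptions made for illustration; they are not taken from this issue. Which of the two actions triggers the exception has not been verified here, so both are included.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CheckpointRepro {
  def main(args: Array[String]): Unit = {
    // Assumption: a local master is enough to exercise the code path.
    val conf = new SparkConf()
      .setAppName("CheckpointRepro")
      .setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Assumption: any writable scratch directory works here.
      sc.setCheckpointDir("/tmp/monotasks-checkpoint-repro")

      // A narrow (one-to-one) dependency on a parallelized collection.
      val rdd = sc.parallelize(1 to 100, 4).map(_ * 2)

      // Mark the RDD for checkpointing; nothing is written yet.
      rdd.checkpoint()

      // The first action computes the RDD and writes the checkpoint.
      rdd.count()

      // A later action reads the RDD back through its checkpoint.
      rdd.count()
    } finally {
      sc.stop()
    }
  }
}
```

On a build with this issue, one of the `count()` calls would be expected to fail with the "Missing parent partition information" error quoted above.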

This can be fixed by changing how the parent partitions get serialized in Macrotask, similar to what's already done for other kinds of dependencies where the RDD can get checkpointed.

Once this is done, the CheckpointSuite tests should be re-enabled.