ashkapsky / BigDatalog

Apache License 2.0

Recursion failing because of missing checkpoint blocks #4

Closed thomasrebele closed 7 years ago

thomasrebele commented 7 years ago

I tried to use recursion, but it fails with many error messages. (See #3 for details on how I run the program.)

content of arcs

A   B
B   C
C   D
D   E
E   F
F   G
G   H

content of bigdatalog.deal

database({ arcs(X:string, Y:string) }).
tc(X,Y) <- arcs(X,Y).
tc(X,Y) <- tc(X, Z), arcs(Z,Y).
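For reference, the two rules above compute the transitive closure of arcs by fixed-point iteration. A minimal, language-neutral sketch of that semantics (plain Python, not BigDatalog's actual semi-naive engine):

```python
# Naive fixed-point evaluation of the two Datalog rules:
#   tc(X,Y) <- arcs(X,Y).
#   tc(X,Y) <- tc(X,Z), arcs(Z,Y).
def transitive_closure(arcs):
    tc = set(arcs)       # exit rule: copy the base relation
    delta = set(arcs)    # facts derived in the previous iteration
    while delta:
        # recursive rule: join the newest facts with arcs on Z
        derived = {(x, y) for (x, z) in delta for (z2, y) in arcs if z == z2}
        delta = derived - tc   # keep only genuinely new facts
        tc |= delta
    return tc

arcs = {("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"),
        ("E", "F"), ("F", "G"), ("G", "H")}
print(len(transitive_closure(arcs)))  # 28: all ordered pairs on the 8-node chain
```

On the 7-edge chain from the arcs file this yields 28 tc facts (the 7 base arcs plus 21 derived pairs). Each Spark "Fixed Point Iteration" in the log below corresponds to one round of this loop, which is why the recursion depends on the previous iteration's (checkpointed) results.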

command

./bin/run-example datalog.Experiments --program=99 --file=../redirect.txt --queryform="tc(A,B)" --baserelation_arcs=../bigdatalog-java/arcs

error message

Here is the relevant part of the output containing the first error message (see the attachment for the complete log):

17/10/10 16:35:57 INFO Recursion: Fixed Point Iteration # 2, time: 9170ms
17/10/10 16:35:57 INFO DAGScheduler: Submitting FixedPointResultStage 3 (SetRDD.diffRDD SetRDD[32] at RDD at SetRDD.scala:29), which has no missing parents
17/10/10 16:35:57 INFO MemoryStore: Block broadcast_5 stored as values in memory (estimated size 16.9 KB, free 510.0 MB)
17/10/10 16:35:57 INFO MemoryStore: Block broadcast_5_piece0 stored as bytes in memory (estimated size 8.7 KB, free 510.1 MB)
17/10/10 16:35:57 INFO BlockManagerInfo: Added broadcast_5_piece0 in memory on localhost:43953 (size: 8.7 KB, free: 1135.5 KB)
17/10/10 16:35:57 INFO SparkContext: Created broadcast 5 from broadcast at DAGScheduler.scala:1096
17/10/10 16:35:57 INFO DAGScheduler: Submitting 200 missing tasks from FixedPointResultStage 3 (SetRDD.diffRDD SetRDD[32] at RDD at SetRDD.scala:29)
17/10/10 16:35:57 INFO TaskSchedulerImpl: Adding task set 3.0 with 200 tasks
17/10/10 16:35:57 INFO TaskSetManager: Starting task 121.0 in stage 3.0 (TID 256, localhost, partition 121,PROCESS_LOCAL, 2343 bytes)
17/10/10 16:35:57 INFO TaskSetManager: Starting task 123.0 in stage 3.0 (TID 257, localhost, partition 123,PROCESS_LOCAL, 2343 bytes)
17/10/10 16:35:57 INFO TaskSetManager: Starting task 124.0 in stage 3.0 (TID 258, localhost, partition 124,PROCESS_LOCAL, 2343 bytes)
17/10/10 16:35:57 INFO TaskSetManager: Starting task 125.0 in stage 3.0 (TID 259, localhost, partition 125,PROCESS_LOCAL, 2343 bytes)
17/10/10 16:35:57 INFO Executor: Running task 121.0 in stage 3.0 (TID 256)
17/10/10 16:35:57 INFO Executor: Running task 123.0 in stage 3.0 (TID 257)
17/10/10 16:35:57 INFO Executor: Running task 124.0 in stage 3.0 (TID 258)
17/10/10 16:35:57 INFO Executor: Running task 125.0 in stage 3.0 (TID 259)
17/10/10 16:35:57 INFO CacheManager: Partition rdd_31_123 not found, computing it
17/10/10 16:35:57 INFO CacheManager: Partition rdd_31_125 not found, computing it
17/10/10 16:35:57 INFO CacheManager: Partition rdd_31_121 not found, computing it
17/10/10 16:35:57 INFO CacheManager: Partition rdd_27_123 not found, computing it
17/10/10 16:35:57 INFO CacheManager: Partition rdd_27_121 not found, computing it
17/10/10 16:35:57 INFO BlockManager: Found block rdd_17_123 locally
17/10/10 16:35:57 INFO BlockManager: Found block rdd_21_123 locally
17/10/10 16:35:57 INFO CacheManager: Partition rdd_17_121 not found, computing it
17/10/10 16:35:57 INFO SetRDDHashSetPartition: Union set size 0 for rdd 18 took 0 ms
17/10/10 16:35:57 ERROR Executor: Exception in task 121.0 in stage 3.0 (TID 256)
org.apache.spark.SparkException: Checkpoint block rdd_17_121 not found! Either the executor that originally checkpointed this partition is no longer alive, or the original RDD is unpersisted. If this problem persists, you may consider using `rdd.checkpoint()` or `rdd.localcheckpoint()` instead, which are slower than memory checkpointing but more fault-tolerant.
    at org.apache.spark.rdd.MemoryCheckpointRDD.compute(MemoryCheckpointRDD.scala:43)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:304)
    at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
    at org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:88)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
    at org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:88)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
    at edu.ucla.cs.wis.bigdatalog.spark.execution.setrdd.SetRDD.compute(SetRDD.scala:108)
    at edu.ucla.cs.wis.bigdatalog.spark.execution.setrdd.SetRDD.computeOrReadCheckpoint(SetRDD.scala:104)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.scheduler.fixedpoint.FixedPointResultTask.runTask(FixedPointResultTask.scala:54)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
17/10/10 16:35:57 INFO CacheManager: Partition rdd_27_125 not found, computing it
17/10/10 16:35:57 INFO BlockManager: Found block rdd_17_125 locally
17/10/10 16:35:57 INFO BlockManager: Found block rdd_21_125 locally
17/10/10 16:35:57 INFO SetRDDHashSetPartition: Union set size 0 for rdd 18 took 0 ms
17/10/10 16:35:57 INFO MemoryStore: 1 blocks selected for dropping
17/10/10 16:35:57 INFO BlockManager: Dropping block rdd_17_124 from memory
17/10/10 16:35:57 INFO BlockManagerInfo: Removed rdd_17_124 on localhost:43953 in memory (size: 1701.1 KB, free: 2.8 MB)
17/10/10 16:35:57 INFO MemoryStore: 1 blocks selected for dropping
17/10/10 16:35:57 INFO BlockManager: Dropping block rdd_11_125 from memory
17/10/10 16:35:57 INFO BlockManagerInfo: Removed rdd_11_125 on localhost:43953 in memory (size: 1701.1 KB, free: 4.4 MB)
17/10/10 16:35:57 INFO MemoryStore: 1 blocks selected for dropping
17/10/10 16:35:57 INFO BlockManager: Dropping block rdd_15_124 from memory
17/10/10 16:35:57 INFO BlockManagerInfo: Removed rdd_15_124 on localhost:43953 in memory (size: 1701.1 KB, free: 6.1 MB)

bigdatalog.log.zip

ashkapsky commented 7 years ago

Try with more memory and/or fewer partitions. See the tips at https://github.com/ashkapsky/BigDatalog#configuring-bigdatalog-programs

thomasrebele commented 7 years ago

Thanks, setting "spark.sql.shuffle.partitions" to "1" makes it work.
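For anyone hitting the same error: one place to set this is Spark's conf/spark-defaults.conf. A minimal sketch, assuming a standard Spark install (the memory value is an illustrative example, per the "more memory and/or fewer partitions" advice above, not a tested recommendation):

```
# conf/spark-defaults.conf
spark.sql.shuffle.partitions  1
spark.driver.memory           4g
```

Fewer shuffle partitions mean fewer cached/checkpointed blocks competing for the small memory store, which is why blocks like rdd_17_121 were being evicted mid-recursion.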