amplab / training

Training materials for Strata, AMP Camp, etc

GraphX triplets: only Eds and Frans :( #144

Open k0ala opened 10 years ago

k0ala commented 10 years ago

I followed the GraphX tutorial at http://ampcamp.berkeley.edu/big-data-mini-course/graph-analytics-with-graphx.html on a local standalone cluster (Spark version 0.9.0) with two workers. graph.vertices and graph.edges return the expected data, but graph.triplets does not: collecting it yields eight copies of the same Ed -> Fran triplet instead of one triplet per edge.

scala> graph.vertices.toArray
14/03/04 16:16:18 INFO SparkContext: Starting job: toArray at <console>:27
14/03/04 16:16:18 INFO DAGScheduler: Got job 6 (toArray at <console>:27) with 1 output partitions (allowLocal=false)
14/03/04 16:16:18 INFO DAGScheduler: Final stage: Stage 28 (toArray at <console>:27)
14/03/04 16:16:18 INFO DAGScheduler: Parents of final stage: List(Stage 32, Stage 29)
14/03/04 16:16:18 INFO DAGScheduler: Missing parents: List()
14/03/04 16:16:18 INFO DAGScheduler: Submitting Stage 28 (VertexRDD[15] at RDD at VertexRDD.scala:52), which has no missing parents
14/03/04 16:16:18 INFO DAGScheduler: Submitting 1 missing tasks from Stage 28 (VertexRDD[15] at RDD at VertexRDD.scala:52)
14/03/04 16:16:18 INFO TaskSchedulerImpl: Adding task set 28.0 with 1 tasks
14/03/04 16:16:18 INFO TaskSetManager: Starting task 28.0:0 as TID 12 on executor localhost: localhost (PROCESS_LOCAL)
14/03/04 16:16:18 INFO TaskSetManager: Serialized task 28.0:0 as 2426 bytes in 0 ms
14/03/04 16:16:18 INFO Executor: Running task ID 12
14/03/04 16:16:18 INFO BlockManager: Found block rdd_14_0 locally
14/03/04 16:16:18 INFO Executor: Serialized size of result for 12 is 947
14/03/04 16:16:18 INFO Executor: Sending result for 12 directly to driver
14/03/04 16:16:18 INFO Executor: Finished task ID 12
14/03/04 16:16:18 INFO TaskSetManager: Finished TID 12 in 13 ms on localhost (progress: 0/1)
14/03/04 16:16:18 INFO DAGScheduler: Completed ResultTask(28, 0)
14/03/04 16:16:18 INFO TaskSchedulerImpl: Remove TaskSet 28.0 from pool
14/03/04 16:16:18 INFO DAGScheduler: Stage 28 (toArray at <console>:27) finished in 0.015 s
14/03/04 16:16:18 INFO SparkContext: Job finished: toArray at <console>:27, took 0.027839851 s
res9: Array[(org.apache.spark.graphx.VertexId, (String, Int))] = Array((4,(David,42)), (2,(Bob,27)), (6,(Fran,50)), (5,(Ed,55)), (3,(Charlie,65)), (1,(Alice,28)))

scala> graph.edges.toArray
14/03/04 16:15:57 INFO SparkContext: Starting job: collect at EdgeRDD.scala:51
14/03/04 16:15:57 INFO DAGScheduler: Got job 5 (collect at EdgeRDD.scala:51) with 1 output partitions (allowLocal=false)
14/03/04 16:15:57 INFO DAGScheduler: Final stage: Stage 27 (collect at EdgeRDD.scala:51)
14/03/04 16:15:57 INFO DAGScheduler: Parents of final stage: List()
14/03/04 16:15:57 INFO DAGScheduler: Missing parents: List()
14/03/04 16:15:57 INFO DAGScheduler: Submitting Stage 27 (MappedRDD[36] at map at EdgeRDD.scala:51), which has no missing parents
14/03/04 16:15:57 INFO DAGScheduler: Submitting 1 missing tasks from Stage 27 (MappedRDD[36] at map at EdgeRDD.scala:51)
14/03/04 16:15:57 INFO TaskSchedulerImpl: Adding task set 27.0 with 1 tasks
14/03/04 16:15:57 INFO TaskSetManager: Starting task 27.0:0 as TID 11 on executor localhost: localhost (PROCESS_LOCAL)
14/03/04 16:15:57 INFO TaskSetManager: Serialized task 27.0:0 as 2068 bytes in 1 ms
14/03/04 16:15:57 INFO Executor: Running task ID 11
14/03/04 16:15:57 INFO BlockManager: Found block rdd_2_0 locally
14/03/04 16:15:57 INFO Executor: Serialized size of result for 11 is 936
14/03/04 16:15:57 INFO Executor: Sending result for 11 directly to driver
14/03/04 16:15:57 INFO Executor: Finished task ID 11
14/03/04 16:15:57 INFO TaskSetManager: Finished TID 11 in 13 ms on localhost (progress: 0/1)
14/03/04 16:15:57 INFO DAGScheduler: Completed ResultTask(27, 0)
14/03/04 16:15:57 INFO TaskSchedulerImpl: Remove TaskSet 27.0 from pool
14/03/04 16:15:57 INFO DAGScheduler: Stage 27 (collect at EdgeRDD.scala:51) finished in 0.015 s
14/03/04 16:15:57 INFO SparkContext: Job finished: collect at EdgeRDD.scala:51, took 0.023602266 s
res7: Array[org.apache.spark.graphx.Edge[Int]] = Array(Edge(2,1,7), Edge(2,4,2), Edge(3,2,4), Edge(3,6,3), Edge(4,1,1), Edge(5,2,2), Edge(5,3,8), Edge(5,6,3))

scala> graph.triplets.toArray
14/03/04 16:16:30 INFO SparkContext: Starting job: toArray at <console>:27
14/03/04 16:16:30 INFO DAGScheduler: Got job 7 (toArray at <console>:27) with 1 output partitions (allowLocal=false)
14/03/04 16:16:31 INFO DAGScheduler: Final stage: Stage 33 (toArray at <console>:27)
14/03/04 16:16:31 INFO DAGScheduler: Parents of final stage: List(Stage 34)
14/03/04 16:16:31 INFO DAGScheduler: Missing parents: List()
14/03/04 16:16:31 INFO DAGScheduler: Submitting Stage 33 (ZippedPartitionsRDD2[32] at zipPartitions at GraphImpl.scala:60), which has no missing parents
14/03/04 16:16:31 INFO DAGScheduler: Submitting 1 missing tasks from Stage 33 (ZippedPartitionsRDD2[32] at zipPartitions at GraphImpl.scala:60)
14/03/04 16:16:31 INFO TaskSchedulerImpl: Adding task set 33.0 with 1 tasks
14/03/04 16:16:31 INFO TaskSetManager: Starting task 33.0:0 as TID 13 on executor localhost: localhost (PROCESS_LOCAL)
14/03/04 16:16:31 INFO TaskSetManager: Serialized task 33.0:0 as 3322 bytes in 1 ms
14/03/04 16:16:31 INFO Executor: Running task ID 13
14/03/04 16:16:31 INFO BlockManager: Found block rdd_2_0 locally
14/03/04 16:16:31 INFO BlockManager: Found block rdd_31_0 locally
14/03/04 16:16:31 INFO Executor: Serialized size of result for 13 is 931
14/03/04 16:16:31 INFO Executor: Sending result for 13 directly to driver
14/03/04 16:16:31 INFO Executor: Finished task ID 13
14/03/04 16:16:31 INFO TaskSetManager: Finished TID 13 in 17 ms on localhost (progress: 0/1)
14/03/04 16:16:31 INFO DAGScheduler: Completed ResultTask(33, 0)
14/03/04 16:16:31 INFO TaskSchedulerImpl: Remove TaskSet 33.0 from pool
14/03/04 16:16:31 INFO DAGScheduler: Stage 33 (toArray at <console>:27) finished in 0.019 s
14/03/04 16:16:31 INFO SparkContext: Job finished: toArray at <console>:27, took 0.037909394 s
res10: Array[org.apache.spark.graphx.EdgeTriplet[(String, Int),Int]] = Array(((5,(Ed,55)),(6,(Fran,50)),3), ((5,(Ed,55)),(6,(Fran,50)),3), ((5,(Ed,55)),(6,(Fran,50)),3), ((5,(Ed,55)),(6,(Fran,50)),3), ((5,(Ed,55)),(6,(Fran,50)),3), ((5,(Ed,55)),(6,(Fran,50)),3), ((5,(Ed,55)),(6,(Fran,50)),3), ((5,(Ed,55)),(6,(Fran,50)),3))
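One possible explanation, offered as an assumption and not a confirmed diagnosis: the output above is what you would see if the triplets iterator reused a single mutable object and toArray collected references to it, so every slot ends up showing the last edge (Edge(5,6,3), Ed -> Fran). The sketch below reproduces that effect in plain Scala without Spark. MutableTriplet, TripletReuseDemo, and the three sample edges are hypothetical names and data made up for illustration; they are not GraphX classes.

```scala
// Hypothetical stand-in for a mutable triplet; NOT the GraphX EdgeTriplet class.
final class MutableTriplet {
  var src: String = ""
  var dst: String = ""
  var attr: Int = 0
  override def toString: String = s"$src->$dst:$attr"
}

object TripletReuseDemo {
  // Illustrative sample edges (source name, destination name, edge attribute).
  val edges: Seq[(String, String, Int)] =
    Seq(("Bob", "Alice", 7), ("Charlie", "Fran", 3), ("Ed", "Fran", 3))

  // An iterator that overwrites and returns the SAME object on every next().
  def reusingIterator: Iterator[MutableTriplet] = {
    val shared = new MutableTriplet
    edges.iterator.map { case (s, d, a) =>
      shared.src = s
      shared.dst = d
      shared.attr = a
      shared
    }
  }

  // Collecting the references directly: every array slot points at the one
  // shared object, which holds the values of the last edge.
  def collectedDirectly: Array[String] =
    reusingIterator.toArray.map(_.toString)

  // Copying the fields into an immutable tuple before collecting keeps
  // each edge's values distinct.
  def collectedAfterCopy: Array[(String, String, Int)] =
    reusingIterator.map(t => (t.src, t.dst, t.attr)).toArray
}
```

If this is indeed the cause, a workaround to try in the Spark shell would be to copy the fields out before collecting, for example graph.triplets.map(t => (t.srcId, t.srcAttr, t.dstId, t.dstAttr, t.attr)).collect. This has not been verified against Spark 0.9.0.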