amplab / graphx

Former GraphX development repository. GraphX has been merged into Apache Spark; please submit pull requests there.
https://github.com/apache/spark
Apache License 2.0

Support activeSet option in mapReduceTriplets #100

Closed by ankurdave 10 years ago

ankurdave commented 10 years ago

Active set tracking (e.g., for Pregel) was previously inelegant. To track changed vertices, the user had to update the vertices, call VertexRDD.deltaJoin to filter out unchanged vertices, call Graph.deltaJoinVertices to update the graph with the changed vertices (masking out unchanged vertices only in the join view), and finally call mapReduceTriplets with a map UDF that checked EdgeTriplet.srcMask to see whether the source vertex was active before sending messages.

Moreover, Pregel was implemented incorrectly on top of this abstraction. A vertex whose attribute did not change from one iteration to the next, but which continued to receive messages, would not get a chance to run, because EdgeTriplet.srcMask recorded whether the vertex was unchanged rather than whether it was inactive.
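To make the distinction between "unchanged" and "inactive" concrete, here is a minimal plain-Python sketch. It is not GraphX code: the function names, the toy "keep the minimum" vertex program, and the sample data are all assumptions chosen for illustration.

```python
# Hypothetical illustration only; none of these names exist in GraphX.

def changed_vertices(old_attrs, new_attrs):
    """Vertex ids whose attribute differs between two iterations."""
    return {vid for vid, attr in new_attrs.items() if old_attrs[vid] != attr}

def active_vertices(inbox):
    """Vertex ids that received at least one message this iteration."""
    return {vid for vid, msgs in inbox.items() if msgs}

def run_vertex_program(attrs, inbox):
    """Toy vertex program: keep the minimum of the attribute and all messages."""
    return {vid: min([attr] + inbox.get(vid, [])) for vid, attr in attrs.items()}

old_attrs = {1: 0, 2: 7, 3: 4}
inbox = {1: [5], 2: [3]}  # vertex 1 receives a message that does not lower its value
new_attrs = run_vertex_program(old_attrs, inbox)

changed = changed_vertices(old_attrs, new_attrs)  # only vertex 2 changed
active = active_vertices(inbox)                   # vertices 1 and 2 received messages
```

In this toy run vertex 1 received a message but kept its attribute, so a mask based on change alone would skip it, while tracking by received messages would keep it active.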

Finally, the abstraction did not permit pushing the activeness check into the framework by performing a clustered index scan over the edges.


This PR simplifies the abstraction by removing Graph.deltaJoinVertices and instead adding an optional activeSet parameter to mapReduceTriplets. This option takes a set of vertices with the same index as the graph's vertices and runs the map function only on edges neighboring a vertex in the active set. The direction of neighboring edges to consider can be specified by passing an EdgeDirection.
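The intended semantics can be sketched in plain Python over in-memory collections. This is an assumption-laden model, not the GraphX signature: the `Edge` tuple, the `map_reduce_triplets` function, and the `"out"`/`"in"` direction strings are hypothetical stand-ins, and the reading of the direction ("out" keeps edges leaving an active vertex, "in" keeps edges entering one) is my interpretation of the description above.

```python
from collections import namedtuple

# Hypothetical stand-in for an edge; not a GraphX type.
Edge = namedtuple("Edge", ["src", "dst", "attr"])

def map_reduce_triplets(vertices, edges, map_func, reduce_func,
                        active_set=None, direction="out"):
    """Run map_func on each edge and combine messages per target vertex.

    If active_set is given, skip edges that do not touch an active vertex
    in the given direction (assumed meaning: "out" = active source,
    "in" = active destination).
    """
    if direction not in ("out", "in"):
        raise ValueError("direction must be 'out' or 'in'")
    inbox = {}
    for e in edges:
        if active_set is not None:
            endpoint = e.src if direction == "out" else e.dst
            if endpoint not in active_set:
                continue  # the map function never runs on this edge
        for target, msg in map_func(e, vertices[e.src], vertices[e.dst]):
            inbox[target] = reduce_func(inbox[target], msg) if target in inbox else msg
    return inbox

vertices = {1: 10, 2: 20, 3: 30}
edges = [Edge(1, 2, None), Edge(2, 3, None), Edge(3, 1, None)]

def send_src_attr(edge, src_attr, dst_attr):
    """Send the source attribute to the destination vertex."""
    return [(edge.dst, src_attr)]

all_msgs = map_reduce_triplets(vertices, edges, send_src_attr, max)
out_msgs = map_reduce_triplets(vertices, edges, send_src_attr, max,
                               active_set={1}, direction="out")
in_msgs = map_reduce_triplets(vertices, edges, send_src_attr, max,
                              active_set={1}, direction="in")
```

With no active set every edge is mapped; with vertex 1 active, only the edge leaving vertex 1 (or, for `"in"`, the edge entering it) is mapped, so the map UDF no longer needs its own activeness check.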

This is implemented by shipping OpenHashSet[Vid]s with the active vertex ids, and filtering edges in mapReduceTriplets by performing hash lookups in the set. Note that the active vertex id set cannot be represented as a bitmask over the join view, because the vertex index in the join view may not contain all relevant vertices due to join rewrite.
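The following toy Python sketch illustrates why a set of ids can carry information that a bitmask over a local index cannot. It does not reproduce OpenHashSet or the join view; `view_index`, `to_bitmask`, and `filter_edges_by_source` are hypothetical names, and the data is made up.

```python
# Hypothetical illustration only; these are not GraphX data structures.

def to_bitmask(active_ids, view_index):
    """Mark active vertices as bits over a local index.

    Ids that have no slot in the index cannot be marked and are returned
    separately so the loss is visible.
    """
    mask = [False] * len(view_index)
    dropped = set()
    for vid in active_ids:
        if vid in view_index:
            mask[view_index[vid]] = True
        else:
            dropped.add(vid)
    return mask, dropped

def filter_edges_by_source(edges, active_ids):
    """Keep edges whose source id is found by a hash lookup in the active set."""
    return [(src, dst) for src, dst in edges if src in active_ids]

view_index = {1: 0, 2: 1}         # assumed: the local index lacks vertex 3
active_ids = {2, 3}
edges = [(1, 2), (2, 3), (3, 1)]

mask, dropped = to_bitmask(active_ids, view_index)
kept = filter_edges_by_source(edges, active_ids)
```

Here the bitmask silently loses vertex 3 because it has no slot in the local index, whereas the id set still lets the edge filter keep the edge leaving vertex 3.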

Staleness is still exposed through EdgeTriplet.srcStale and EdgeTriplet.dstStale, but nothing currently uses them.

AmplabJenkins commented 10 years ago

Merged build triggered.

AmplabJenkins commented 10 years ago

Merged build started.

AmplabJenkins commented 10 years ago

Merged build finished.

AmplabJenkins commented 10 years ago

All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/GraphXPullRequestBuilder/4493/

ankurdave commented 10 years ago

Merged with master, updating standalone PageRank to use the new API.

AmplabJenkins commented 10 years ago

All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/GraphXPullRequestBuilder/4591/

AmplabJenkins commented 10 years ago

All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/GraphXPullRequestBuilder/4593/

AmplabJenkins commented 10 years ago

All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/GraphXPullRequestBuilder/4601/

ankurdave commented 10 years ago

Merged master, which now includes #94.

AmplabJenkins commented 10 years ago

All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/GraphXPullRequestBuilder/5125/