Closed ankurdave closed 10 years ago
Merged build triggered.
Merged build started.
Merged build finished.
All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/GraphXPullRequestBuilder/4493/
Merged with master, updating standalone PageRank to use the new API.
All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/GraphXPullRequestBuilder/4591/
All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/GraphXPullRequestBuilder/4593/
All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/GraphXPullRequestBuilder/4601/
Merged master, which now includes #94.
All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/GraphXPullRequestBuilder/5125/
Active set tracking (e.g., for Pregel) was previously inelegant. To track changed vertices, the user had to update the vertices, then call VertexRDD.deltaJoin (to filter out unchanged vertices), then Graph.deltaJoinVertices (to update the graph with the changed vertices while masking out unchanged vertices only in the join view), and finally mapReduceTriplets, where the map UDF had to check EdgeTriplet.srcMask to see whether the source vertex was active before sending messages.
Moreover, Pregel was implemented incorrectly using this abstraction. Vertices whose attributes did not change from one iteration to the next but which continued to receive messages would not get a chance to run, because EdgeTriplet.srcMask reflected the fact that they were unchanged rather than inactive.
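To make the changed-versus-active distinction concrete, here is a minimal sketch on plain Scala collections. It is not the GraphX API: `vprog`, `step`, `changedSet`, and `activeSet` are hypothetical names, and the min-based vertex program is only an assumption chosen for illustration. It shows a vertex that receives a message but keeps its attribute, so change tracking and message-based activeness disagree.

```scala
object ActiveVsChangedSketch {
  type Vid = Long

  // Hypothetical vertex program: keep the minimum of the old attribute and the message.
  def vprog(attr: Int, msg: Int): Int = math.min(attr, msg)

  // Apply the vertex program to every vertex that received a message;
  // vertices without a message keep their attribute.
  def step(attrs: Map[Vid, Int], msgs: Map[Vid, Int]): Map[Vid, Int] =
    attrs.map { case (v, a) => v -> msgs.get(v).map(m => vprog(a, m)).getOrElse(a) }

  // Vertices whose attribute differs between two iterations.
  def changedSet(before: Map[Vid, Int], after: Map[Vid, Int]): Set[Vid] =
    after.keySet.filter(v => before.get(v) != after.get(v)).toSet

  // Vertices that received at least one message.
  def activeSet(msgs: Map[Vid, Int]): Set[Vid] = msgs.keySet.toSet
}
```

With attributes `Map(1L -> 5, 2L -> 3)` and messages `Map(1L -> 4, 2L -> 7)`, vertex 2 receives a message but its attribute stays at 3, so it appears in `activeSet` but not in `changedSet`.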
Finally, the abstraction did not permit pushing the activeness check into the framework by performing a clustered index scan over the edges.
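As a rough illustration of what a clustered index scan over edges could look like, the sketch below assumes edges are sorted by source id and indexed by the slice each source occupies. `ClusteredEdges`, `index`, and `scanActive` are hypothetical names, not GraphX classes, and the layout is an assumption rather than the actual edge partition format.

```scala
object ClusteredScanSketch {
  type Vid = Long
  final case class Edge(src: Vid, dst: Vid)

  // Edges sorted (clustered) by source id, plus an index from each source id
  // to the [start, end) slice holding its edges.
  final class ClusteredEdges(unsorted: Seq[Edge]) {
    val edges: Vector[Edge] = unsorted.sortBy(_.src).toVector

    val index: Map[Vid, (Int, Int)] =
      edges.zipWithIndex
        .groupBy { case (e, _) => e.src }
        .map { case (src, run) => src -> ((run.head._2, run.last._2 + 1)) }

    // Visit only the edges whose source is active; runs belonging to
    // inactive sources are never touched.
    def scanActive(active: Set[Vid]): Iterator[Edge] =
      active.iterator.flatMap { v =>
        index.get(v) match {
          case Some((start, end)) => edges.slice(start, end).iterator
          case None               => Iterator.empty
        }
      }
  }
}
```

The point of the sketch is that the activeness check happens once per active vertex inside the framework, instead of once per edge inside the user's map function.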
This PR simplifies the abstraction by removing Graph.deltaJoinVertices and instead adding an optional activeSet parameter to mapReduceTriplets. This option takes a set of vertices with the same index as the graph's vertices and runs the map function only on edges neighboring a vertex in the active set. The direction of neighboring edges to consider can be specified by passing an EdgeDirection.
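The shape of the new parameter can be sketched on in-memory collections as follows. This is a simplified stand-in, not the real signature: the `Edge` type, the `Direction` objects, and their semantics (`Out` selects edges whose source is active, `In` selects edges whose destination is active) are assumptions made for this example, and the map function here sees a bare edge rather than an EdgeTriplet.

```scala
object ActiveSetSketch {
  type Vid = Long
  final case class Edge(src: Vid, dst: Vid)

  // Hypothetical stand-in for EdgeDirection; the semantics are assumed.
  sealed trait Direction
  case object Out extends Direction // edge qualifies if its source is active
  case object In extends Direction  // edge qualifies if its destination is active

  def mapReduceTriplets[A](
      edges: Seq[Edge],
      mapFunc: Edge => Iterator[(Vid, A)],
      reduceFunc: (A, A) => A,
      activeSet: Option[(Set[Vid], Direction)] = None): Map[Vid, A] = {
    // Restrict the edges before the map function ever runs.
    val selected = activeSet match {
      case None                => edges
      case Some((active, Out)) => edges.filter(e => active.contains(e.src))
      case Some((active, In))  => edges.filter(e => active.contains(e.dst))
    }
    // Run the map function, then combine messages per destination vertex.
    selected.iterator
      .flatMap(mapFunc)
      .toSeq
      .groupBy(_._1)
      .map { case (vid, msgs) => vid -> msgs.map(_._2).reduce(reduceFunc) }
  }
}
```

In this sketch, leaving `activeSet` as `None` runs the map function on every edge, so existing callers need no activeness check of their own.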
This is implemented by shipping OpenHashSet[Vid]s with the active vertex ids, and filtering edges in mapReduceTriplets by performing hash lookups in the set. Note that the active vertex id set cannot be represented as a bitmask over the join view, because the vertex index in the join view may not contain all relevant vertices due to join rewrite.
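A minimal sketch of the shipping and lookup steps, using `scala.collection.mutable.HashSet` as a stand-in for OpenHashSet. The helper names `shipActiveIds` and `filterBySource`, and the choice to ship only the active ids that a partition references, are assumptions for illustration and may differ from what the PR actually does.

```scala
import scala.collection.mutable

object ShippedActiveSetSketch {
  type Vid = Long
  final case class Edge(src: Vid, dst: Vid)

  // For one edge partition, collect the active ids that the partition references.
  // (Assumed routing policy; the real implementation may ship ids differently.)
  def shipActiveIds(partition: Seq[Edge], active: Set[Vid]): mutable.HashSet[Vid] = {
    val shipped = mutable.HashSet.empty[Vid]
    partition.foreach { e =>
      if (active.contains(e.src)) shipped += e.src
      if (active.contains(e.dst)) shipped += e.dst
    }
    shipped
  }

  // Keep only the edges whose source id is found by a hash lookup in the shipped set.
  def filterBySource(partition: Seq[Edge], shipped: mutable.HashSet[Vid]): Seq[Edge] =
    partition.filter(e => shipped.contains(e.src))
}
```

Because the set is keyed by vertex id rather than by position in the join view's index, the lookup still works for a vertex that is missing from that index.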
Staleness is still exposed using EdgeTriplet.srcStale and dstStale, but nothing currently uses it.