Lazy evaluation of join and map operations

The VertexSetRDD[VD] stores its vertex attributes as an IndexedSeq[VD]. When a VertexSetRDD is first constructed from an RDD[(Vid,VD)], the attributes are stored in an Array[VD]. When mapValues is invoked on a VertexSetRDD[VD], a new array is created and populated with the result of the map operation:

https://github.com/amplab/graphx/blob/master/graph/src/main/scala/org/apache/spark/graph/VertexSetRDD.scala#L129

However, when leftJoin is invoked, an IndexedSeqView is created instead:

https://github.com/amplab/graphx/blob/master/graph/src/main/scala/org/apache/spark/graph/VertexSetRDD.scala#L192

Should both be implemented using views, or should both be implemented using actual storage? The tradeoffs are the following:

Using views means that long chains of computation might be re-evaluated every time the values are accessed.
Using Arrays could lead to many long-lived allocations.
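
To make the tradeoff concrete, here is a minimal sketch using plain Scala collections rather than the actual VertexSetRDD code. The object name `ViewVsArray` and the counters are hypothetical and exist only for illustration. It counts how often the mapping function runs when the result is stored in an Array versus exposed through a view.

```scala
object ViewVsArray {
  // Strict variant: map allocates a new Array and applies the function
  // to every element immediately. Later reads do not call it again.
  // Returns (calls after building, calls after two reads, sum of the reads).
  def strictCounts(): (Int, Int, Int) = {
    var calls = 0
    val attrs = Array(1, 2, 3)
    val mapped: Array[Int] = attrs.map { x => calls += 1; x * 10 }
    val afterBuild = calls
    val sum = mapped(0) + mapped(0)
    (afterBuild, calls, sum)
  }

  // Lazy variant: mapping over a view allocates no result array.
  // The function is applied each time an element is read.
  def viewCounts(): (Int, Int, Int) = {
    var calls = 0
    val attrs = Array(1, 2, 3)
    val mapped = attrs.view.map { x => calls += 1; x * 10 }
    val afterBuild = calls
    val sum = mapped(0) + mapped(0)
    (afterBuild, calls, sum)
  }

  def main(args: Array[String]): Unit = {
    println(s"strict: ${strictCounts()}")
    println(s"view:   ${viewCounts()}")
  }
}
```

In the strict case the work is paid once up front together with an allocation; in the view case nothing is allocated, but every read repeats the computation, and chaining several maps repeats the whole chain.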

I suspect all the operations should be implemented using views, but I am not sure what the implications are for caching.
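
As a rough illustration of the caching question, and not a proposal for the actual implementation, the sketch below chains two maps on a view and then materializes the result once with `toArray`. The object name `MaterializeView` is hypothetical. The assumption is that something like this would have to happen at the point where values are cached, otherwise each read of the cached data would re-run the chain.

```scala
object MaterializeView {
  // Returns (calls after materializing, calls after two further reads).
  def run(): (Int, Int) = {
    var calls = 0
    val attrs = Array(1, 2, 3)

    // Two lazy stages; nothing is computed or allocated yet.
    val chained = attrs.view
      .map { x => calls += 1; x + 1 }
      .map { x => calls += 1; x * 2 }

    // Materialize: each stage runs once per element and the
    // results are copied into a freshly allocated Array.
    val materialized: Array[Int] = chained.toArray
    val afterMaterialize = calls

    // Reads now hit the array and do not re-run the chain.
    val sum = materialized(0) + materialized(0)
    (afterMaterialize, calls)
  }
}
```

With this shape, intermediate results in a chain stay lazy and only the final, cached result costs an allocation.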

Lazy evaluation of join and map operations #32