cloudml / zen

Zen aims to provide the largest scale and the most efficient machine learning platform on top of Spark, including but not limited to logistic regression, latent Dirichlet allocation, factorization machines and DNN.
Apache License 2.0

(Graphx) Upstream necessary changes to graphx #58

Open benmccann opened 8 years ago

benmccann commented 8 years ago

I'd like to make any changes to graphx that are necessary for us to use the upstream library. This repo is still on graphx 1.4, I believe, so we're not getting any of the upstream graphx bug fixes.

I sent https://github.com/cloudml/zen/pull/56 and https://github.com/cloudml/zen/pull/57 to reduce the diff between graphx2 and upstream.

I sent https://github.com/apache/spark/pull/14291 to upstream the addition of a new method.

Changes left:

bhoppi commented 8 years ago

Thanks mate! The change is not that necessary. I just want to avoid an unneeded rdd.mapPartitions operation :-). The only thing GraphImpl.apply (lines 341-354) does beyond GraphImpl.fromExistingRDDs is call EdgePartition.withoutVertexAttributes, which generates a new, empty vertex attributes array that will not be used at all.
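
To make the difference concrete, here is a minimal sketch using toy classes rather than the real GraphX types. `ToyEdgePartition`, `ToyGraph`, and `fromExisting` are hypothetical names invented for illustration, and a plain `Seq` stands in for the RDD, so the `map` below only plays the role of the extra per-partition pass. It assumes, as described above, that the apply-style path differs only by allocating a fresh vertex attributes array per partition.

```scala
import scala.reflect.ClassTag

// Toy stand-in for an edge partition (hypothetical, not the GraphX class):
// edge data plus a per-partition vertex attribute array.
final class ToyEdgePartition[ED, VD](
    val edgeAttrs: Array[ED],
    val vertexAttrs: Array[VD]) {

  // Same idea as the withoutVertexAttributes call described above:
  // keep the edge data, allocate a new empty vertex attribute array.
  def withoutVertexAttributes[VD2: ClassTag]: ToyEdgePartition[ED, VD2] =
    new ToyEdgePartition(edgeAttrs, new Array[VD2](vertexAttrs.length))
}

object ToyGraph {
  // Analogue of the fromExistingRDDs path: partitions are reused as they are.
  def fromExisting[ED, VD](
      parts: Seq[ToyEdgePartition[ED, VD]]): Seq[ToyEdgePartition[ED, VD]] =
    parts

  // Analogue of the apply path: one extra pass over every partition
  // (the Seq.map here stands in for the extra mapPartitions).
  def apply[ED, VD: ClassTag](
      parts: Seq[ToyEdgePartition[ED, VD]]): Seq[ToyEdgePartition[ED, VD]] =
    fromExisting(parts.map(_.withoutVertexAttributes[VD]))
}

object Demo extends App {
  val part = new ToyEdgePartition[Double, Int](Array(1.0, 2.0), new Array[Int](3))

  val reused = ToyGraph.fromExisting(Seq(part)).head
  val rebuilt = ToyGraph(Seq(part)).head

  // fromExisting keeps the same array instance; apply allocates another one.
  println(reused.vertexAttrs eq part.vertexAttrs)  // true
  println(rebuilt.vertexAttrs eq part.vertexAttrs) // false
}
```

When the vertex attribute arrays are already empty, both paths produce equivalent partitions, so the extra pass only costs an allocation and a traversal.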