cloudml / zen

Zen aims to provide the largest-scale and most efficient machine learning platform on top of Spark, including but not limited to logistic regression, latent Dirichlet allocation, factorization machines and DNN.
Apache License 2.0

(GraphX): shipVertexAttributes is costly #32

Open hucheng opened 9 years ago

hucheng commented 9 years ago
  1. The master vertex ships its updated attribute to all slaves. When several partitions (slaves) sit on the same machine, the attribute only needs to be shipped to that machine once, not once per partition.
  2. shipVertexAttributes is called twice in LDA: once by mapTriplets in sampleToken, to ship attributes from master to slaves, and once by joinVertices in updateCounter, which ships attributes from vertices to edges. Copying and then shipping the attribute is unnecessary there; it is enough to keep a hash map (vid -> local vertex index).
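
To make both points concrete, here is a rough sketch in plain Scala with no Spark dependency. All names in it (LocalVertexIndex, EdgePartition, shipCounts, partitionToMachine) are hypothetical and are not taken from Zen or GraphX; it only illustrates the idea under the assumption that each partition knows the distinct vertex ids it references.

```scala
// Hypothetical sketch; not Zen's or GraphX's actual code.
object LocalVertexIndex {
  type VertexId = Long

  // One partition's view: the distinct vertex ids it references,
  // and its edges as (srcId, dstId) pairs.
  final case class EdgePartition(
      localVids: Array[VertexId],
      edges: Array[(VertexId, VertexId)]) {

    // Point 2: vid -> local vertex index, built once per partition.
    val vidToLocal: Map[VertexId, Int] = localVids.zipWithIndex.toMap

    // Edges store local indices only, so no attribute is copied onto an edge;
    // an edge reads attributes from a per-partition array at these indices.
    val localEdges: Array[(Int, Int)] =
      edges.map { case (src, dst) => (vidToLocal(src), vidToLocal(dst)) }
  }

  // Point 1: count shipments when each attribute is sent once per mirror
  // partition versus once per machine hosting at least one mirror.
  def shipCounts(
      mirrors: Map[VertexId, Seq[Int]],      // vid -> partitions holding a mirror
      partitionToMachine: Int => String      // assumed placement function
  ): (Int, Int) = {
    val perPartition = mirrors.values.map(_.distinct.size).sum
    val perMachine =
      mirrors.values.map(_.map(partitionToMachine).distinct.size).sum
    (perPartition, perMachine)
  }
}
```

With a vertex mirrored in partitions 0, 1 and 2, where partitions 0 and 1 share a machine, shipCounts reports three shipments per partition but only two per machine, which is the saving described in point 1.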
bhoppi commented 9 years ago

The 1st issue can't be solved right now. The 2nd issue is solved because we don't use GraphX's APIs now.