cloudml / zen

Zen aims to provide the largest-scale and most efficient machine learning platform on top of Spark, including but not limited to logistic regression, latent Dirichlet allocation, factorization machines, and DNNs.
Apache License 2.0

(GraphX): better partitioning strategies #39

Closed hucheng closed 8 years ago

hucheng commented 9 years ago

There are four partitioning strategies in GraphX:

  1. random hash (RandomVertexCut and CanonicalRandomVertexCut, which together with the two below make four)
  2. edge1D (src or dst)
  3. edgePartition2D
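
To make the three listed strategies concrete, here is a minimal Python sketch of each as a pure function from an edge to a partition id. It is an illustration of the ideas only, not GraphX's Scala code: the function names, the `by_src` flag, and the mixing constant are assumptions made for this example.

```python
import math

# Illustrative large prime used to spread consecutive vertex ids (an assumption).
MIXING_PRIME = 1125899906842597


def random_hash(src, dst, num_parts):
    """Hash the (src, dst) pair, so an edge's endpoints do not constrain its placement."""
    return hash((src, dst)) % num_parts


def edge_1d(src, dst, num_parts, by_src=True):
    """Place an edge by one endpoint only, so all edges sharing it are colocated."""
    key = src if by_src else dst
    return (abs(key) * MIXING_PRIME) % num_parts


def edge_2d(src, dst, num_parts):
    """Treat partitions as a square grid: src picks the column, dst picks the row."""
    side = math.isqrt(num_parts - 1) + 1  # ceil(sqrt(num_parts)) for num_parts >= 1
    col = (abs(src) * MIXING_PRIME) % side
    row = (abs(dst) * MIXING_PRIME) % side
    return (col * side + row) % num_parts
```

With the 2D scheme, the out-edges of a vertex stay within one grid column, which bounds how many partitions that vertex is replicated to by roughly `2 * sqrt(num_parts)`.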

In addition, we have implemented:

  1. DBH (Degree-Based Hashing)
  2. balanced label propagation from Facebook (http://stanford.edu/~jugander/papers/wsdm13-blp.pdf and https://code.facebook.com/posts/274771932683700/large-scale-graph-partitioning-with-apache-giraph/)
  3. Bounded and Balanced Partitioner (two stages: first, each edge is assigned to the partition of its endpoint vertex with the larger degree; second, a re-balancing partitioner is applied. Details later.)
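
As a rough illustration of the first item, degree-based hashing can be sketched as follows. This assumes the usual formulation in which each edge is hashed by its lower-degree endpoint, so that high-degree vertices are the ones that get cut; it is not Zen's implementation, and the names `degrees` and `dbh_partition` are hypothetical.

```python
from collections import Counter


def degrees(edges):
    """Count how many edges touch each vertex (in-degree plus out-degree)."""
    deg = Counter()
    for src, dst in edges:
        deg[src] += 1
        deg[dst] += 1
    return deg


def dbh_partition(edges, num_parts):
    """Return one partition id per edge, hashing the lower-degree endpoint."""
    deg = degrees(edges)
    parts = []
    for src, dst in edges:
        # Ties are broken in favor of src here; that choice is arbitrary.
        key = src if deg[src] <= deg[dst] else dst
        parts.append(hash(key) % num_parts)
    return parts
```

For a star graph, every edge is keyed by its leaf, so the hub's edges spread across all partitions while each leaf stays in exactly one:

```python
star = [(0, leaf) for leaf in range(1, 21)]
print(sorted(set(dbh_partition(star, 4))))
```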

bhoppi commented 9 years ago

This issue is a duplicate of https://github.com/cloudml/zen/issues/38