cloudml / zen

Zen aims to provide the largest-scale and most efficient machine learning platform on top of Spark, including but not limited to logistic regression, latent Dirichlet allocation, factorization machines, and DNN.
Apache License 2.0

(FM/MVM, etc.): GraphX limitations #36

Open hucheng opened 9 years ago

hucheng commented 9 years ago

The typical flow of FM/MVM on GraphX is as follows:

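  // Forward pass: compute the margin.
  val margin = forward(iter)
  // Backward pass: computes the multiplier internally, returns the RMSE and the gradient.
  var (_, rmse, gradient) = backward(margin, iter) {
       multi = multiplier(q, iter)
  }
  // Update the gradient sum, then the weights stored on the vertices.
  gradient = updateGradientSum(gradient, iter)
  vertices = updateWeight(gradient, iter)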

Logically, margin and multiplier are temporary variables that all samples in a partition can share: samples are processed sequentially, one by one, so only one copy of margin/multiplier needs to exist at a time. However, the current GraphX implementation keeps one margin/multiplier per sample, so the number of copies equals the number of samples. This consumes a huge amount of memory and therefore hurts scalability.
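
To illustrate the sharing idea outside GraphX, here is a minimal sketch in plain Scala (no Spark). It is not Zen's code: `Sample`, `predict`, and `partitionSquaredError` are hypothetical names, the bias term is omitted, and the `scratch` array only stands in for the margin. The point is that one buffer of size `rank` is allocated per partition and reused for every sample, instead of one buffer per sample.

```scala
object SharedScratchSketch {
  // Hypothetical sample type: sparse features as (index, value) pairs plus a label.
  final case class Sample(features: Array[(Int, Double)], label: Double)

  /** FM prediction (without bias) for one sample.
    * The per-factor sums are written into `scratch`, which the caller owns and reuses.
    */
  def predict(s: Sample,
              w: Array[Double],
              v: Array[Array[Double]],
              scratch: Array[Double]): Double = {
    java.util.Arrays.fill(scratch, 0.0) // reset the shared buffer for this sample
    var linear = 0.0
    var sumSq = 0.0
    for ((i, x) <- s.features) {
      linear += w(i) * x
      var f = 0
      while (f < scratch.length) {
        val t = v(i)(f) * x
        scratch(f) += t   // sum_i v(i)(f) * x_i
        sumSq += t * t    // sum_i (v(i)(f) * x_i)^2
        f += 1
      }
    }
    var pair = 0.0
    for (f <- scratch.indices) pair += scratch(f) * scratch(f)
    // Standard FM identity: pairwise term = 0.5 * ((sum)^2 - sum of squares), summed over factors.
    linear + 0.5 * (pair - sumSq)
  }

  /** Processes one partition sequentially and returns the sum of squared errors.
    * Only a single scratch buffer exists for the whole partition.
    */
  def partitionSquaredError(samples: Iterator[Sample],
                            w: Array[Double],
                            v: Array[Array[Double]],
                            rank: Int): Double = {
    val scratch = new Array[Double](rank) // allocated once, shared by all samples
    var sse = 0.0
    samples.foreach { s =>
      val err = predict(s, w, v, scratch) - s.label
      sse += err * err
    }
    sse
  }
}
```

With this layout the temporary memory per partition is O(rank) rather than O(number of samples × rank). In a Spark job the same pattern would typically sit inside a per-partition loop, but whether it can be expressed on top of GraphX's vertex/edge abstraction is exactly the limitation raised here.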