cloudml / zen

Zen aims to provide the largest-scale and most efficient machine learning platform on top of Spark, including but not limited to logistic regression, latent Dirichlet allocation, factorization machines, and DNN.
Apache License 2.0
170 stars 75 forks

LogisticRegression's MIS algorithm implementation with vectors #7

Open Peishen-Jia opened 9 years ago

Peishen-Jia commented 9 years ago

We are wondering how the performance of MIS compares when it is implemented with GraphX versus with plain vectors. I implemented a version of logistic regression with vectors and will compare the two over the next few days.

For now, I have not implemented regularization, feature standardization (scaling), or multiclass support.

witgo commented 9 years ago

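MIS doesn't support an intercept, does it? Multiclass should be supported: it is described in detail in Section 8, "Multiclass problems", of the paper "Logistic Regression, AdaBoost and Bregman Distances", although the paper does not show how to implement it with MIS.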

augusterodin commented 9 years ago
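  1. reduceByKey should be changed to a tree-style aggregation; otherwise memory consumption is very high and it is also slow.
  2. There is no need to zip two RDDs; scaledFeatures can be computed directly.
  3. No L1?
  4. Also, written this way it does not take advantage of the local accumulator, so it cannot compete with the previous version. I suggest Peishen build on the aggregate used by the existing LBFGS, because it is essentially the same thing: both aggregate vectors.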
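
To make points 1, 2 and 4 concrete, here is a minimal sketch in plain Python (no Spark) of the pattern being suggested: each partition folds its examples into one local accumulator, features are scaled on the fly instead of being zipped in from a second collection, and the per-partition accumulators are then merged level by level. This is not Zen's code and not Spark's implementation. The names `partition_gradient`, `tree_combine`, `aggregate_gradient`, `inv_std` and `fan_in` are hypothetical, and the lists of lists only stand in for RDD partitions. In Spark, the operation this roughly corresponds to is `RDD.treeAggregate`.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def partition_gradient(partition, weights, inv_std):
    """Fold one partition into a single local accumulator (grad, loss, count).

    Features are multiplied by inv_std while iterating, so no separate
    collection of pre-scaled features is needed.
    """
    grad = [0.0] * len(weights)
    loss = 0.0
    for label, features in partition:  # label is assumed to be 0 or 1
        scaled = [x * s for x, s in zip(features, inv_std)]
        margin = sum(w * x for w, x in zip(weights, scaled))
        prob = sigmoid(margin)
        loss += -math.log(prob) if label == 1 else -math.log(1.0 - prob)
        for j, x in enumerate(scaled):
            grad[j] += (prob - label) * x
    return grad, loss, len(partition)


def combine(a, b):
    # Merge two accumulators element-wise.
    return [x + y for x, y in zip(a[0], b[0])], a[1] + b[1], a[2] + b[2]


def tree_combine(accs, fan_in=2):
    """Merge accumulators in rounds of at most fan_in at a time."""
    if not accs:
        raise ValueError("need at least one accumulator")
    while len(accs) > 1:
        merged_level = []
        for i in range(0, len(accs), fan_in):
            group = accs[i:i + fan_in]
            merged = group[0]
            for other in group[1:]:
                merged = combine(merged, other)
            merged_level.append(merged)
        accs = merged_level
    return accs[0]


def aggregate_gradient(partitions, weights, inv_std, fan_in=2):
    """Return (mean gradient, mean log loss) over all partitions."""
    local = [partition_gradient(p, weights, inv_std) for p in partitions]
    grad, loss, n = tree_combine(local, fan_in)
    if n == 0:
        raise ValueError("no examples")
    return [g / n for g in grad], loss / n
```

The point of the tree-shaped merge is that no single node has to receive and sum every partition's vector at once, which is the memory and speed concern raised in point 1. L1 regularization and the intercept are deliberately left out of this sketch.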