cloudml / zen

Zen aims to provide the largest-scale and most efficient machine learning platform on top of Spark, including but not limited to logistic regression, latent Dirichlet allocation, factorization machines, and DNNs.
Apache License 2.0
170 stars, 75 forks

(FM/MVM, etc) Change data type of margin/multiplier/gradient, etc from Double to Float #26

Open hucheng opened 9 years ago

hucheng commented 9 years ago

Currently, the margin/multiplier/gradient RDDs consume a lot of memory, since each sample vertex holds an array of (views × rank × sizeof(Double)) bytes. For instance, given 1B samples, 3 views, and a rank of 20, the Margin or Multiplier RDD would cost 1B × 3 × 20 × 8 bytes = 480 GB. Changing the data type from Double to Float would cut the RDD size in half, with negligible accuracy loss.
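
The arithmetic above can be checked with a minimal sketch. This is plain Python rather than Zen's own code; the helper name `rdd_payload_bytes` is hypothetical, and the estimate covers only the raw array payload (no JVM object headers or Spark storage overhead). The last few lines round-trip some made-up sample values through 32-bit floats to show the size of the per-value rounding error, which is only a rough indicator and not a measurement of model accuracy.

```python
from array import array

# Element sizes as reported by the stdlib array module
# ("d" = 64-bit double, "f" = 32-bit float on common platforms).
DOUBLE_BYTES = array("d").itemsize
FLOAT_BYTES = array("f").itemsize


def rdd_payload_bytes(samples: int, views: int, rank: int, bytes_per_value: int) -> int:
    """Raw payload only: one value per (view, rank) slot for every sample."""
    return samples * views * rank * bytes_per_value


# Figures taken from the issue text: 1B samples, 3 views, rank 20.
samples, views, rank = 1_000_000_000, 3, 20

double_total = rdd_payload_bytes(samples, views, rank, DOUBLE_BYTES)
float_total = rdd_payload_bytes(samples, views, rank, FLOAT_BYTES)

print(f"Double: {double_total / 1e9:.0f} GB")
print(f"Float:  {float_total / 1e9:.0f} GB")

# Illustrative values only (not real margins): store them as 32-bit floats
# and measure the worst relative rounding error against the originals.
values = [0.1 * (i + 1) for i in range(views * rank)]
as_float = array("f", values)
max_rel_err = max(abs(f - v) / abs(v) for f, v in zip(as_float, values))
print(f"max relative rounding error: {max_rel_err:.2e}")
```

Under these assumptions the payload drops from 480 GB to 240 GB, and each stored value keeps roughly seven significant decimal digits. Whether that precision is enough for the accumulated gradients would still need to be confirmed on a real training run.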