cloudml / zen

Zen aims to provide the largest-scale and most efficient machine learning platform on top of Spark, including but not limited to logistic regression, latent Dirichlet allocation, factorization machines, and DNN.
Apache License 2.0
170 stars 75 forks

(LDA): aliasTable opts #34

Open hucheng opened 9 years ago

hucheng commented 9 years ago

There are several opportunities to optimize aliasTable:

  1. Change the probability type from Double to Float to save space.
  2. A JPriorityQueue is unnecessary; it adds sorting overhead. A simple queue is enough.

bhoppi commented 9 years ago

The time complexity of AliasTable construction is now O(n); no sorting is needed.
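
For readers unfamiliar with how an alias table can be built in O(n) without a priority queue, here is a minimal sketch of the standard two-queue construction (often attributed to Vose). It is written in Python for illustration only and is not Zen's actual Scala code; the names `build_alias_table` and `sample` are hypothetical.

```python
import random
from collections import deque


def build_alias_table(weights):
    """Build an alias table in O(n) using two plain FIFO queues.

    Illustrative sketch only; assumes non-negative weights with a positive sum.
    Returns (prob, alias), where bucket i yields i with probability prob[i]
    and alias[i] otherwise.
    """
    n = len(weights)
    total = float(sum(weights))
    # Rescale so that the average bucket mass is 1.0.
    scaled = [w * n / total for w in weights]
    prob = [0.0] * n
    alias = list(range(n))
    # No ordering is required: any under-full bucket can be paired
    # with any over-full one, so simple queues are sufficient.
    small = deque(i for i, p in enumerate(scaled) if p < 1.0)
    large = deque(i for i, p in enumerate(scaled) if p >= 1.0)
    while small and large:
        s = small.popleft()
        l = large.popleft()
        prob[s] = scaled[s]
        alias[s] = l
        # The large item donates (1 - scaled[s]) of its mass to bucket s.
        scaled[l] = (scaled[l] + scaled[s]) - 1.0
        (small if scaled[l] < 1.0 else large).append(l)
    # Whatever is left has mass 1.0 up to floating-point rounding.
    for i in list(small) + list(large):
        prob[i] = 1.0
        alias[i] = i
    return prob, alias


def sample(prob, alias, rng=random):
    """Draw one index in O(1): pick a bucket, then choose item or alias."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]
```

Each index enters a queue at most a constant number of times per pairing step, and every step finalizes one bucket, so construction is linear in the number of outcomes.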

bhoppi commented 9 years ago

I changed the AliasTable prob type from Double to Float, and then made it generic (so it can also be Int or Long). The Int-typed AliasTable is used in the BBR Partitioner.

hucheng commented 9 years ago

Great. Would an Int-typed probability affect the sampling precision?

bhoppi commented 8 years ago

No, it wouldn't.
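
One way to see why, assuming the weights are integer counts to begin with: the table can store integer thresholds in units of the weight total, so no rounding occurs at all. The sketch below is a Python illustration under that assumption, not Zen's implementation; `build_int_alias_table`, `sample_int`, and `exact_distribution` are hypothetical names.

```python
import random
from fractions import Fraction


def build_int_alias_table(weights):
    """Alias table over non-negative integer weights (illustrative sketch).

    Every bucket has capacity `total`; item i carries mass weights[i] * n
    in the same units, so all arithmetic stays in exact integers.
    """
    n = len(weights)
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    mass = [w * n for w in weights]
    threshold = [0] * n
    alias = list(range(n))
    small = [i for i, m in enumerate(mass) if m < total]
    large = [i for i, m in enumerate(mass) if m >= total]
    while small and large:
        s = small.pop()
        l = large.pop()
        threshold[s] = mass[s]
        alias[s] = l
        # Item l fills the rest of bucket s.
        mass[l] -= total - mass[s]
        (small if mass[l] < total else large).append(l)
    # In exact integer arithmetic `small` is empty here and every
    # remaining item has mass exactly equal to `total`.
    for i in small + large:
        threshold[i] = total
    return threshold, alias, total


def sample_int(threshold, alias, total, rng=random):
    """Draw one index using only integer random numbers."""
    i = rng.randrange(len(threshold))
    return i if rng.randrange(total) < threshold[i] else alias[i]


def exact_distribution(threshold, alias, total):
    """Probability of each index implied by the table, as exact fractions."""
    n = len(threshold)
    dist = [Fraction(0)] * n
    for i in range(n):
        dist[i] += Fraction(threshold[i], n * total)
        dist[alias[i]] += Fraction(total - threshold[i], n * total)
    return dist
```

For integer weights, `exact_distribution` returns exactly `weights[i] / sum(weights)` for every index, whereas a Float or Double table reproduces the same distribution only up to rounding.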