cloudml / zen

Zen aims to provide the largest-scale and most efficient machine learning platform on top of Spark, including but not limited to logistic regression, latent Dirichlet allocation, factorization machines and DNN.
Apache License 2.0

(LDA): F+ tree for sampling #33

Open hucheng opened 9 years ago

hucheng commented 9 years ago

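The F+ tree (see the paper http://www.cs.utexas.edu/~rofuyu/papers/nomad-lda-www.pdf) can keep fresher state than aliasTable: a single parameter change is applied in place in O(log k), whereas an alias table has to be rebuilt in O(k). The trade-off is a higher sampling cost, O(log k) per draw versus O(1) for the alias table.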
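For readers unfamiliar with the structure, here is a minimal Python sketch of an F+ tree as a complete binary tree stored in a flat array. It is only an illustration of the idea from the paper, not Zen's implementation; the class name `FPlusTree` and its methods (`update`, `find`, `sample`) are hypothetical, and padding the leaf count to a power of two is an assumption made to keep the descent simple.

```python
import random


class FPlusTree:
    """Sum tree over k non-negative weights (hypothetical illustration).

    Leaves hold the weights; every internal node holds the sum of its
    two children, so tree[1] is the total mass.
    """

    def __init__(self, weights):
        self.n = len(weights)
        # Assumption: pad the number of leaves to a power of two.
        self.size = 1
        while self.size < self.n:
            self.size *= 2
        self.tree = [0.0] * (2 * self.size)
        self.tree[self.size:self.size + self.n] = [float(w) for w in weights]
        # Build internal sums bottom-up: O(k).
        for i in range(self.size - 1, 0, -1):
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]

    def total(self):
        return self.tree[1]

    def update(self, index, delta):
        """Add delta to one weight and to all of its ancestors: O(log k)."""
        i = index + self.size
        while i >= 1:
            self.tree[i] += delta
            i //= 2

    def find(self, u):
        """Return the index t whose cumulative-weight interval contains u.

        Walks from the root to a leaf: O(log k).
        """
        i = 1
        while i < self.size:
            left = self.tree[2 * i]
            if u < left:
                i = 2 * i
            else:
                u -= left
                i = 2 * i + 1
        # Clamp in case rounding pushes u onto a zero-weight padding leaf.
        return min(i - self.size, self.n - 1)

    def sample(self, rng=random):
        """Draw an index with probability proportional to its weight."""
        return self.find(rng.random() * self.total())


if __name__ == "__main__":
    tree = FPlusTree([1.0, 2.0, 3.0, 4.0])
    tree.update(0, 5.0)   # weights are now [6, 2, 3, 4]
    print(tree.sample())  # index drawn from the updated distribution
```

The point of the sketch is that `update` touches only one root-to-leaf path, so the sampler always sees the latest counts without any rebuild, which is the "fresher state" property discussed above.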

Would this be a good fit for LDA model inference?

bhoppi commented 9 years ago

We've already implemented the F+ tree, but haven't tested it yet.