(LDA): different termination conditions for different vertices.

The insight is that the topic assignments of edges and the word-topic distributions of words converge at different speeds; some converge earlier than others. Edges/words that have already converged do not need to be added to the working set in the next iteration. The question is how to decide whether an edge/word has converged. A feasible solution is to use the Bhattacharyya coefficient (https://en.wikipedia.org/wiki/Bhattacharyya_distance) to compare a word's word-topic distributions from two consecutive iterations: the more similar they are, the more likely it is that the word has converged. We do not simply filter out converged words with a hard threshold. Instead, we sample the edges of each word with a probability that decreases as the similarity increases. We also take time into account: the longer an edge has gone unsampled, the higher its sampling probability becomes.
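A minimal sketch of the idea in Python, not the zen implementation. The Bhattacharyya coefficient itself is standard, but the function names, the `floor` and `recovery` parameters, and the exact form of the sampling schedule are assumptions chosen for illustration; the proposal only requires that the probability fall as similarity rises and rise the longer an edge goes unsampled.

```python
import math
import random


def normalize(counts):
    """Turn raw topic counts for one word into a probability distribution."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("topic counts must sum to a positive value")
    return [c / total for c in counts]


def bhattacharyya_coefficient(p, q):
    """BC(p, q) = sum_k sqrt(p_k * q_k); 1.0 for identical distributions, 0.0 for disjoint ones."""
    if len(p) != len(q):
        raise ValueError("distributions must cover the same topics")
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


def sample_probability(similarity, skipped_iters, floor=0.05, recovery=0.1):
    """Hypothetical schedule (an assumption, not taken from the issue).

    The base probability is 1 - similarity, kept above `floor` so that a
    converged word is never excluded forever. Each iteration in which the
    edge was not sampled closes a `recovery` fraction of the remaining gap
    to 1.0, which models the time factor.
    """
    similarity = min(1.0, max(0.0, similarity))  # guard against rounding error
    base = max(floor, 1.0 - similarity)
    return 1.0 - (1.0 - base) * (1.0 - recovery) ** skipped_iters


def should_sample(prev_counts, curr_counts, skipped_iters, rng=random):
    """Decide whether the edges of a word join the next working set."""
    bc = bhattacharyya_coefficient(normalize(prev_counts), normalize(curr_counts))
    return rng.random() < sample_probability(bc, skipped_iters)
```

For example, a word whose topic counts are unchanged between iterations has a coefficient of 1.0 and is sampled with probability `floor` at first; that probability then climbs toward 1.0 for every iteration in which its edges are skipped, so no edge is starved indefinitely.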

cloudml / zen

(LDA): different termination conditions for different vertices. #31