Open cnshot opened 13 years ago
With following code, similarity detecting was applied to the begging duplicated_check_limit items of new tweets only, duplicated tweets of low rate may be leaked.
similarity_matrix = word_freq.hash_filter_knowns(scipy.array(cluster_rs[:cfg.cluster_tweets.duplicated_check_limit]), #@UndefinedVariable scipy.array(rt_rs[:cfg.cluster_tweets.duplicated_check_limit]), #@UndefinedVariable similarity_threshold=cfg.cluster_tweets.similarity_threshold)
With following code, similarity detecting was applied to the begging duplicated_check_limit items of new tweets only, duplicated tweets of low rate may be leaked.