cnshot / CNShot

@cnshot Twitter bot
https://twitter.com/cnshot

similarity_matrix covers only the first duplicated_check_limit tweets #7

Open cnshot opened 13 years ago

cnshot commented 13 years ago

With the following code, similarity detection is applied only to the first duplicated_check_limit items of the new tweets, so duplicated tweets with a low rate, which fall beyond that limit, may slip through undetected.

    similarity_matrix = word_freq.hash_filter_knowns(scipy.array(cluster_rs[:cfg.cluster_tweets.duplicated_check_limit]), #@UndefinedVariable
                                                     scipy.array(rt_rs[:cfg.cluster_tweets.duplicated_check_limit]), #@UndefinedVariable
                                                     similarity_threshold=cfg.cluster_tweets.similarity_threshold)
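
To make the effect of the slicing concrete, here is a minimal, self-contained sketch. It does not use `word_freq.hash_filter_knowns`. The helpers `jaccard` and `find_duplicates`, the sample tweets, and the threshold value are all hypothetical stand-ins, and word-set Jaccard similarity is only assumed here as a simple substitute for the real similarity measure. The sketch shows only that a duplicate sitting past the limit is never compared.

```python
def jaccard(a, b):
    """Word-set Jaccard similarity between two strings (illustrative stand-in)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def find_duplicates(new_tweets, known_tweets, threshold, limit=None):
    """Return indices of new_tweets that are similar to any known tweet.

    limit=None checks every item; an integer mimics the [:limit] slicing
    from the snippet above (hypothetical helper, not the project's API).
    """
    candidates = new_tweets[:limit]
    known = known_tweets[:limit]
    return [
        i for i, tweet in enumerate(candidates)
        if any(jaccard(tweet, k) >= threshold for k in known)
    ]


new_tweets = [
    "python 3 released today",
    "new coffee shop downtown",
    "weather is nice",
    "big earthquake hits the coast",  # near-duplicate, but at index 3
]
known_tweets = ["big earthquake hits the coast today"]

# With a limit of 2, the item at index 3 is never examined.
limited = find_duplicates(new_tweets, known_tweets, threshold=0.8, limit=2)

# Without the limit, the near-duplicate is found.
full = find_duplicates(new_tweets, known_tweets, threshold=0.8)

print("limited:", limited)
print("full:", full)
```

If the limit exists to bound the cost of the comparison, one possible direction (again an assumption, not something confirmed by the code above) would be to process the new tweets in chunks of duplicated_check_limit rather than only the first chunk.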