Computational-Content-Analysis-2018 / 19-Jan-Flat-Clustering

Manning, Christopher, Prabhakar Raghavan and Hinrich Schütze. 2008. “Flat Clustering” and “Hierarchical Clustering.” Chapters 16 and 17 from Introduction to Information Retrieval.
https://github.com/Computational-Content-Analysis-2018

Test Problem #7

Open khan1792 opened 6 years ago

khan1792 commented 6 years ago

It seems that clustering methods are sometimes difficult to test. For example, suppose there are two corpora: one is written as dialogue, similar to Plato's dialogues, and the other is in a typical academic writing style. We can assume that they discuss very similar things and hold very similar opinions. However, in the dialogue-style corpus, the key ideas hide behind many trivial conversations, and some trivial things might become the topic and even determine the clustering result. In this case, the two corpora are very likely to look different, with good statistical test results, when we use a clustering method. If our objective is to capture the ideas rather than the writing style, how can we ensure the effectiveness of clustering methods?
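To make the concern concrete, here is a minimal sketch in plain Python. It assumes that style can be roughly approximated by a hand-picked list of function words and conversational filler. The two toy texts, the `STYLE_WORDS` list, and the helper names are all made up for illustration and are not from the reading. It compares bag-of-words cosine similarity on all tokens with the similarity on the remaining content words only.

```python
import math
import re
from collections import Counter

# Hypothetical stop list: function words plus conversational filler.
# Choosing this list is itself a judgment call, not a fixed standard.
STYLE_WORDS = {
    "a", "and", "do", "i", "is", "it", "me", "my", "of", "so",
    "that", "this", "you", "yes", "well", "tell", "think", "friend",
}

def tokens(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def content_only(toks):
    """Drop tokens that appear in the hypothetical style list."""
    return [t for t in toks if t not in STYLE_WORDS]

def cosine(a, b):
    """Cosine similarity between two token lists, using raw counts."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[w] * cb[w] for w in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Toy documents (invented): same idea, different writing style.
dialogue = (
    "Tell me, Socrates, do you think that justice is a virtue? "
    "Yes, I do think so, my friend. And is virtue knowledge? "
    "Well, yes, I think that it is."
)
academic = (
    "This paper argues that justice is a virtue "
    "and that virtue is a form of knowledge."
)

raw_sim = cosine(tokens(dialogue), tokens(academic))
content_sim = cosine(content_only(tokens(dialogue)),
                     content_only(tokens(academic)))

print(f"all tokens:    {raw_sim:.3f}")
print(f"content words: {content_sim:.3f}")
```

On these toy texts, the content-word similarity comes out higher than the all-token similarity, which suggests that the features fed into a clustering method, and not only the clustering method itself, may decide whether the result reflects ideas or style. This is only an illustration on two invented sentences, not an evaluation procedure.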