Manning, Christopher, Prabhakar Raghavan and Hinrich Schütze. 2008. “Flat Clustering” and “Hierarchical Clustering.” Chapters 16 and 17 from Introduction to Information Retrieval.
I don't think the chapters go into detail on what the unit of analysis is, but I'm assuming it's the individual word. Would clustering based on collocations (groups of words that have meaning together) lead to significantly different clusters? And is there a way to capture the relationships between words, rather than just their presence or frequency, that is informative for clustering?
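To make the question concrete, here is a minimal sketch of how the choice of unit could change the similarities that clustering depends on. It is not taken from the chapters: the helper names `ngram_counts` and `cosine` and the two toy documents are my own assumptions, and adjacent word pairs (bigrams) stand in only as a crude proxy for collocations.

```python
from collections import Counter
from math import sqrt


def ngram_counts(text, n):
    """Count the n-grams (as tuples of n adjacent tokens) in a whitespace-tokenized text."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Two toy documents (hypothetical) with the same words in a different order.
doc_a = "dog bites man"
doc_b = "man bites dog"

# Unit of analysis = single word: the two count vectors are identical.
unigram_sim = cosine(ngram_counts(doc_a, 1), ngram_counts(doc_b, 1))

# Unit of analysis = adjacent word pair: the documents share no features.
bigram_sim = cosine(ngram_counts(doc_a, 2), ngram_counts(doc_b, 2))

print("unigram similarity:", unigram_sim)
print("bigram similarity:", bigram_sim)
```

With single words as features the two documents look essentially the same, while with word pairs they share nothing, so any algorithm that groups documents by vector similarity could place them differently depending on the unit chosen. Whether that difference is significant on a real corpus is exactly what I'm unsure about.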