SLU-TMI / TextMining.jl


Density-Based Clustering #72

Open ljekersey opened 9 years ago

ljekersey commented 9 years ago

Okay, I'm just throwing this one out there. Instead of hierarchical clustering, I think it would be cool to think about using a density-based clustering model. I like the idea of using an additional measure of similarity besides distance (just to see if they tell us different things), and if I understand this correctly, density measures the number of elements in the document-term matrix (corpus) that have frequency zero, divided by the total number of elements in the corpus. Effectively, we wouldn't be assuming that word frequencies and ratios matter. We would be grouping texts together on the basis that they share a more similar set of words, regardless of whether they use those words in different ratios.
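To make the idea concrete, here is a minimal sketch of the two quantities described above: the fraction of zero entries in a document-term matrix, and a presence/absence comparison of two documents that ignores word ratios. It is written in plain Python rather than Julia, and it is not TextMining.jl code. The toy matrix `dtm`, the helper names `zero_fraction` and `jaccard_distance`, and the choice of Jaccard distance as the "shared set of words" measure are all assumptions made for illustration.

```python
# Hypothetical toy corpus: each row is a document, each column a term count.
dtm = [
    [3, 0, 1, 0],  # doc 0
    [1, 0, 5, 0],  # doc 1: same words as doc 0, in different ratios
    [0, 2, 0, 4],  # doc 2: no words in common with doc 0 or doc 1
]


def zero_fraction(matrix):
    """Fraction of entries in the matrix whose count is zero."""
    total = sum(len(row) for row in matrix)
    zeros = sum(1 for row in matrix for count in row if count == 0)
    return zeros / total


def jaccard_distance(doc_a, doc_b):
    """Distance based only on which terms occur, not how often.

    Returns 1 - |shared terms| / |terms in either document|.
    Jaccard is an assumed choice here, not something the thread specifies.
    """
    terms_a = {i for i, count in enumerate(doc_a) if count > 0}
    terms_b = {i for i, count in enumerate(doc_b) if count > 0}
    union = terms_a | terms_b
    if not union:
        return 0.0  # two empty documents are treated as identical
    return 1 - len(terms_a & terms_b) / len(union)


print(zero_fraction(dtm))
print(jaccard_distance(dtm[0], dtm[1]))
print(jaccard_distance(dtm[0], dtm[2]))
```

Under this measure doc 0 and doc 1 count as identical even though their word ratios differ, which is the behaviour the proposal is after. A density-based algorithm could then be run on these pairwise distances instead of on raw frequency vectors.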

mtabor150 commented 9 years ago

@Kevin-Damazyn and I were talking about implementing OPTICS. At the moment, though, I don't think this is a critical need, and it should be pushed to the back burner until we get everything done for Naive Bayes. Also, hierarchical clustering is almost done, so we wouldn't be replacing it with something else, just adding more functionality.

ljekersey commented 9 years ago

Cool. I'll look into OPTICS. I agree, though: this is back burner.

Kevin-Damazyn commented 9 years ago

I am all for trying to get this in before the end of the semester. I like the idea of density-based clustering. Unfortunately, I didn't get a chance to spike it out over spring break, but again, this is a timing thing.