SLU-TMI / TextMining.jl


K Means Cluster Visual #49

Open ljekersey opened 9 years ago

ljekersey commented 9 years ago

Less Urgent

Can we have a tool that represents each text as a data point and shows visually which cluster each data point belongs to? It would be especially cool if this tool could be connected to the metadata. For example, could I color each data point according to a categorical variable (in this case one of the text's metadata categories: author, genre, or time period)?

Maybe look at similar SPSS tools
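To make the request above concrete, here is a minimal sketch of the "color by metadata" step, written in Python for illustration only. It assumes each text has already been reduced to a 2-D point and assigned a cluster; the records, the field names (`x`, `y`, `cluster`, `genre`), the palette, and the helper `color_by` are all hypothetical and not part of TextMining.jl.

```python
from itertools import cycle

# Hypothetical records: one per text, with 2-D coordinates, a cluster
# assignment, and one metadata field. None of this comes from the project.
texts = [
    {"title": "A", "x": 0.1, "y": 0.2, "cluster": 0, "genre": "poetry"},
    {"title": "B", "x": 0.2, "y": 0.1, "cluster": 0, "genre": "poetry"},
    {"title": "C", "x": 0.9, "y": 0.8, "cluster": 1, "genre": "drama"},
    {"title": "D", "x": 0.8, "y": 0.9, "cluster": 1, "genre": "poetry"},
]

# Illustrative color names; any list of colors would do.
PALETTE = ["tab:blue", "tab:orange", "tab:green", "tab:red"]


def color_by(records, field):
    """Assign one color per distinct value of a metadata field.

    Returns the per-record color list and the value -> color legend.
    Colors repeat if there are more values than palette entries.
    """
    values = sorted({r[field] for r in records})
    legend = dict(zip(values, cycle(PALETTE)))
    return [legend[r[field]] for r in records], legend


point_colors, legend = color_by(texts, "genre")
```

The resulting `point_colors` list could be handed to any scatter-plot routine alongside the `x` and `y` values, and switching the `field` argument to another metadata category would recolor the same points.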

Kevin-Damazyn commented 9 years ago

I don't fully understand what you are asking. Do you want a spider graph? Not to shoot this down, but if a spider graph is what you want, that would be very graphically intensive. So no promises, but if you could clarify exactly what you want, that would be great.

ljekersey commented 9 years ago

Ignoring the graphics demands for now, I think a spider graph sounds really cool! So okay, each spoke could represent a different period of time (like a decade or a year), and the colored lines could each represent a cluster. The data value for each spoke would then be the number of texts in that cluster which were published in that time period. This would allow us to visualize how consistent our clusters are. If time period is driving the clusters, we would hope to see each cluster concentrated on a few spokes rather than evenly distributed amongst all of them?

And if we're not seeing time period in our clusters, then we can change the spokes to be different genres (just to visualize whether or not those are driving the clusters). Does this make any sense?
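As a rough sketch of the numbers such a spider graph would plot, the counts per cluster and per spoke could be tabulated like this. It is written in Python purely for illustration; the `(cluster, year)` pairs are made up, and `decade` and `spoke_counts` are hypothetical helper names, not anything that exists in TextMining.jl.

```python
from collections import Counter

# Hypothetical (cluster_id, publication_year) pairs, one per text.
assignments = [
    (0, 1851), (0, 1855), (0, 1862),
    (1, 1923), (1, 1925), (1, 1851),
]


def decade(year):
    """Round a year down to the start of its decade, e.g. 1855 -> 1850."""
    return year // 10 * 10


def spoke_counts(pairs):
    """Count texts per (cluster, decade).

    Returns the sorted list of spokes (decades) and, for each cluster,
    a list of counts in the same order as the spokes.
    """
    counts = Counter((cluster, decade(year)) for cluster, year in pairs)
    spokes = sorted({d for _, d in counts})
    clusters = sorted({c for c, _ in counts})
    table = {c: [counts[(c, d)] for d in spokes] for c in clusters}
    return spokes, table


spokes, table = spoke_counts(assignments)
```

Each list in `table` is one colored line on the graph. Swapping `decade(year)` for a genre label would give the genre version of the same chart.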

Kevin-Damazyn commented 9 years ago

@ljekersey OK, I think I understand better what you want. Like @mtabor150 said, I think our main focus is to move on with what we currently have (for now) and get the Bayes classifier done first, and then see where we are in the semester. So again, it's not a no per se, but it's also not a go at this time.

ljekersey commented 9 years ago

That's fair. I would agree that visualization is definitely less important than setting up the actual data mining and machine learning algorithms.