Open drJAGartner opened 7 years ago
@drJAGartner can you explain? Maybe with some examples.
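Pointwise mutual information (PMI) is a measure from information theory that describes how strongly two words are associated, i.e. how much more often they occur together than we would expect if they were independent: https://en.wikipedia.org/wiki/Pointwise_mutual_information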
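For reference, the standard definition for a pair of words $x$ and $y$ is below. The symbols are the usual textbook ones, not anything specific to this repo: $p(x)$ and $p(y)$ are the individual word probabilities and $p(x, y)$ is the probability of the pair occurring together.

$$
\operatorname{pmi}(x; y) = \log \frac{p(x, y)}{p(x)\, p(y)}
$$

If the two words are independent, $p(x, y) = p(x)\,p(y)$ and the PMI is $0$. A pair that occurs together more often than chance gets a positive score, and a pair that occurs together less often than chance gets a negative one.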
If you look at the Applications section of that page, you can see what this looks like: words that almost always appear together (e.g. Puerto & Rico) have high scores. If we find word pairs whose co-occurrence changes (e.g. pray-paris, irish-water, crane-hadge), that would be a good way of identifying unique speech patterns.
sounds so fancy, yet so simple.
As an alternative to the current method of non-hashtag sentiment clustering, we can try computing pointwise mutual information (PMI) scores for word bigrams. For bigrams made up of non-stopwords, we can compute the PMI. Similarly to how we create hashtag clusters, we can then assess how likely it is for such a high PMI score to arise, and from there decide whether to include the bigram in our graph.
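A rough sketch of what the scoring step could look like, just to make the idea concrete. Everything here is an assumption for illustration: the `STOPWORDS` set, the `bigram_pmi` name, whitespace tokenization, and the `min_count` cutoff are all hypothetical and not taken from the existing code. It also only computes the scores; the step of assessing how likely a high score is would still need to be designed.

```python
import math
from collections import Counter

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"the", "a", "to", "in", "for", "of", "and", "is"}

def bigram_pmi(docs, min_count=2):
    """Return {(w1, w2): PMI in bits} for adjacent non-stopword word pairs."""
    unigrams = Counter()
    bigrams = Counter()
    for doc in docs:
        tokens = doc.lower().split()  # naive whitespace tokenization (assumption)
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))

    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scores = {}
    for (w1, w2), count in bigrams.items():
        # Skip rare pairs (PMI is unstable for low counts) and stopword pairs.
        if count < min_count or w1 in STOPWORDS or w2 in STOPWORDS:
            continue
        p_xy = count / n_bi
        p_x = unigrams[w1] / n_uni
        p_y = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    return scores

docs = [
    "pray for paris",
    "pray for paris tonight",
    "puerto rico is sunny",
    "puerto rico flights",
    "paris is sunny",
]
scores = bigram_pmi(docs)
# Candidate graph edges: bigrams whose PMI clears a hypothetical threshold.
edges = [pair for pair, pmi in scores.items() if pmi > 1.0]
```

Note that with strictly adjacent bigrams, a pair like pray-paris is never scored when a stopword sits between the two words, so we may want to drop stopwords before pairing or use a wider co-occurrence window instead.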