cegme / gatordsr

University of Florida TREC KBA code and more

Word Frequency/Count #26

Open mshahriarinia opened 11 years ago

mshahriarinia commented 11 years ago

We need a measure of word-sense frequency (counts) so that we can correctly assign words to entities or slots. For example, a word with multiple meanings under the same part of speech (POS) is used with each meaning with some probability, and we need to capture this in order to rank our matches. WordNet's lemma.tagcount provides something like this, but it is incomplete and full of zeros, so we need another measure.
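One common way to deal with the zeros is additive (Laplace) smoothing: add a small pseudo-count to every sense before normalizing, so no sense gets probability zero. The sketch below assumes we already have a raw count per sense of one word (for example, the WordNet tag counts); `rank_senses` and the `alpha` parameter are hypothetical names, not part of WordNet or NLTK, and smoothing is a swapped-in technique rather than anything proposed in this issue.

```python
from typing import Dict, List, Tuple


def rank_senses(
    sense_counts: Dict[str, int], alpha: float = 1.0
) -> List[Tuple[str, float]]:
    """Rank the senses of one word (same POS) by smoothed relative frequency.

    Hypothetical helper. sense_counts maps a sense id to its raw count
    (assumed to come from e.g. WordNet tag counts). Additive (Laplace)
    smoothing with pseudo-count `alpha` gives every sense, including
    zero-count senses, a nonzero probability.
    """
    if not sense_counts:
        return []
    total = sum(sense_counts.values()) + alpha * len(sense_counts)
    probs = {sense: (count + alpha) / total for sense, count in sense_counts.items()}
    # Most probable sense first; ties broken by sense id for a stable order.
    return sorted(probs.items(), key=lambda kv: (-kv[1], kv[0]))


# Made-up counts for illustration only.
print(rank_senses({"sense_a": 3, "sense_b": 0, "sense_c": 1}))
```

With `alpha=1.0` the made-up counts 3, 0, 1 become probabilities 4/7, 1/7, 2/7, so `sense_b` is ranked last instead of being excluded. If every count is zero, the result degrades to a uniform distribution.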

References:
http://stackoverflow.com/questions/12943193/how-to-measure-wordnet-term-frequency-values-and-cooccurence-value-programatical
http://stackoverflow.com/questions/5928704/how-do-i-find-the-frequency-count-of-a-word-in-english-using-wordnet

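We might be able to use one of these corpora:
http://corpus2.byu.edu/coca/100k_data.asp?query=1 (Corpus of Contemporary American English, COCA)
http://americannationalcorpus.org/OANC/index.html#download (Open American National Corpus, OANC)
http://googleresearch.blogspot.com/2006/08/all-our-n-gram-are-belong-to-you.html (Google n-gram data)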
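Whichever corpus we pick, the basic use is turning a word-frequency list into relative frequencies. The sketch below assumes a simple `word<TAB>count` text format, one entry per line; the real COCA, OANC, or n-gram files may be laid out differently and would need their own parser. `load_frequency_list` and `relative_frequency` are hypothetical helper names.

```python
from collections import Counter
from typing import Iterable


def load_frequency_list(lines: Iterable[str]) -> Counter:
    """Parse lines of 'word<TAB>count' into a Counter.

    The tab-separated format is an assumption, not the documented
    layout of any particular corpus download.
    """
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        word, count = line.rsplit("\t", 1)
        # Lowercase so "Bank" and "bank" are merged into one entry.
        counts[word.lower()] += int(count)
    return counts


def relative_frequency(counts: Counter, word: str) -> float:
    """Return count(word) / total count, or 0.0 for an empty list."""
    total = sum(counts.values())
    return counts[word.lower()] / total if total else 0.0
```

For example, `load_frequency_list(["the\t60", "bank\t30", "Bank\t10"])` gives `bank` a count of 40 out of 100. Merging case variants is a design choice; for entity matching we might instead want to keep "Bank" and "bank" separate.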

cegme commented 11 years ago

We could compute word co-occurrence counts and add them to our training set:

c(word1, word2)
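A minimal sketch of c(word1, word2), assuming "co-occurrence" means two tokens appearing within a fixed window of each other in a tokenized document. The window size, the unordered-pair convention, and the names `cooccurrence_counts` and `pair_count` are illustrative assumptions, not something decided in this issue.

```python
from collections import Counter
from typing import List, Tuple


def cooccurrence_counts(tokens: List[str], window: int = 2) -> Counter:
    """Count how often two words appear within `window` tokens of each other.

    Pairs are stored unordered (as a sorted tuple), so c(a, b) == c(b, a).
    """
    counts: Counter = Counter()
    for i, w1 in enumerate(tokens):
        # Look only forward so each pair of positions is counted exactly once.
        for j in range(i + 1, min(i + 1 + window, len(tokens))):
            w2 = tokens[j]
            pair: Tuple[str, str] = (w1, w2) if w1 <= w2 else (w2, w1)
            counts[pair] += 1
    return counts


def pair_count(counts: Counter, word1: str, word2: str) -> int:
    """c(word1, word2): look up an unordered pair; 0 if never seen."""
    return counts[(word1, word2) if word1 <= word2 else (word2, word1)]


counts = cooccurrence_counts("the river bank flooded the bank".split(), window=2)
print(pair_count(counts, "bank", "the"))  # 3
```

Summing these per-document counters over the whole stream would give corpus-level c(word1, word2) values that could be stored alongside the training data.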

It is typical to use resources like the ones you linked. Don't let these word counts keep you from proceeding: there will be some inaccuracy at first, so we just need to record these assumptions and move on.