Closed by kkb2 6 years ago
There must be a mistake in your frequency-counting algorithm. Plenty of words appear only once.
OK, thank you! Just to clarify, though: we only get rid of the words whose frequency is exactly 1?
Yes, I just realized I was adding one to the frequency both when I first added the word to the dictionary and again when I updated it later on.
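For anyone hitting the same problem, here is a minimal sketch of what that double count might look like and one way to avoid it. The original code was not posted, so the buggy version is only a guess at the pattern described above, and the function names (count_words_buggy, count_words, hapaxes) are made up for illustration.

```python
from collections import Counter


def count_words_buggy(words):
    """Guess at the bug: a new word is stored as 1 and then incremented again."""
    freq = {}
    for w in words:
        if w not in freq:
            freq[w] = 1   # first sighting recorded here...
        freq[w] += 1      # ...and counted a second time here
    return freq


def count_words(words):
    """Count each occurrence exactly once."""
    freq = {}
    for w in words:
        freq[w] = freq.get(w, 0) + 1
    return freq


def hapaxes(freq):
    """Return the words whose count is exactly 1, sorted alphabetically."""
    return sorted(w for w, n in freq.items() if n == 1)


words = "the cat sat on the mat".split()

print(hapaxes(count_words_buggy(words)))  # [] (every count is inflated by one)
print(hapaxes(count_words(words)))        # ['cat', 'mat', 'on', 'sat']

# collections.Counter from the standard library gives the same counts.
print(dict(Counter(words)) == count_words(words))  # True
```

With the buggy version the smallest possible count is 2, which would explain a text that seems to contain no words appearing just once.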
I'm a little confused about how to find hapax words, since there are no words in the text that appear just once. I understand that Zipf's law means that the frequency of any word is inversely proportional to its rank. However, I'm not sure at what point (rank) the frequency would be close enough to zero for the word to be deemed a hapax. I also admit that I could be going about this the wrong way. Any suggestions?