bmeaut / python_nlp_2018_spring

MIT License

Finding hapax legomenon #11

Closed. kkb2 closed this issue 6 years ago.

kkb2 commented 6 years ago

I'm a little confused about finding hapax words, since there are no words in the text that appear just once. I understand that Zipf's law means that the frequency of any word is inversely proportional to its rank. However, I'm not sure at what point (rank) the frequency would be close enough to zero that a word should be deemed a hapax. I also admit that I could be going about this the wrong way. Any suggestions?
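To make the rank/frequency relationship concrete, here is a rough sketch (not the course's reference solution) that builds a rank-frequency table with `collections.Counter`. The helper name `rank_frequency` and the toy sentence are made up for illustration. The point it is meant to show: a hapax is defined by its raw count being exactly 1, not by a rank cutoff. On real text the hapaxes simply make up the long tail of the ranking.

```python
from collections import Counter

def rank_frequency(tokens):
    """Return (rank, word, count) triples, most frequent first; ranks start at 1."""
    counts = Counter(tokens)
    return [(rank, word, count)
            for rank, (word, count) in enumerate(counts.most_common(), start=1)]

# Toy token list, invented for illustration only.
tokens = "the cat sat on the mat and the dog sat on the rug".split()
for rank, word, count in rank_frequency(tokens):
    # Zipf's law predicts rank * count stays roughly constant on large corpora;
    # a text this small only shows the trend loosely.
    print(rank, word, count, rank * count)
```

In this toy example the words with count 1 (cat, mat, and, dog, rug) occupy ranks 4 to 8. The rank where hapaxes start depends entirely on the text, which is why it is read off the counts rather than chosen in advance.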

juditacs commented 6 years ago

You must have a mistake in your frequency-counting algorithm. There are plenty of words that appear only once.
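A quick way to cross-check a hand-written counter is to compare it against `collections.Counter`. The sketch below assumes plain whitespace tokenization; the actual assignment may lowercase or strip punctuation first. `hapax_legomena` is a hypothetical helper name. A useful sanity check is that the counts must sum to the number of tokens. A counter that inflates every count breaks this check immediately.

```python
from collections import Counter

def hapax_legomena(tokens):
    """Return the words that occur exactly once, in first-occurrence order."""
    counts = Counter(tokens)
    return [word for word, count in counts.items() if count == 1]

tokens = "to be or not to be that is the question".split()
counts = Counter(tokens)
# Sanity check: every token is counted exactly once overall.
assert sum(counts.values()) == len(tokens)
print(hapax_legomena(tokens))  # ['or', 'not', 'that', 'is', 'the', 'question']
```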

kkb2 commented 6 years ago

OK, thank you! Just to clarify: do we only get rid of the words with a frequency of exactly 1?
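If, as the question assumes, "hapax" means a word with a frequency of exactly 1, removing them amounts to keeping only tokens whose count is at least 2. The thread does not confirm what else the assignment wants filtered, so treat this as a sketch. `remove_rare` and its `min_count` parameter are hypothetical names. With `min_count=2` it drops only hapaxes, and raising it would also drop dis legomena and so on.

```python
from collections import Counter

def remove_rare(tokens, min_count=2):
    """Keep only tokens whose total frequency in `tokens` is at least `min_count`.

    With the default min_count=2 this removes exactly the hapax legomena.
    """
    counts = Counter(tokens)
    return [tok for tok in tokens if counts[tok] >= min_count]

tokens = "a rose is a rose is a flower".split()
print(remove_rare(tokens))  # ['a', 'rose', 'is', 'a', 'rose', 'is', 'a']
```

Counting over the whole token list first and filtering second matters. Filtering while counting would judge early occurrences before their later repeats were seen.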

kkb2 commented 6 years ago

Yes, I just realized I was adding one to the frequency both when I added the word to the dictionary and when I updated it later on.
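The original code is not shown in the thread, so the following is a hypothetical reconstruction of the bug as described, next to a fixed version. Because the first occurrence is counted twice, every word's count is one too high. The smallest possible count is therefore 2, which is exactly why no word appeared to occur only once. The function names are illustrative.

```python
def count_buggy(tokens):
    freq = {}
    for tok in tokens:
        if tok not in freq:
            freq[tok] = 1  # first occurrence counted here...
        freq[tok] += 1     # ...and counted again here, so every count is one too high
    return freq

def count_fixed(tokens):
    freq = {}
    for tok in tokens:
        # dict.get returns 0 for unseen words, so each occurrence adds exactly 1
        freq[tok] = freq.get(tok, 0) + 1
    return freq

tokens = "to be or not to be".split()
print(count_buggy(tokens))  # {'to': 3, 'be': 3, 'or': 2, 'not': 2}
print(count_fixed(tokens))  # {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```

Using `dict.get` with a default (or `collections.Counter`) removes the separate "first time" branch entirely. That branch is where this kind of double-counting tends to creep in.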