Closed: pielstroem closed this issue 6 years ago
The start page does not actually mention hapax legomena, because users have no control over them in our workflow anyway: they are removed automatically. I left the term out deliberately so as not to confuse anyone with obscure foreign words. The results page already contains the following sentence:
In addition so-called hapax legomena have been removed. In corpus linguistics, a hapax legomenon is a word that occurs only once within a context. So, if a word occurs only once in a document, it is very likely that the word is semantically insignificant – that is, not useful for the model.
But I can definitely link to the Wikipedia article.
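For anyone wondering what the removal amounts to in practice, here is a minimal sketch. It assumes a single, already tokenized document and counts words within that document only; the function name `remove_hapax_legomena` and the sample sentence are made up for illustration and are not taken from our code base.

```python
from collections import Counter


def remove_hapax_legomena(tokens):
    """Return (kept_tokens, hapaxes) for one tokenized document.

    Hypothetical helper for illustration: a token counts as a hapax
    legomenon here if it occurs exactly once in the given token list.
    """
    counts = Counter(tokens)
    # Words that occur exactly once in this document.
    hapaxes = {token for token, n in counts.items() if n == 1}
    # Keep the original order, dropping every hapax legomenon.
    kept = [token for token in tokens if token not in hapaxes]
    return kept, hapaxes


tokens = "the cat saw the dog and the dog saw a bird".split()
kept, hapaxes = remove_hapax_legomena(tokens)
print(kept)             # ['the', 'saw', 'the', 'dog', 'the', 'dog', 'saw']
print(sorted(hapaxes))  # ['a', 'and', 'bird', 'cat']
```

Whether the count is taken per document or across the whole corpus changes which words are dropped, so treat the per-document choice above as an assumption.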
People keep asking me what 'hapax legomena' are! Check where we use the term and replace the bare term with a link to an appropriate explanation (e.g. Wikipedia).