DARIAH-DE / TopicsExplorer

Explore your own text collection with a topic model – without prior knowledge.
https://dariah-de.github.io/TopicsExplorer
Apache License 2.0

Explanation link for 'Hapax Legomena' #48

Closed by pielstroem 6 years ago

pielstroem commented 6 years ago

People keep asking me what 'hapax legomena' are! Check where we use the term and replace the bare term with a link to an appropriate explanation (e.g. Wikipedia).

severinsimmler commented 6 years ago

On the start page there is actually no mention of hapax legomena (because you have no control over them in our workflow anyway, since they are removed automatically). I did this deliberately so as not to confuse anyone with wacky foreign words. The results page already contains the following sentence:

In addition so-called hapax legomena have been removed. In corpus linguistics, a hapax legomenon is a word that occurs only once within a context. So, if a word occurs only once in a document, it is very likely that the word is semantically insignificant – that is, not useful for the model.

But I can definitely link to the Wikipedia article.
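
For readers who want to see what the removal step amounts to, here is a minimal sketch. It is not the TopicsExplorer implementation: the function names `find_hapax_legomena` and `remove_hapax_legomena` are hypothetical, the documents are assumed to be already tokenized, and "occurs only once" is assumed to mean once across the whole collection.

```python
from collections import Counter


def find_hapax_legomena(documents):
    """Return the tokens that occur exactly once across all documents.

    `documents` is assumed to be a list of token lists (hypothetical input format).
    """
    counts = Counter(token for tokens in documents for token in tokens)
    return {token for token, count in counts.items() if count == 1}


def remove_hapax_legomena(documents):
    """Return a copy of `documents` with all hapax legomena filtered out."""
    hapaxes = find_hapax_legomena(documents)
    return [[token for token in tokens if token not in hapaxes] for tokens in documents]


# Toy corpus: "swims" and "hides" each occur only once in the whole collection.
corpus = [
    ["the", "whale", "swims", "the", "sea"],
    ["the", "sea", "hides", "the", "whale"],
]
hapaxes = find_hapax_legomena(corpus)
cleaned = remove_hapax_legomena(corpus)
```

In this toy corpus, `hapaxes` contains `swims` and `hides`, and both are dropped from `cleaned`, while every word that appears at least twice is kept.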