deep-philology / DeepVocabulary

vocabulary server (mostly for Perseus but also standalone)
https://gu658.us1.eldarioncloud.com
MIT License

consider running totals -- word 1 accounts for 10% of all tokens, words 1-2 15% etc. #68

Open gregorycrane opened 6 years ago

gregorycrane commented 6 years ago

If you have a frequency-sorted vocab, show students how quickly they can get to 50% of all tokens (very quickly).
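
A minimal sketch of the running totals suggested above, assuming the text is already available as a flat list of lemmatised tokens. The function names `running_coverage` and `words_needed` and the toy data are hypothetical and are not taken from the DeepVocabulary code base.

```python
from collections import Counter
from itertools import accumulate


def running_coverage(tokens):
    """Return (rank, word, cumulative fraction of tokens) for each word,
    with words ordered from most to least frequent."""
    counts = Counter(tokens).most_common()
    total = sum(count for _, count in counts)
    running = accumulate(count for _, count in counts)
    return [
        (rank, word, subtotal / total)
        for rank, ((word, _), subtotal) in enumerate(zip(counts, running), start=1)
    ]


def words_needed(tokens, target=0.5):
    """Smallest number of top-ranked words whose tokens reach `target` coverage."""
    for rank, _, fraction in running_coverage(tokens):
        if fraction >= target:
            return rank
    return 0  # only reached for an empty token list


# Toy data (hypothetical): 10 tokens, 4 distinct words.
demo_tokens = ["a", "a", "a", "a", "a", "b", "b", "b", "c", "d"]
for rank, word, fraction in running_coverage(demo_tokens):
    print(f"words 1-{rank} (through {word!r}): {fraction:.0%} of tokens")
```

In the toy data a single word already reaches the 50% mark, which is the kind of running total the issue title describes.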

jtauber commented 6 years ago

Yes, another great idea. Although we have to be a little careful what we claim here.

In talks I've given in the past, I've called this the "myth of vocabulary coverage". For example, the top 100 words in the Greek NT cover 66% of tokens. But if you instead ask "in how many verses would I know at least 95% of the tokens if I knew the top 100 words?", the answer is only 0.6% of verses.

See:

https://jktauber.com/2007/11/04/gnt-verse-coverage-statistics/

and 11:30 thru 12:45 in my BibleTech 2015 talk https://vimeo.com/127114639
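
The gap between token coverage and verse coverage described above can be sketched as follows. This is an illustration only, assuming each verse is given as a list of lemmatised tokens; the helper names and the toy data are hypothetical, and the figures it produces are not the Greek NT figures quoted in this comment.

```python
from collections import Counter


def top_n(verses, n):
    """Set of the n most frequent words across all verses."""
    counts = Counter(token for verse in verses for token in verse)
    return {word for word, _ in counts.most_common(n)}


def token_coverage(verses, known):
    """Fraction of all tokens that belong to the known vocabulary."""
    tokens = [token for verse in verses for token in verse]
    return sum(token in known for token in tokens) / len(tokens)


def readable_verse_fraction(verses, known, threshold=0.95):
    """Fraction of verses in which at least `threshold` of the tokens are known."""
    readable = sum(
        1
        for verse in verses
        if verse and sum(token in known for token in verse) / len(verse) >= threshold
    )
    return readable / len(verses)


# Toy data (hypothetical): four short "verses".
demo_verses = [
    ["a", "b", "a"],
    ["a", "c"],
    ["a", "a", "d", "e"],
    ["b", "a"],
]
known = top_n(demo_verses, 2)
print(f"token coverage: {token_coverage(demo_verses, known):.0%}")
print(f"verses readable at 95%: {readable_verse_fraction(demo_verses, known):.0%}")
```

Even in this tiny example the two measures diverge: the two most frequent words cover 8 of 11 tokens, but only 2 of the 4 verses clear the 95% threshold.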