dwyl / english-words

A text file containing 479k English words for all your dictionary/word-based projects, e.g. auto-completion / autosuggestion
The Unlicense

Adequacy of the list #30

Open ghost opened 6 years ago

ghost commented 6 years ago

How adequate is this list for performing a letter frequency analysis? My only concern is the number of words. Can you provide me with a relative size comparison?

a-raccoon commented 6 years ago

It depends on the purpose of your letter frequency analysis. This list attempts to contain all words along with their known prefixed and suffixed forms, so the same root words are represented numerous times. As a result, verbs are repeated more often than nouns, and the letters in "ed" and "ing" appear more frequently because of their popularity as suffixes. Depending on how you phrase the conclusions of your analysis, the results will be skewed toward those letters.
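
To make the skew concrete, here is a minimal Python sketch of a letter frequency count over a word list. It is only an illustration under stated assumptions: the function names `letter_frequencies` and `load_words` and the tiny sample lists are hypothetical, and the loader assumes a plain-text file with one word per line, which you should check against the file you actually use.

```python
from collections import Counter


def letter_frequencies(words):
    """Return each letter's share of all letters across the given words."""
    counts = Counter(
        ch for word in words for ch in word.lower() if ch.isalpha()
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {ch: n / total for ch, n in counts.items()}


def load_words(path):
    """Hypothetical loader: assumes one word per line in a text file."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


# Hypothetical sample: one root word versus the same root with its
# inflected forms, which is how a list of all word forms stores it.
roots_only = ["walk"]
inflected = ["walk", "walked", "walking", "walks"]

without_suffixes = letter_frequencies(roots_only)
with_suffixes = letter_frequencies(inflected)

# Letters that only enter the count through suffixes.
suffix_only = sorted(set(with_suffixes) - set(without_suffixes))
print(suffix_only)
```

In this toy sample, the letters d, e, g, i, n, and s appear only because of the suffixed forms. Each word also counts once regardless of how often it is used in real text, so the result describes letter frequency across dictionary entries rather than across running English.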