first20hours / google-10000-english

This repo contains a list of the 10,000 most common English words in order of frequency, as determined by n-gram frequency analysis of Google's Trillion Word Corpus.

top 10k english words that are words? #10

Closed (tedder closed this issue 8 years ago)

tedder commented 8 years ago

Hi, are you interested in having another permutation of the 10k list that is only valid words? I needed that, so I munged the list a little. You probably would want the whole 10k, but it's pretty close to what I ran.

This is relevant to #1.

worldlywisdom commented 8 years ago

Not sure what you're asking here.

tedder commented 8 years ago

What I'm saying is that the top 10k includes strings that aren't words that would appear in a dictionary, like "rl", "gl", "dh", "x", "n", and so on. I was linking code to be helpful. Never mind, I guess.
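
For readers following along, here is a minimal sketch of the kind of filtering described above. It is not the code that was linked in this thread. The function name `filter_valid_words`, the `keep_single` parameter, and the tiny in-memory dictionary are all hypothetical, chosen only for illustration. A real run would presumably load the ranked list from the repo and a dictionary from some word-list file of your choosing.

```python
def filter_valid_words(ranked_words, dictionary_words, keep_single=("a", "i")):
    """Return ranked entries found in the dictionary, preserving frequency order.

    All names here are illustrative assumptions, not part of the repo.
    """
    # Normalize the dictionary once into a set for fast membership checks.
    dictionary = {w.strip().lower() for w in dictionary_words if w.strip()}
    kept = []
    for word in ranked_words:
        w = word.strip().lower()
        if not w:
            continue
        # Drop single letters other than "a" and "i", even if the dictionary
        # happens to list them (many dictionaries include every letter).
        if len(w) == 1 and w not in keep_single:
            continue
        if w in dictionary:
            kept.append(w)
    return kept


if __name__ == "__main__":
    # Toy data: a few real words mixed with the non-word strings from the thread.
    ranked = ["the", "of", "rl", "x", "a", "gl", "word", "dh", "n"]
    dictionary = ["the", "of", "a", "word", "x", "n"]
    print(filter_valid_words(ranked, dictionary))
```

The single-letter rule is a judgment call: some dictionaries list every letter of the alphabet as an entry, so a plain lookup alone may not remove "x" or "n".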