first20hours / google-10000-english

This repo contains a list of the 10,000 most common English words in order of frequency, as determined by n-gram frequency analysis of Google's Trillion Word Corpus.


3.88k stars, 1.93k forks

Replace the last half of 20k.txt using count_1w.txt #6 #9

Closed by koseki 8 years ago

koseki commented 8 years ago


curl http://norvig.com/ngrams/count_1w.txt | head -n 20000 | cut -f 1 > 20k.txt
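A pipeline that takes the first 20,000 lines with `head` relies on count_1w.txt already being in descending frequency order. The sketch below is a variant that sorts by the count column first, so it does not depend on that ordering. It assumes each input line has the form `word<TAB>count`; `top_words` is a hypothetical helper name, not something defined in this repo.

```shell
# top_words FILE N
# Hypothetical helper: print the N most frequent words from a file of
# "word<TAB>count" lines (assumed format, as in count_1w.txt).
top_words() {
  tab=$(printf '\t')
  # Split fields on tab and sort numerically, descending, on the count
  # column; then keep the first N lines and drop the count column.
  sort -t "$tab" -k2,2nr "$1" | head -n "$2" | cut -f 1
}
```

For example, `top_words count_1w.txt 20000 > 20k.txt` would regenerate the list from a local copy of the file. `cut -f 1` splits on tab by default, which is why no explicit delimiter is passed to it.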

worldlywisdom commented 8 years ago

Great catch - not sure why the original source has duplicates. I appreciate the fix.
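To confirm that a regenerated list such as 20k.txt no longer contains repeated entries, one option is to sort it and let `uniq -d` report lines that occur more than once. This is a minimal sketch; `find_dups` is a hypothetical helper name, and it treats entries as exact, case-sensitive lines.

```shell
# find_dups FILE
# Hypothetical helper: print one copy of every line that appears more
# than once in FILE. Prints nothing if all lines are unique.
find_dups() {
  # uniq only compares adjacent lines, so sort first.
  sort "$1" | uniq -d
}
```

Running `find_dups 20k.txt` should produce no output if the list is free of exact duplicates.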