first20hours / google-10000-english

This repo contains a list of the 10,000 most common English words in order of frequency, as determined by n-gram frequency analysis of Google's Trillion Word Corpus.

Is there a Spanish version? #15

Open BayInternetGroup opened 6 years ago

BayInternetGroup commented 6 years ago

This is very interesting and nice work. I have been searching for a list similar to this in Spanish and other languages for some time now without any luck. Do you know where I could find such a list or at least the data sets to create one? Thanks!

ardaegeunlu commented 6 years ago

Any luck with other languages?

BayInternetGroup commented 6 years ago

No luck so far. I did find some, but they were either not organized by popularity or extremely outdated. If you come across anything, please let me know. Thanks :)

ardaegeunlu commented 6 years ago

You might want to check this out.

BayInternetGroup commented 6 years ago

Thanks, I have come across these before; however, I am looking for word frequency based on online search data. The movie subtitle data does not align very well with search data. For example, in the 2016 movie data, "headset" is the third least frequently used word, yet it would rank quite high in search-data word frequency. Thanks again for your help! 👍