first20hours / google-10000-english

This repo contains a list of the 10,000 most common English words in order of frequency, as determined by n-gram frequency analysis of Google's Trillion Word Corpus.

Licence #4

Closed hugovk closed 9 years ago

hugovk commented 9 years ago

What is the licence for this data?

https://github.com/first20hours/google-10000-english/blob/master/LICENSE.md gives the provenance but not the actual licence.

worldlywisdom commented 9 years ago

I’d recommend verifying that with the Linguistic Data Consortium:

https://catalog.ldc.upenn.edu/LDC2006T13

Here’s the copyright statement:

“Portions © 2006 Google Inc., © 2006 Trustees of the University of Pennsylvania”

I’m not sure what “portions” means, so I’d recommend checking.

Peter Norvig’s site lists his contributions as MIT, and mine are as well.

Josh Kaufman

On December 10, 2014 at 1:16:49 PM, Hugo (notifications@github.com) wrote:

What is the licence for this data?

https://github.com/first20hours/google-10000-english/blob/master/LICENSE.md gives the provenance but not the actual licence.


hugovk commented 9 years ago

That LDC page includes a link to the licence file:

License(s): Web 1T 5-gram Version 1 Agreement

Thanks!