Unclear license - GitHub issues

Good question.

The original list with word frequencies was published here: http://norvig.com/ngrams/. On that page, Norvig states:

Code copyright (c) 2008-2009 by Peter Norvig. You are free to use this code under the MIT license.

It's not clear to me whether or not the word lists count as "code."

The full original Google Web 1T 5-gram corpus is distributed by the Linguistic Data Consortium (LDC) here: https://catalog.ldc.upenn.edu/LDC2006T13.

The LDC's license allows "limited excerpts from the Data" for "linguistic education and research," which appears to make Norvig's use (and, by extension, this repo's) acceptable for non-commercial purposes.

Bottom line: if you intend to use this for commercial purposes, I'd recommend getting a license from the LDC for the full corpus. Personal non-commercial use should be okay.

Source: first20hours/google-10000-english, issue #11 (Unclear license)