first20hours / google-10000-english

This repo contains a list of the 10,000 most common English words in order of frequency, as determined by n-gram frequency analysis of Google's Trillion Word Corpus.

Clearer copyright #21

Closed HubKing closed 3 years ago

HubKing commented 6 years ago

There is a licence file, but it only shows where the data originated. I clicked one of the links and read the licence there. One clause says:

User shall not publish, retransmit, display, redistribute, reproduce or commercially exploit the Data in any form, except that...

So does this mean you can never use this data in a paid app in any way? It may be better to show the usage terms in the licence text.

worldlywisdom commented 6 years ago

Where did you find the quoted section? I'd like to read the whole thing, and can't find the file.

Regardless, I would not use this data in a commercial application without licensing it from the Linguistic Data Consortium, which is linked in the license file.

HubKing commented 6 years ago

I clicked one of the links (https://catalog.ldc.upenn.edu/LDC2006T13) in the licence file. That page lists its licence(s); I clicked through and it showed this document:

User License Agreement for Web 1T 5-gram Version 1
Application by an Organization to use Web 1T 5-gram Version 1 distributed by the Linguistic Data Consortium (LDC)
______________________________ ("User"), an organization engaging in language education and research agrees to use the text data designated as Web 1T 5-gram Version 1 (the "Data") and distributed by the LDC subject to the following understandings, terms and conditions.
1. Permitted and Prohibited Uses
1.1. The Data may only be used for linguistic education and research, including but not limited to information retrieval, document understanding, machine translation or speech recognition.
1.2. User shall not publish, retransmit, display, redistribute, reproduce or commercially exploit the Data in any form, except that User may include limited excerpts from the Data in articles, reports and other documents describing the results of User's linguistic education and research.
2. Copyright Notice and Disclaimer

jviotti commented 3 years ago

I'm facing the same concern. Peter Norvig's website, on which this repo is based as far as I understand, does say:

Code copyright (c) 2008-2009 by Peter Norvig. You are free to use this code under the MIT license.

@first20hours @worldlywisdom Would it be possible to add an open-source friendly LICENSE file to this project?

worldlywisdom commented 3 years ago

@jviotti My understanding is that educational and personal use of this data is permitted under the LDC license, Norvig's MIT license for his contributions, and US fair use doctrine. I'm not going to add a license that goes beyond that. As I mentioned above, I would not use this data for commercial purposes without licensing it from the Linguistic Data Consortium.