barrust / pyspellchecker

Pure Python Spell Checking http://pyspellchecker.readthedocs.io/en/latest/
MIT License

What is license for Language data? #100

Open madkote opened 3 years ago

madkote commented 3 years ago

Hi

What is the license for the language data? MIT, Apache, ...? This is very important to know if one wants to use it in a commercial application.

Thanks!

barrust commented 3 years ago

The data was originally pulled from opensubtitles.org but was heavily modified, and the dictionary itself is part of this code. I did not pull the dictionaries directly from any other project. Then again, I am not a lawyer. I gave credit to the original source of the text used to build the dictionaries in the README and in the script that pulled and parsed the data for the dictionary builds.

madkote commented 3 years ago

@barrust thanks for the reply. I am not sure what kind of license it is; it is hard to find out on their homepage.

But if the license does not allow commercial use, then that must be clearly stated in the README here. Not knowing the law does not free one from responsibility 8))

Anyhow, please keep the issue open - I will try to find out the license.

Can you list here which data you have used and modified? It will make things simpler.

barrust commented 3 years ago

The scripts/build_dictionary.py script lists each item used, and the data itself can be found here: https://opus.nlpl.eu/OpenSubtitles2018.php

Per this page, the requirements for use are: 1) add a link to http://www.opensubtitles.org/, and 2) cite the following article if you use any part of the corpus in your own work: P. Lison and J. Tiedemann, 2016, OpenSubtitles2016: Extracting Large Parallel Corpora from Movie and TV Subtitles. In Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016)

Both items are already in the README, but if this isn't enough, another data source could be found.