explosion / spaCy

💫 Industrial-strength Natural Language Processing (NLP) in Python
https://spacy.io
MIT License

Which license applies to a model trained using spaCy's LGPL models? #7216

Closed: sylentheal closed this issue 3 years ago

sylentheal commented 3 years ago

Hi everyone,

I'm not a lawyer and would definitely not want to do anything wrong when using spaCy :)

I noticed that the model for French is released under the LGPL-LR license.

I'm wondering why, given that section 3 of the LGPL-LR states:

"A program that contains no derivative of any portion of the Linguistic Resource, but is designed to work with the Linguistic Resource (or an encrypted form of the Linguistic Resource) by reading it or being compiled or linked with it, is called a "work that uses the Linguistic Resource". Such a work, in isolation, is not a derivative work of the Linguistic Resource, and therefore falls outside the scope of this License."

1. Would it be possible to release these models under an MIT-like license instead?
2. Would a model trained using spaCy's pre-trained model be "contaminated" by the LGPL license?

All the best, and thank you in advance for your reply.

adrianeboyd commented 3 years ago

We are not in a position to give legal advice, but we do consider the models to be derivative works of the training corpora and our policy is to license our models under the most restrictive license of all the training corpora used. Some licenses are incompatible with each other, so we don't combine incompatible resources in the same model package (e.g., GPLv3 + certain CC licenses).

I'm not 100% sure what you mean by "a model trained using a pretrained model", but if you fine-tune fr_core_news_sm with additional data, the resulting model would still fall under the original license. Sometimes one of the components (like French NER, which is trained on WikiNER) could be distributed under a more permissive license if you train a pipeline separately with only that data.

Our best advice would be to ask your legal team and to potentially get in touch with the corpus authors to clarify based on your intended use.

github-actions[bot] commented 3 years ago

This issue has been automatically closed because it was answered and there was no follow-up discussion.

github-actions[bot] commented 3 years ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.