Closed miktoki closed 5 years ago
The standard contract linked above is in English, so the question of an agreement in Norwegian is not really relevant.
The use for NoWaC in spaCy would be pre-training to improve the models. The results using a smaller corpus (not NoWaC, which is much larger) were quite promising.
spaCy doesn't need to distribute the NoWaC corpus, so in the short term the question is really whether you guys feel the license you have enables you to issue us an MIT license to the vectors you've trained. I have nb_core_web_sm and nb_core_web_md models trained without the pretraining, scheduled for release with v2.2.
We'd like to replicate the vectors and add pretraining to the pipeline, so I guess for that we'll need a license to the corpora (or to use equivalent ones instead). But for the immediate release it shouldn't be necessary.
The word vectors I used to build my nb models already have an acceptable CC license.
The agreement suggested in this issue is only relevant if we want to improve the models by pre-training with the NoWaC corpus.
@jarib Sounds good! The standard contract looks fine to me. Once we're ready to wire up the pretraining, we can discuss the details over email and get this signed 🙂
Referring to #3082, I have been in contact with the legal owner of noWaC, the dataset we wish to base a Norwegian spaCy model on together with @jarib. They are unable to change the license of the corpus itself due to Norwegian law (see the explanation in section 1.2 of the noWaC paper), but are willing to set up a special agreement which would allow us to use the corpus to generate the language model. They have a standard contract available, in which we could set the cost of the license to 0 and list the specific terms in Appendix 2. I also think we should further discuss the duration of the contract with them.
As pointed out by @ines here, the language models could count as derivative works based on the corpus.
@honnibal @ines, what do you need the terms of the contract to include? Also, is it sufficient to have the agreement in Norwegian? Do you perhaps have a template we could base the contract on, or alternatively, if their contract is used, what should be included in Appendix 2?