DigitalPhonetics / IMS-Toucan

Controllable and fast Text-to-Speech for over 7000 languages!
Apache License 2.0
1.47k stars, 166 forks

License clarification for dataset and weights #174

Closed: TechInterMezzo closed this issue 3 months ago

TechInterMezzo commented 5 months ago

Is the latest model trained on the BibleMMS dataset, which in turn was created with the pretrained MMS TTS model?

If so, what does that mean for the license of this TTS model's weights? The MMS model has a non-commercial license, which I guess would also rule out commercial use of the BibleMMS dataset and of everything trained on it. If that is the case, it should be reflected in the license.

Or did I get something wrong?

Flux9665 commented 5 months ago

I'm not sure about the details, but from what I have been told, the license of MMS can only apply to its model weights. For anything that I generate with the model on my machine, I have all the rights. So the audio portion of the dataset should not be a problem, because the outputs of generative models are not under the license of the model.

The only problem could be the license of the texts, since we take them as they are. For that reason, we don't take all the texts from the BibleNLP dataset, only the ones for which we found either Apache or MIT license notices. So if my understanding of copyright and licenses as they apply here in Germany is correct, the choice of license should be fine.

TechInterMezzo commented 5 months ago

Thank you for the answer and for this great project. I don't know how viral the non-commercial clause of Creative Commons is, so I am hesitant to use datasets or transfer learning in that regard.

You can close this issue if you want. I will see if I can contact someone from Meta AI to be on the safe side. If I get any confirmation, I will post here again.