DigitalPhonetics / IMS-Toucan

Multilingual and Controllable Text-to-Speech Toolkit of the Speech and Language Technologies Group at the University of Stuttgart.
Apache License 2.0

License clarification for dataset and weights #174

Open TechInterMezzo opened 1 week ago

TechInterMezzo commented 1 week ago

Is the latest model trained on the BibleMMS dataset, which in turn was created with the pretrained MMS TTS model?

If so, what would that mean for the license of this TTS model's weights? The MMS model has a non-commercial license, which I guess would also forbid commercial use of the BibleMMS dataset and of everything trained on it. If that is the case, it should be reflected in the license.

Or did I get something wrong?

Flux9665 commented 1 week ago

I'm not sure about the details, but from what I have been told, the license of MMS can only apply to its model weights. For anything that I generate with the model on my machine, I have all the rights. So the audio portion of the dataset should not be a problem, because the outputs of generative models are not under the license of the model.

The only problem could be the license of the texts, since we take them as they are. To that end, we don't take all the texts from the BibleNLP dataset, only the ones for which we found either Apache or MIT license notices. So if my understanding of copyright and licenses as they are in effect here in Germany is correct, the choice of license should be fine.

TechInterMezzo commented 1 week ago

Thank you for the answer and this great project. I don't know how viral the non-commercial clause of Creative Commons is, so I am hesitant to use datasets or transfer learning in that regard.

You can close this issue if you want. I will see if I can contact someone from Meta AI to be on the safe side. If I get any confirmation, I will write here again.