DigitalPhonetics / IMS-Toucan

Multilingual and Controllable Text-to-Speech Toolkit of the Speech and Language Technologies Group at the University of Stuttgart.
Apache License 2.0

Porting model(s) to mobile (Android) #160

Closed. showgan closed this issue 2 weeks ago.

showgan commented 9 months ago

Hi, I was able to fine-tune the Meta model to a new language (Adyga) and achieve reasonable performance. I'm now considering porting the model to Android using the `optimize_for_mobile` method from PyTorch.
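
For what it's worth, a minimal export sketch along those lines could look like the following. It is untested against IMS-Toucan itself: `DummyAcousticModel` is just a placeholder for the real fine-tuned model, and tracing may fail or silently bake in shapes if the model uses dynamic control flow (scripting with `torch.jit.script` would be the alternative).

```python
import torch
from torch.utils.mobile_optimizer import optimize_for_mobile


class DummyAcousticModel(torch.nn.Module):
    """Stand-in for the fine-tuned phoneme-to-spectrogram model."""

    def __init__(self):
        super().__init__()
        self.embed = torch.nn.Embedding(100, 80)

    def forward(self, phoneme_ids):
        # returns a fake (batch, time, 80) "spectrogram"
        return self.embed(phoneme_ids)


model = DummyAcousticModel().eval()
example_phonemes = torch.randint(0, 100, (1, 50))  # dummy phoneme ID sequence

traced = torch.jit.trace(model, example_phonemes)           # TorchScript via tracing
optimized = optimize_for_mobile(traced)                     # mobile-specific graph passes
optimized._save_for_lite_interpreter("acoustic_model.ptl")  # loadable with LiteModuleLoader on Android
```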

I suppose I'll need to port all the models: the main model, the vocoder, the aligner, and the embedding model, right?

I'll also need to re-implement the pre-processing (text to phonemes, etc.) and the post-processing (converting the model output to a wav) in Java.
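
As a rough illustration of what the pre-processing side involves, here is a hedged Python sketch of grapheme-to-phoneme conversion with the phonemizer/espeak-ng stack that, as far as I can tell, the toolkit's text frontend builds on. The language code `en-us` is only for demonstration (espeak-ng coverage for Adyga would need checking), and the phoneme-to-ID table below is a made-up placeholder; the real mapping has to match what the model was trained with.

```python
from phonemizer import phonemize

text = "hello world"
ipa = phonemize(text, language="en-us", backend="espeak", strip=True)
print(ipa)  # something like "həloʊ wɜːld"

# made-up phoneme-to-ID table; the real one must match training exactly
phoneme_to_id = {p: i for i, p in enumerate(sorted(set(ipa)))}
phoneme_ids = [phoneme_to_id[ch] for ch in ipa]
print(phoneme_ids)
```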

Do you have plans to implement features in this direction, or to add how-to documentation?

I'm not sure what inference performance I'll get on a phone, but I would like to give it a try.

Thanks, Haroon

Flux9665 commented 9 months ago

Hi Haroon! Glad to hear you have a working system for Adyga.

For inference alone, you don't need the embedding model or the aligner model; you only need the phoneme-to-spectrogram model and the spectrogram-to-audio model. The embedding model is only needed if you want to switch the voice, but you could also just save some presets.

I have no plans regarding Java support or optimizing for mobile use. My priority right now is enhancing the multi-speaker capabilities by scaling up the amount of data used, and then making the model speak as many languages as possible. Since this is all research code, I don't plan to optimize for inference on any particular device or make it production-safe for use in products. This is just a playground for experiments.
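
To make the preset idea concrete, here is a minimal sketch with placeholder names (`embedding_model`, `acoustic_model`, and `vocoder` are not the toolkit's actual API): the speaker embedding is computed once offline, saved as a plain tensor, and only the two inference models plus that file need to go onto the phone.

```python
import torch

# offline, once per voice, on a workstation that still has the embedding model:
speaker_embedding = torch.randn(64)  # stand-in for embedding_model(reference_audio)
torch.save(speaker_embedding, "voice_preset_adyga.pt")

# on the device, the embedding model is no longer needed:
preset = torch.load("voice_preset_adyga.pt")
# spectrogram = acoustic_model(phoneme_ids, utterance_embedding=preset)  # placeholder call
# wave = vocoder(spectrogram)                                            # placeholder call
print(preset.shape)  # torch.Size([64])
```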

showgan commented 9 months ago

Thanks for the clarification and the other details!!