[Question] Do TTS modules (for example, Tacotron2) support fine-tuning like ASR?

NVIDIA / NeMo

Issue #1412

Status: Closed (khalilRhouma closed this issue 2 years ago)

khalilRhouma commented 3 years ago

Is fine-tuning with TTS not supported yet?

I am working on transfer learning with TTS models, from English to Arabic. I saw an example of fine-tuning with the ASR QuartzNet model mentioned here. I am using this pre-trained model:

import nemo.collections.tts as nemo_tts
tacotron2 = nemo_tts.models.Tacotron2Model.from_pretrained("Tacotron2-22050Hz")

I found that the Tacotron2 model doesn't support model.change_vocabulary(), and it raises exceptions when I call model.setup_training_data().

NeMo version: 1.0.0b1
Method of NeMo install: pip install nemo_toolkit[all]==1.0.0b1
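
For context on why a missing change_vocabulary() matters here: moving from English to Arabic changes the input symbol set, so the pre-trained text embedding table no longer lines up with the new labels. The sketch below is plain Python, not NeMo code and not how NeMo implements anything. It assumes the embedding is a table with one row per symbol; remap_embedding and the toy vocabularies are hypothetical names used only for illustration. It reuses the rows of symbols that both vocabularies share and randomly initializes the rest.

```python
import random


def remap_embedding(old_vocab, old_table, new_vocab, dim, seed=0):
    """Build an embedding table for new_vocab (hypothetical helper).

    Rows for symbols that also exist in old_vocab are copied from
    old_table; rows for unseen symbols get small random values.
    Returns the new table and the list of reused symbols.
    """
    rng = random.Random(seed)
    old_index = {sym: i for i, sym in enumerate(old_vocab)}
    new_table, reused = [], []
    for sym in new_vocab:
        if sym in old_index:
            # Shared symbol: keep the pre-trained row.
            new_table.append(list(old_table[old_index[sym]]))
            reused.append(sym)
        else:
            # New symbol: no pre-trained row exists, so initialize randomly.
            new_table.append([rng.uniform(-0.1, 0.1) for _ in range(dim)])
    return new_table, reused


# Toy data (assumed for illustration, not real model weights).
english = ["a", "b", " ", "."]
pretrained = [[float(i)] * 3 for i in range(len(english))]
arabic = ["ا", "ب", " ", "."]

table, reused = remap_embedding(english, pretrained, arabic, dim=3)
print(reused)
```

In this toy case only the space and the period carry over, which suggests how little of the input embedding would transfer between scripts; the remaining layers of a model could still be reused as a starting point.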

donand commented 3 years ago

Hello, any update on this issue?

I have the same problem: I would like to fine-tune the Tacotron 2 model with a different set of labels (the characters/phonemes given as input to the model).

I'm also interested in simple fine-tuning without changing the set of labels. Is this feature already available in NeMo? I couldn't find an option to start training from a checkpoint.

Thanks!

titu1994 commented 2 years ago

Tacotron is not easily fine-tunable; we suggest using one of the other supported TTS models that support fine-tuning.