NVIDIA / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html
Apache License 2.0
12.05k stars, 2.51k forks

How to train NeMo TTS Tacotron2 model from pretrained model? #3249

Closed Sueoka-ppc closed 2 years ago

Sueoka-ppc commented 2 years ago

Describe the bug

Can NeMo TTS models be trained starting from a pretrained model?

The command and its output are below.

python tacotron2.py train_dataset=filelists/transcript_train.json validation_datasets=filelists/transcript_val.json trainer.accelerator=null trainer.check_val_every_n_epoch=1 --pretrained_model=tts_en_tacotron2.nemo

[NeMo W 2021-11-26 11:34:27 optimizers:47] Apex was not found. Using the lamb optimizer will error out.
[NeMo W 2021-11-26 11:34:32 nmse_clustering:54] Using eigen decomposition from scipy, upgrade torch to 1.9 or higher for faster clustering
[NeMo W 2021-11-26 11:34:34 experimental:28] Module <class 'nemo.collections.asr.data.audio_to_text_dali._AudioTextDALIDataset'> is experimental, not ready for production and is not fully supported. Use at your own risk.
usage: tacotron2.py [--help] [--hydra-help] [--version] [--cfg {job,hydra,all}] [--resolve] [--package PACKAGE] [--run] [--multirun] [--shell-completion] [--config-path CONFIG_PATH] [--config-name CONFIG_NAME] [--config-dir CONFIG_DIR] [--info [{all,config,defaults,defaults-tree,plugins,searchpath}]] [overrides [overrides ...]]
tacotron2.py: error: unrecognized arguments: --pretrained_model=tts_en_tacotron2.nemo

Oktai15 commented 2 years ago

@Sueoka-ppc, take a look at this notebook about fine-tuning TTS models (in this case, it is FastPitch, but you can use it for Tacotron 2 as well): https://github.com/NVIDIA/NeMo/blob/main/tutorials/tts/FastPitch_Finetuning.ipynb.
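As background on the error in the original command: the usage line in the log shows that the script accepts a fixed set of `--` flags plus positional `key=value` overrides, and `--pretrained_model` is not among those flags. The sketch below imitates that behavior with plain `argparse`. It is a simplified stand-in, not Hydra's actual parser, and it registers only two of the listed flags for brevity.

```python
import argparse

# Simplified stand-in for the script's command line (an assumption for
# illustration): a couple of the flags from the usage message, plus
# positional key=value overrides.
parser = argparse.ArgumentParser(prog="tacotron2.py")
parser.add_argument("--config-path")
parser.add_argument("--config-name")
parser.add_argument("overrides", nargs="*")

# Arguments copied from the command in the original post.
argv = [
    "train_dataset=filelists/transcript_train.json",
    "validation_datasets=filelists/transcript_val.json",
    "trainer.accelerator=null",
    "trainer.check_val_every_n_epoch=1",
    "--pretrained_model=tts_en_tacotron2.nemo",
]

# parse_known_args returns unrecognized arguments instead of exiting,
# which makes it easy to see which one triggers the error.
known, unknown = parser.parse_known_args(argv)
overrides = known.overrides

print("accepted overrides:", overrides)
print("unrecognized:", unknown)
```

The four `key=value` arguments are accepted as overrides, while the `--pretrained_model=...` argument is reported as unrecognized, matching the error message in the log.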

Sueoka-ppc commented 2 years ago

Hi, thanks for replying to my question.

However, several of the files listed below cannot be downloaded.

additional files

wget https://raw.githubusercontent.com/NVIDIA/NeMo/$BRANCH/scripts/tts_dataset_files/cmudict-0.7b_nv22.01 \
wget https://raw.githubusercontent.com/NVIDIA/NeMo/$BRANCH/scripts/tts_dataset_files/heteronyms-030921 \
wget https://raw.githubusercontent.com/NVIDIA/NeMo/$BRANCH/nemo_text_processing/text_normalization/en/data/whitelist_lj_speech.tsv \

Also, please tell me how to fine-tune from a pretrained Tacotron 2 model.

redoctopus commented 2 years ago

For wget to work you need to specify which $BRANCH you want (probably main). The first code cell in the notebook should set that, in case you hadn't run it yet.
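To illustrate the point about `$BRANCH`, here is a minimal sketch of how the download URLs are formed once the branch is set. The file paths are copied from the wget commands quoted above; the `raw_url` helper is hypothetical, and `main` is only the assumed branch. Whether each file still exists on a given branch is not checked here.

```python
# Assumption: "main" is the branch you want, as suggested above.
BRANCH = "main"

RAW_BASE = "https://raw.githubusercontent.com/NVIDIA/NeMo"

# Paths copied from the wget commands quoted earlier in this thread.
FILES = [
    "scripts/tts_dataset_files/cmudict-0.7b_nv22.01",
    "scripts/tts_dataset_files/heteronyms-030921",
    "nemo_text_processing/text_normalization/en/data/whitelist_lj_speech.tsv",
]


def raw_url(branch: str, path: str) -> str:
    """Build a raw-file URL; hypothetical helper, for illustration only."""
    if not branch:
        # An unset $BRANCH leaves an empty path segment in the URL,
        # so the download cannot succeed.
        raise ValueError("BRANCH is empty; set it (e.g. to 'main') first")
    return f"{RAW_BASE}/{branch}/{path}"


urls = [raw_url(BRANCH, p) for p in FILES]
for url in urls:
    print(url)
```

If `BRANCH` were left empty, the helper raises an error instead of producing a URL with a missing segment, which is the situation the failed wget calls most likely ran into.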

Fine-tuning Tacotron should be similar to fine-tuning the other TTS models in that tutorial, but you'll want to use https://github.com/NVIDIA/NeMo/blob/main/examples/tts/tacotron2_finetune.py specifically.

Sueoka-ppc commented 2 years ago

Thanks, that resolved it. However, another error occurred, this time due to the POWER architecture: FastPitch cannot be used on a ppc64le machine.