NVIDIA / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html
Apache License 2.0

Possible bug in the tutorial for fine-tuning hifi #4181

Nistrian closed this issue 2 years ago

Nistrian commented 2 years ago

Hello! I'm trying to fine-tune HiFi-GAN following your fastpitch_finetuning tutorial. I generated manifests in the same format as in the example, then launched the fine-tuning script, changing only the paths. Despite so few changes, I get an error every time. While reading the code to understand what I was doing wrong, I found a possible cause and would like to ask whether the mistake is on my side or in the tutorial. I think the question is easiest to explain with screenshots.

The tutorial passes a file path as the model.train_ds parameter:

[screenshot]

Next, I followed the error traceback and found that the parameter set in the previous step is used at some point. If I haven't mixed anything up, it is still just the path to the file:

[screenshot]

In the next traceback step, this path is passed to the setup_data_loader_from_config function, where the error occurs. Inside the function, the argument is checked for the key "dataset". This is where it fails, since the path to my manifest does not contain that word. Apparently the function expects not a string with the manifest path but the config itself, which would contain that key. The same applies to the two lines of code below it, which check for dataloader_params:

[screenshot]
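
To make the failure mode concrete, here is a minimal sketch of the kind of check described above. It is not the NeMo source: check_ds_config, the error messages, and the example keys manifest_filepath and batch_size are assumptions for illustration, and plain dicts stand in for the real config objects. It shows why a bare manifest path fails: for a string, `in` is a substring test, so a path without the word "dataset" is rejected immediately.

```python
from collections.abc import Mapping


def check_ds_config(cfg, name="train"):
    """Hypothetical stand-in for the validation described in this thread."""
    # For a str, `in` is a substring test; for a mapping it is a key lookup.
    if "dataset" not in cfg or not isinstance(cfg["dataset"], Mapping):
        raise ValueError(f"No dataset for {name}")
    if "dataloader_params" not in cfg or not isinstance(cfg["dataloader_params"], Mapping):
        raise ValueError(f"No dataloader_params for {name}")
    return cfg["dataset"], cfg["dataloader_params"]


# Old tutorial style: train_ds is just the path to the manifest.
old_style = "./hifigan_train_ft.json"

# Expected style: train_ds is a nested config with both sections present.
new_style = {
    "dataset": {"manifest_filepath": "./hifigan_train_ft.json"},  # assumed key
    "dataloader_params": {"batch_size": 16},  # assumed key and value
}

try:
    check_ds_config(old_style)
    old_style_error = None
except ValueError as err:
    old_style_error = str(err)

dataset_cfg, loader_cfg = check_ds_config(new_style)
print(old_style_error)
print(dataset_cfg, loader_cfg)
```

The string input is rejected by the first check, while the nested mapping passes both.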

I would really appreciate your help. Thanks in advance!

redoctopus commented 2 years ago

Ah yep. There's been a change to the way the HiFi-GAN config works, and the tutorial is somewhat out of date. See Subhankar's fix in #4182 (linked above as well).

The long and short of it is that the train_ds and validation_ds arguments now point to other config files (e.g. https://github.com/NVIDIA/NeMo/blob/main/examples/tts/conf/hifigan/model/train_ds/train_ds_finetune.yaml) that specify additional parameters based on whether you're fine-tuning or training from scratch. These will contain the dataset heading that you're looking for!

So instead of pointing train_ds and validation_ds at your manifests, please set train_dataset=./hifigan_train_ft.json and validation_datasets=./hifigan_val_ft.json so that they point to your manifests.
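
As a rough sketch of what the resulting invocation might look like, the snippet below assembles the overrides into a command line. Only train_dataset and validation_datasets come from this thread. The script name hifigan_finetune.py and the config-group selections model/train_ds=train_ds_finetune and model/validation_ds=val_ds_finetune are assumptions inferred from the config path linked above, so check them against your NeMo checkout before use.

```python
def build_overrides(train_manifest, val_manifest):
    """Return Hydra-style key=value overrides for the fine-tuning run."""
    return [
        # Top-level keys named in this thread; they receive the manifest paths.
        f"train_dataset={train_manifest}",
        f"validation_datasets={val_manifest}",
        # Assumed config-group selections, inferred from the linked file path.
        "model/train_ds=train_ds_finetune",
        "model/validation_ds=val_ds_finetune",
    ]


overrides = build_overrides("./hifigan_train_ft.json", "./hifigan_val_ft.json")

# Assumed script name; adjust to the fine-tuning script you actually run.
command = ["python", "hifigan_finetune.py", *overrides]
print(" ".join(command))
```

Keeping the manifest paths in top-level keys leaves the train_ds and validation_ds sections free to carry the dataset and dataloader settings.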

Hopefully this solves your problem.