Closed godspirit00 closed 2 years ago
@godspirit00, you can pre-normalize your text and add it to your manifests as a "normalized_text" field on every JSON line. Loading will then be much faster (see how we did it for LJSpeech: https://github.com/NVIDIA/NeMo/blob/main/scripts/dataset_processing/tts/ljspeech/get_data.py#L101)
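A rough sketch of that pre-normalization step, in case it helps. It assumes each manifest line is a JSON object with a "text" field. The helper names (`simple_normalize`, `add_normalized_text`) are made up for illustration and are not part of NeMo, and the default normalizer is only a whitespace-collapsing placeholder, so swap in the real text normalizer you intend to train with.

```python
import json


def simple_normalize(text):
    """Placeholder normalizer (hypothetical): only collapses runs of whitespace."""
    return " ".join(text.split())


def add_normalized_text(src_path, dst_path, normalize=simple_normalize):
    """Copy a JSON-lines manifest, adding a "normalized_text" field to each entry.

    Assumes every non-empty line is a JSON object with a "text" key.
    Returns the number of entries written.
    """
    count = 0
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            entry = json.loads(line)
            entry["normalized_text"] = normalize(entry["text"])
            dst.write(json.dumps(entry, ensure_ascii=False) + "\n")
            count += 1
    return count
```

You would run this once per manifest (train and validation), writing to a new file, and then point train_dataset / validation_datasets at the new files. The linked LJSpeech script shows how the NeMo team did the actual normalization.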
@Oktai15 Thank you so much! I will give it a try right away.
Describe the bug
When I was trying to train FastPitch from scratch using the fastpitch_align_v1.05.yaml config, it spent over 1 hour on "Loading dataset" and then threw the error:
Steps/Code to reproduce bug
python examples/tts/fastpitch.py --config-name=fastpitch_align_v1.05.yaml train_dataset=/root/autodl-tmp/nancy_train.json validation_datasets=/root/autodl-tmp/nancy_val.json sup_data_path=/root/autodl-tmp/nemo-training/nancy_22k/sup_data exp_manager.exp_dir=/root/autodl-tmp/nemo-training/nancy_22k exp_manager.resume_if_exists=True exp_manager.resume_ignore_no_checkpoint=True model.train_ds.dataloader_params.batch_size=24 pitch_mean=199.37802124023438 pitch_std=55.59949493408203
Additional context
GPU: RTX 3090