NVIDIA / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html
Apache License 2.0

Error when training TalkNet spectrogram #4020

Closed davidmartinrius closed 2 years ago

davidmartinrius commented 2 years ago

Hello,

I am training a TalkNet model. Before starting to train the spectrogram model, I created durations.pt and f0s.pt.

After that, I started training the spectrogram model with talknet_spect.py from the examples folder.

talknet_spect.py works fine while it processes the training epochs.

The training crashes when validation starts, throwing the following exception: RuntimeError: The size of tensor a (648) must match the size of tensor b (649) at non-singleton dimension 2
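For context, this is the generic error PyTorch raises when two tensors being added (or broadcast together) differ along a non-singleton dimension. A minimal, self-contained reproduction of the same message (illustrative only, not the TalkNet code itself) looks like this:

```python
import torch

# Two tensors whose last dimension differs by a single frame, e.g. an encoder
# output with 648 frames and an f0/style residual with 649 frames.
x = torch.zeros(1, 256, 648)
s = torch.zeros(1, 256, 649)

y = x + s  # RuntimeError: The size of tensor a (648) must match the size of
           # tensor b (649) at non-singleton dimension 2
```

Judging from the traceback below, the failing addition is `x + self.rs(s)` in the TalkNet StyleResidual module, so a one-frame disagreement between the duration-expanded text representation and the pitch (f0) track is enough to trigger it.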

I tried training from the main and r1.8.0 branches and the v1.7.2 tag, but I got the same error.

This is not the first time I have trained a model with TalkNet, but I have never had this kind of issue before.

Here is the complete command and stack trace:

python3 talknet_spect.py sample_rate=22050 \
    train_dataset=/home/pc/Downloads/dataset.json \
    validation_datasets=/home/pc/Downloads/dataset.json \
    durs_file=/home/pc/Downloads/talknet/durations.pt \
    f0_file=/home/pc/Downloads/talknet/f0s.pt \
    trainer.max_epochs=100 \
    trainer.check_val_every_n_epoch=1 \
    model.train_ds.dataloader_params.batch_size=10 \
    model.train_ds.dataloader_params.num_workers=8 \
    model.validation_ds.dataloader_params.num_workers=8

/home/pc/.local/lib/python3.8/site-packages/apex/pyprof/__init__.py:5: FutureWarning: pyprof will be removed by the end of June, 2022
  warnings.warn("pyprof will be removed by the end of June, 2022", FutureWarning)
################################################################################

WARNING, path does not exist: KALDI_ROOT=/mnt/matylda5/iveselyk/Tools/kaldi-trunk

(please add 'export KALDI_ROOT=<your_path>' in your $HOME/.profile)

(or run as: KALDI_ROOT=<your_path> python <your_script>.py)

################################################################################

[NeMo W 2022-04-17 18:25:22 experimental:27] Module <class 'nemo.collections.nlp.data.language_modeling.megatron.megatron_batch_samplers.MegatronPretrainingRandomBatchSampler'> is experimental, not ready for production and is not fully supported. Use at your own risk.
[NeMo W 2022-04-17 18:25:23 nemo_logging:349] /home/pc/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/accelerator_connector.py:287: LightningDeprecationWarning: Passing Trainer(accelerator='ddp') has been deprecated in v1.5 and will be removed in v1.7. Use Trainer(strategy='ddp') instead.
    rank_zero_deprecation(

[NeMo W 2022-04-17 18:25:23 nemo_logging:349] /home/pc/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/callback_connector.py:151: LightningDeprecationWarning: Setting Trainer(checkpoint_callback=False) is deprecated in v1.5 and will be removed in v1.7. Please consider using Trainer(enable_checkpointing=False).
    rank_zero_deprecation(

GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
[NeMo W 2022-04-17 18:25:23 nemo_logging:349] /home/pc/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py:61: LightningDeprecationWarning: Setting Trainer(flush_logs_every_n_steps=1000) is deprecated in v1.5 and will be removed in v1.7. Please configure flushing in the logger instead.
    rank_zero_deprecation(

[NeMo I 2022-04-17 18:25:23 exp_manager:281] Experiments will be logged at /home/pc/Downloads/talknet/trainers/nemo_experiments/TalkNetSpect/2022-04-17_18-25-23
[NeMo I 2022-04-17 18:25:23 exp_manager:647] TensorboardLogger has been set up
[NeMo W 2022-04-17 18:25:23 nemo_logging:349] /home/pc/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py:2313: LightningDeprecationWarning: Trainer.weights_save_path has been deprecated in v1.6 and will be removed in v1.8.
    rank_zero_deprecation("Trainer.weights_save_path has been deprecated in v1.6 and will be removed in v1.8.")

[NeMo W 2022-04-17 18:25:23 exp_manager:881] The checkpoint callback was told to monitor a validation value and trainer's max_steps was set to -1. Please ensure that max_steps will run for at least 1 epochs to ensure that checkpointing will not error out.
[NeMo W 2022-04-17 18:25:23 deprecated:63] Class TalkNetSpectModel is deprecated. It is going to be removed in the 1.9 version. TalkNetSpectModel will be removed. Please, use MixerTTSModel instead.
Created a temporary directory at /tmp/tmp663jyb35
Writing /tmp/tmp663jyb35/_remote_module_non_sriptable.py
[NeMo W 2022-04-17 18:25:23 deprecated:63] Class AudioToCharWithDursF0Dataset is deprecated. It is going to be removed in the 1.8 version. Please, use nemo.tts.collections.torch.data.TTSDataset instead.
[NeMo E 2022-04-17 18:25:23 vocabs:324] Torch distributed needs to be initialized before you initialized <nemo.collections.common.data.vocabs.Phonemes object at 0x7f7f41882b80>. This class is prone to data access race conditions. Now downloading corpora from global rank 0. If other ranks pass this before rank 0, errors might result.
[NeMo I 2022-04-17 18:25:25 collections:186] Dataset loaded with 85959 files totalling 120.93 hours
[NeMo I 2022-04-17 18:25:25 collections:187] 0 files were filtered totalling 0.00 hours
[NeMo E 2022-04-17 18:33:42 vocabs:324] Torch distributed needs to be initialized before you initialized <nemo.collections.common.data.vocabs.Phonemes object at 0x7f7f18548910>. This class is prone to data access race conditions. Now downloading corpora from global rank 0. If other ranks pass this before rank 0, errors might result.
[NeMo I 2022-04-17 18:33:44 collections:186] Dataset loaded with 85959 files totalling 120.93 hours
[NeMo I 2022-04-17 18:33:44 collections:187] 0 files were filtered totalling 0.00 hours
[NeMo E 2022-04-17 18:41:43 vocabs:324] Torch distributed needs to be initialized before you initialized <nemo.collections.common.data.vocabs.Phonemes object at 0x7f7efc733d30>. This class is prone to data access race conditions. Now downloading corpora from global rank 0. If other ranks pass this before rank 0, errors might result.
[NeMo I 2022-04-17 18:41:43 features:259] PADDING: 1
[NeMo I 2022-04-17 18:41:43 features:276] STFT using torch
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/1
Added key: store_based_barrier_key:1 to store for rank: 0
Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes.

distributed_backend=nccl
All distributed processes registered. Starting with 1 processes

LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
[NeMo W 2022-04-17 18:41:46 modelPT:496] The lightning trainer received accelerator: <pytorch_lightning.accelerators.gpu.GPUAccelerator object at 0x7f7f4796fa00>. We recommend to use 'ddp' instead.
[NeMo I 2022-04-17 18:41:46 modelPT:587] Optimizer config = Adam (
    Parameter Group 0
        amsgrad: False
        betas: (0.9, 0.999)
        eps: 1e-08
        lr: 0.001
        maximize: False
        weight_decay: 1e-06
    )
[NeMo I 2022-04-17 18:41:46 lr_scheduler:833] Scheduler "<nemo.core.optim.lr_scheduler.CosineAnnealing object at 0x7f7f073e6f70>" will be used during training (effective maximum steps = 859600) - Parameters :
    (min_lr: 1.0e-05
    warmup_ratio: 0.02
    max_steps: 859600
    )

  | Name         | Type                              | Params
0 | preprocessor | AudioToMelSpectrogramPreprocessor | 0
1 | embed        | GaussianEmbedding                 | 7.6 K
2 | norm_f0      | MaskedInstanceNorm1d              | 0
3 | res_f0       | StyleResidual                     | 512
4 | encoder      | ConvASREncoder                    | 8.7 M
5 | proj         | Conv1d                            | 82.0 K

8.7 M     Trainable params
0         Non-trainable params
8.7 M     Total params
34.986    Total estimated model params size (MB)

Epoch 0:   0%|          | 0/9940 [00:00<?, ?it/s]
[W reducer.cpp:1289] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())
Epoch 0:  11%|█         | 1097/9940 [17:59<2:24:59, 1.02it/s, loss=3.39, v_num=5-23]
Epoch 0:  33%|███▎      | 3232/9940 [51:34<1:47:02, 1.04it/s, loss=2.66, v_num=5-23]
Epoch 0:  49%|████▉     | 4870/9940 [1:17:48<1:21:00, 1.04it/s, loss=2.54, v_num=5-23]
Error executing job with overrides: ['sample_rate=22050', 'train_dataset=/home/pc/Downloads/dataset.json', 'validation_datasets=/home/pc/Downloads/dataset.json', 'durs_file=/home/pc/Downloads/talknet/durations.pt', 'f0_file=/home/pc/Downloads/talknet/f0s.pt', 'trainer.max_epochs=100', 'trainer.check_val_every_n_epoch=1', 'model.train_ds.dataloader_params.batch_size=10', 'model.train_ds.dataloader_params.num_workers=8', 'model.validation_ds.dataloader_params.num_workers=8']
Traceback (most recent call last):
  File "talknet_spect.py", line 29, in main
    trainer.fit(model)
  File "/home/pc/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 771, in fit
    self._call_and_handle_interrupt(
  File "/home/pc/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 722, in _call_and_handle_interrupt
    return self.strategy.launcher.launch(trainer_fn, *args, trainer=self, **kwargs)
  File "/home/pc/.local/lib/python3.8/site-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 93, in launch
    return function(*args, **kwargs)
  File "/home/pc/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 812, in _fit_impl
    results = self._run(model, ckpt_path=self.ckpt_path)
  File "/home/pc/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1237, in _run
    results = self._run_stage()
  File "/home/pc/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1324, in _run_stage
    return self._run_train()
  File "/home/pc/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1354, in _run_train
    self.fit_loop.run()
  File "/home/pc/.local/lib/python3.8/site-packages/pytorch_lightning/loops/base.py", line 204, in run
    self.advance(*args, **kwargs)
  File "/home/pc/.local/lib/python3.8/site-packages/pytorch_lightning/loops/fit_loop.py", line 269, in advance
    self._outputs = self.epoch_loop.run(self._data_fetcher)
  File "/home/pc/.local/lib/python3.8/site-packages/pytorch_lightning/loops/base.py", line 204, in run
    self.advance(*args, **kwargs)
  File "/home/pc/.local/lib/python3.8/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 208, in advance
    batch_output = self.batch_loop.run(batch, batch_idx)
  File "/home/pc/.local/lib/python3.8/site-packages/pytorch_lightning/loops/base.py", line 204, in run
    self.advance(*args, **kwargs)
  File "/home/pc/.local/lib/python3.8/site-packages/pytorch_lightning/loops/batch/training_batch_loop.py", line 88, in advance
    outputs = self.optimizer_loop.run(split_batch, optimizers, batch_idx)
  File "/home/pc/.local/lib/python3.8/site-packages/pytorch_lightning/loops/base.py", line 204, in run
    self.advance(*args, **kwargs)
  File "/home/pc/.local/lib/python3.8/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 203, in advance
    result = self._run_optimization(
  File "/home/pc/.local/lib/python3.8/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 256, in _run_optimization
    self._optimizer_step(optimizer, opt_idx, batch_idx, closure)
  File "/home/pc/.local/lib/python3.8/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 369, in _optimizer_step
    self.trainer._call_lightning_module_hook(
  File "/home/pc/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1596, in _call_lightning_module_hook
    output = fn(*args, **kwargs)
  File "/home/pc/.local/lib/python3.8/site-packages/pytorch_lightning/core/lightning.py", line 1625, in optimizer_step
    optimizer.step(closure=optimizer_closure)
  File "/home/pc/.local/lib/python3.8/site-packages/pytorch_lightning/core/optimizer.py", line 168, in step
    step_output = self._strategy.optimizer_step(self._optimizer, self._optimizer_idx, closure, **kwargs)
  File "/home/pc/.local/lib/python3.8/site-packages/pytorch_lightning/strategies/ddp.py", line 278, in optimizer_step
    optimizer_output = super().optimizer_step(optimizer, opt_idx, closure, model, **kwargs)
  File "/home/pc/.local/lib/python3.8/site-packages/pytorch_lightning/strategies/strategy.py", line 193, in optimizer_step
    return self.precision_plugin.optimizer_step(model, optimizer, opt_idx, closure, **kwargs)
  File "/home/pc/.local/lib/python3.8/site-packages/pytorch_lightning/plugins/precision/precision_plugin.py", line 155, in optimizer_step
    return optimizer.step(closure=closure, **kwargs)
  File "/home/pc/.local/lib/python3.8/site-packages/torch/optim/lr_scheduler.py", line 65, in wrapper
    return wrapped(*args, **kwargs)
  File "/home/pc/.local/lib/python3.8/site-packages/torch/optim/optimizer.py", line 88, in wrapper
    return func(*args, **kwargs)
  File "/home/pc/.local/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/pc/.local/lib/python3.8/site-packages/torch/optim/adam.py", line 100, in step
    loss = closure()
  File "/home/pc/.local/lib/python3.8/site-packages/pytorch_lightning/plugins/precision/precision_plugin.py", line 140, in _wrap_closure
    closure_result = closure()
  File "/home/pc/.local/lib/python3.8/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 148, in __call__
    self._result = self.closure(*args, **kwargs)
  File "/home/pc/.local/lib/python3.8/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 134, in closure
    step_output = self._step_fn()
  File "/home/pc/.local/lib/python3.8/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 427, in _training_step
    training_step_output = self.trainer._call_strategy_hook("training_step", *step_kwargs.values())
  File "/home/pc/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1766, in _call_strategy_hook
    output = fn(*args, **kwargs)
  File "/home/pc/.local/lib/python3.8/site-packages/pytorch_lightning/strategies/ddp.py", line 344, in training_step
    return self.model(*args, **kwargs)
  File "/home/pc/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/pc/.local/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 963, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/home/pc/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/pc/.local/lib/python3.8/site-packages/pytorch_lightning/overrides/base.py", line 82, in forward
    output = self.module.training_step(*inputs, **kwargs)
  File "/home/pc/Downloads/NeMo/nemo/utils/model_utils.py", line 364, in wrap_training_step
    output_dict = wrapped(*args, **kwargs)
  File "/home/pc/Downloads/NeMo/nemo/collections/tts/models/talknet.py", line 301, in training_step
    pred_mel = self(text=text, text_len=text_len, durs=durs, f0=f0)
  File "/home/pc/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/pc/Downloads/NeMo/nemo/core/classes/common.py", line 835, in __call__
    outputs = wrapped(*args, **kwargs)
  File "/home/pc/Downloads/NeMo/nemo/collections/tts/models/talknet.py", line 285, in forward
    x = self.res_f0(x, f0)
  File "/home/pc/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/pc/Downloads/NeMo/nemo/collections/tts/modules/talknet.py", line 136, in forward
    return x + self.rs(s)
RuntimeError: The size of tensor a (648) must match the size of tensor b (649) at non-singleton dimension 2

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.

Do you know why it is raising that error and how to solve it?

Thank you!

OS version: Ubuntu 21.10
Python: 3.8
GPU: RTX 3080
pytorch-lightning 1.6.0
torch 1.11.0+cu113
torch-stft 0.1.4
torchaudio 0.11.0+cu113
torchmetrics 0.7.3
torchvision 0.12.0+cu113

David Martin Rius

redoctopus commented 2 years ago

It looks like there might be some slight mismatch between the supplementary data you have calculated and what the model is expecting based on the input (perhaps padding with a space or a difference in normalization).

Have you tried clearing the supplementary data folder (or passing in an empty one) and letting the dataset class automatically generate the durations and pitches? Those should have the expected dimensions and avoid the error.

davidmartinrius commented 2 years ago

> It looks like there might be some slight mismatch between the supplementary data you have calculated and what the model is expecting based on the input (perhaps padding with a space or a difference in normalization).

> Have you tried clearing the supplementary data folder (or passing in an empty one) and letting the dataset class automatically generate the durations and pitches? Those should have the expected dimensions and avoid the error.

First of all, thanks for responding.

My dataset contains special characters like "•" because I am training a Catalan-language model. This symbol is used frequently in Catalan.

I think it might be causing this error, but that kind of symbol is necessary in this language.

In any case, this is just a guess, and I should try a dataset without sentences that contain special characters.

But before taking a stab in the dark, could you confirm whether this is a possible cause?

If so, how can I train a TalkNet model with special characters in the dataset?

Thank you!

redoctopus commented 2 years ago

That's entirely possible, if there's a subtle mismatch in how the text is processed when you are calculating, e.g., per-token duration information.

Can you try passing in an empty supplementary data folder and letting the TTSDataset generate the supplementary data automatically, and see if the error persists? This way it should definitely match up with the normalized and tokenized text and the preprocessed audio shapes. It will put the durations and pitches it calculates into that folder, and if it works, you'll be able to check whether the dimensions match what you expect.

(You may also want to check what the TTSDataset's text normalization is doing to your dataset to make sure it's not getting rid of important characters/tokens, since by default it uses English normalization rules and strips some nonstandard characters.)
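If it helps, here is a rough, standalone way to see which characters in the manifest might be affected. This is only an illustration that flags non-ASCII characters, not the actual NeMo normalizer, and it assumes the usual NeMo manifest layout (one JSON object per line with a "text" field):

```python
import json
import re

# Rough illustration only (NOT the NeMo normalizer): collect characters in the
# manifest text that a default English-oriented pipeline might strip or remap.
manifest_path = "/home/pc/Downloads/dataset.json"  # manifest path from the command above

suspect_chars = set()
with open(manifest_path, encoding="utf-8") as f:
    for line in f:
        text = json.loads(line)["text"]
        suspect_chars.update(
            ch for ch in text if not re.fullmatch(r"[A-Za-z0-9\s.,;:!?'\"()-]", ch)
        )

print(sorted(suspect_chars))  # e.g. accented vowels, 'ç', the Catalan middle dot, etc.
```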

davidmartinrius commented 2 years ago

> Can you try passing in an empty supplementary data folder and letting the TTSDataset generate the supplementary data automatically, and see if the error persists?

What specifically do you mean by passing in an empty supplementary data folder? I don't know at what step I should do it or how to do it.

Could you share an example, please? I'm sorry, I have little experience training with torch.

redoctopus commented 2 years ago

Ahh shoot, I forgot that TalkNet will be deprecated soon and hasn't been switched over to the new TTSDataset; it still uses the AudioToCharWithDursF0Dataset, which requires precomputed duration/pitch files.

In this case, how are you generating these files? Can you check that the dimensions of the results match the dimensions of the normalized and tokenized output of the AudioToCharWithDursF0Dataset?
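For reference, here is a minimal sketch of that kind of check, assuming durations.pt and f0s.pt are dictionaries keyed by utterance ID holding 1-D tensors (adjust the loading code to however your generation script actually stores them):

```python
import torch

# Sketch of a consistency check; assumes durations.pt / f0s.pt are dicts keyed
# by utterance ID containing 1-D tensors -- adjust to your actual file layout.
durs = torch.load("/home/pc/Downloads/talknet/durations.pt")
f0s = torch.load("/home/pc/Downloads/talknet/f0s.pt")

for utt_id, dur in durs.items():
    frames_from_durs = int(dur.sum())       # total frames implied by the token durations
    frames_from_f0 = f0s[utt_id].numel()    # number of frames in the pitch track
    if frames_from_durs != frames_from_f0:
        print(f"{utt_id}: durations imply {frames_from_durs} frames but f0 has "
              f"{frames_from_f0} -- an off-by-one here matches the 648 vs 649 error")
```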

davidmartinrius commented 2 years ago

I didn't know that TalkNet was about to be deprecated; otherwise I would not have invested my time in it.

I suppose FastPitch is the model that is going to be maintained? Is there a roadmap where I can see the next steps for NeMo?

Actually, I started training TTS models with Tacotron2. Then I had an issue, reported it in this project, and another GitHub user told me that Tacotron2 was not going to be maintained here. So I started training TTS models with TalkNet... and now TalkNet is deprecated too.

So, please, can you tell me what the correct way to train a TTS model is? Thank you!

redoctopus commented 2 years ago

Various models have been marked for deprecation for several months, starting from this PR: #3576. We decided to remove TalkNet as well last month, in #3955.

Our recommended way to train a TTS model is using FastPitch and HiFi-GAN, which will be supported for the foreseeable future.

davidmartinrius commented 2 years ago

Ok thanks!