Closed: potatocharlie1 closed this issue 9 months ago
The error happened when doing the sanity check with the validation dataset; by default, one batch is used for this check. The error trace `ValueError: Caught ValueError in DataLoader worker process 0.` bottoms out in `start = random.randint(0, mel.shape[1] - frames - 1)`, so the most likely issue is that one or more validation mel spectrograms have fewer frames than the segment length the dataloader tries to crop, which makes the `randint` range empty.
A side note: the above diagnosis also applies to the training dataset. Please double-check your training data as well, in case the same error happens during training.
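One quick way to check both manifests is to look for mels with fewer frames than the training segment length. A minimal sketch, assuming the manifest is JSON-lines with a `mel_filepath` field pointing to arrays saved by `np.save`, and assuming example config values of `n_segments=8192` audio samples with a hop length of 256 (read the real values from your HiFi-GAN finetune config):

```python
import json

import numpy as np

# Assumed example values -- take the real ones from your config.
N_SEGMENTS = 8192   # audio samples per training segment
HOP_LENGTH = 256    # STFT hop used when generating the mels
MIN_FRAMES = N_SEGMENTS // HOP_LENGTH + 1  # frames needed so randint's range is non-empty

def find_short_mels(manifest_path):
    """Return (mel_filepath, n_frames) for entries too short to crop a segment from."""
    short = []
    with open(manifest_path) as f:
        for line in f:
            entry = json.loads(line)
            mel = np.load(entry["mel_filepath"])  # shape: (n_mels, T)
            n_frames = mel.shape[-1]
            if n_frames < MIN_FRAMES:
                short.append((entry["mel_filepath"], n_frames))
    return short
```

Any entry this reports would trigger the exact `randint` failure in the trace.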
This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.
This issue was closed because it has been inactive for 7 days since being marked as stale.
Describe the bug
I am trying to finetune HiFi-GAN with spectrograms created by a previously finetuned FastPitch, following the tutorials FastPitch_Finetuning.ipynb and FastPitch_GermanTTS_Training.ipynb. Finetuning FastPitch went well; however, when I try to finetune HiFi-GAN on the same data, it always raises this error:
```
Sanity Checking: 0it [00:00, ?it/s]Error executing job with overrides: ['model.max_steps=10', 'model.optim.lr=0.00001', '~model.optim.sched', 'train_dataset=./sad_data_manifest_train_local_mel.json', 'validation_datasets=./sad_data_manifest_test_local_mel.json', 'exp_manager.exp_dir=hifigan_ft', '+trainer.val_check_interval=5', '+init_from_pretrained_model=tts_en_hifigan', 'trainer.check_val_every_n_epoch=null', 'model/train_ds=train_ds_finetune', 'model/validation_ds=val_ds_finetune']
Traceback (most recent call last):
  File "/mnt/c/Users/charl/Documents/synvoice/hifigan_finetune.py", line 28, in main
    trainer.fit(model)
  File "/home/charlie/anaconda3/envs/nemo/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 532, in fit
    call._call_and_handle_interrupt(
  File "/home/charlie/anaconda3/envs/nemo/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 42, in _call_and_handle_interrupt
    return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
  File "/home/charlie/anaconda3/envs/nemo/lib/python3.10/site-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 93, in launch
    return function(*args, **kwargs)
  File "/home/charlie/anaconda3/envs/nemo/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 571, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/home/charlie/anaconda3/envs/nemo/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 980, in _run
    results = self._run_stage()
  File "/home/charlie/anaconda3/envs/nemo/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1021, in _run_stage
    self._run_sanity_check()
  File "/home/charlie/anaconda3/envs/nemo/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1050, in _run_sanity_check
    val_loop.run()
  File "/home/charlie/anaconda3/envs/nemo/lib/python3.10/site-packages/pytorch_lightning/loops/utilities.py", line 181, in _decorator
    return loop_run(self, *args, **kwargs)
  File "/home/charlie/anaconda3/envs/nemo/lib/python3.10/site-packages/pytorch_lightning/loops/evaluation_loop.py", line 108, in run
    batch, batch_idx, dataloader_idx = next(data_fetcher)
  File "/home/charlie/anaconda3/envs/nemo/lib/python3.10/site-packages/pytorch_lightning/loops/fetchers.py", line 137, in __next__
    self._fetch_next_batch(self.dataloader_iter)
  File "/home/charlie/anaconda3/envs/nemo/lib/python3.10/site-packages/pytorch_lightning/loops/fetchers.py", line 151, in _fetch_next_batch
    batch = next(iterator)
  File "/home/charlie/anaconda3/envs/nemo/lib/python3.10/site-packages/pytorch_lightning/utilities/combined_loader.py", line 285, in __next__
    out = next(self._iterator)
  File "/home/charlie/anaconda3/envs/nemo/lib/python3.10/site-packages/pytorch_lightning/utilities/combined_loader.py", line 123, in __next__
    out = next(self.iterators[0])
  File "/home/charlie/anaconda3/envs/nemo/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 630, in __next__
    data = self._next_data()
  File "/home/charlie/anaconda3/envs/nemo/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1345, in _next_data
    return self._process_data(data)
  File "/home/charlie/anaconda3/envs/nemo/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1371, in _process_data
    data.reraise()
  File "/home/charlie/anaconda3/envs/nemo/lib/python3.10/site-packages/torch/_utils.py", line 694, in reraise
    raise exception
ValueError: Caught ValueError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/charlie/anaconda3/envs/nemo/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/charlie/anaconda3/envs/nemo/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/charlie/anaconda3/envs/nemo/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/charlie/anaconda3/envs/nemo/lib/python3.10/site-packages/nemo/collections/tts/data/dataset.py", line 1144, in __getitem__
    start = random.randint(0, mel.shape[1] - frames - 1)
  File "/home/charlie/anaconda3/envs/nemo/lib/python3.10/random.py", line 370, in randint
    return self.randrange(a, b+1)
  File "/home/charlie/anaconda3/envs/nemo/lib/python3.10/random.py", line 353, in randrange
    raise ValueError("empty range for randrange() (%d, %d, %d)" % (istart, istop, width))
ValueError: empty range for randrange() (0, -12, -12)
```
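For context, the final `ValueError` is just `random.randint` being handed an empty range: when the mel has fewer frames than the requested segment, `mel.shape[1] - frames - 1` goes negative. A standalone illustration (the frame counts are made up, chosen so the arithmetic matches the `(0, -12, -12)` in the trace):

```python
import random

mel_frames = 20       # frames actually in the (too short) mel
segment_frames = 32   # frames the dataset tries to crop

try:
    # 20 - 32 - 1 = -13, so randint(0, -13) has no valid values
    start = random.randint(0, mel_frames - segment_frames - 1)
except ValueError as e:
    # On Python 3.10 this is the same message as in the trace above
    print(e)
```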
Steps/Code to reproduce bug
Generated the mels using generate_mels.py:

and then started finetuning:
Expected behavior
HiFi-GAN finetuning starts.
Environment overview (please complete the following information)
Environment details
OS: Linux 5.15.90.1-microsoft-standard-WSL2
PyTorch version: 2.2.0.dev20230913
Python version: 3.10.12 (main, Jul 5 2023, 18:54:27) [GCC 11.2.0]
Additional context
I am using parts of the IEMOCAP dataset. The files differ in length, but generating the mels already pads/cuts them to roughly 6 seconds, and the mels from generate_mels.py look good. I have looked into the other similar issues and switched to the generate_mels.py code because of them, but it did not fix the issue.
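If a few clips still come out short despite the ~6 s padding, one possible workaround (a sketch, not a NeMo utility; the `mel_filepath` field, `np.save` format, and threshold values below are assumptions to adapt to your setup) is to drop those entries from the manifest before finetuning:

```python
import json

import numpy as np

# Assumed example values -- take the real ones from the HiFi-GAN finetune config.
MIN_FRAMES = 8192 // 256 + 1  # n_segments / hop_length, plus one for randint's bounds

def filter_manifest(in_path, out_path):
    """Copy the manifest, keeping only entries whose mel has enough frames."""
    kept = dropped = 0
    with open(in_path) as fin, open(out_path, "w") as fout:
        for line in fin:
            entry = json.loads(line)
            mel = np.load(entry["mel_filepath"])
            if mel.shape[-1] >= MIN_FRAMES:
                fout.write(line)
                kept += 1
            else:
                dropped += 1
    return kept, dropped
```

Running this over both the train and validation manifests and pointing the finetune command at the filtered files should at least get past the sanity check, at the cost of losing the dropped clips.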