fatchord / WaveRNN

WaveRNN Vocoder + TTS
https://fatchord.github.io/model_outputs/
MIT License

Some GTA files missing after running train_tacotron.py --force_gta #163

Open aguazul opened 4 years ago

aguazul commented 4 years ago

Thanks for this project! :D

I've been training Tacotron for a few days now and it's up to 192K steps.

I ran train_tacotron.py --force_gta and it completed.

However, when I run train_wavernn.py --gta, it keeps saying that it can't find some of the files. Each time I run it, it complains about a different missing file. I have confirmed that the file path is correct and also that the files are actually missing. Is this caused by train_tacotron.py --force_gta not creating all the expected files? How do I get the missing files to be produced?

```
(pyGPUenv) C:\Users\Brandon\Documents\WaveRNN-master\WaveRNN-master>python train_wavernn.py --gta
Using device: cuda

Initialising Model...

Trainable Parameters: 4.234M
Restoring from latest checkpoint...
Loading latest weights: C:\Users\Documents\WaveRNN-master\WaveRNN-master\checkpoints\ljspeech_mol.wavernn\latest_weights.pyt
Loading latest optimizer state: C:\Users\Documents\WaveRNN-master\WaveRNN-master\checkpoints\ljspeech_mol.wavernn\latest_optim.pyt

+-------------+------------+--------+--------------+-----------+
| Remaining   | Batch Size | LR     | Sequence Len | GTA Train |
+-------------+------------+--------+--------------+-----------+
| 1000k Steps | 32         | 0.0001 | 1375         | True      |
+-------------+------------+--------+--------------+-----------+

Traceback (most recent call last):
  File "train_wavernn.py", line 159, in <module>
    main()
  File "train_wavernn.py", line 85, in main
    voc_train_loop(paths, voc_model, loss_func, optimizer, train_set, test_set, lr, total_steps)
  File "train_wavernn.py", line 105, in voc_train_loop
    for i, (x, y, m) in enumerate(train_set, 1):
  File "C:\Users\Anaconda3\envs\pyGPUenv\lib\site-packages\torch\utils\data\dataloader.py", line 346, in __next__
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "C:\Users\Anaconda3\envs\pyGPUenv\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "C:\Users\Anaconda3\envs\pyGPUenv\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "C:\Users\Documents\WaveRNN-master\WaveRNN-master\utils\dataset.py", line 27, in __getitem__
    m = np.load(self.mel_path/f'{item_id}.npy')
  File "C:\Users\Anaconda3\envs\pyGPUenv\lib\site-packages\numpy\lib\npyio.py", line 415, in load
    fid = open(os_fspath(file), "rb")
FileNotFoundError: [Errno 2] No such file or directory: 'C:\Users\Documents\WaveRNN-master\WaveRNN-master\JBDataset\gta\HHW Lesson 7 Living a Values Driven Life_200.npy'

(pyGPUenv) C:\Users\Documents\WaveRNN-master\WaveRNN-master>
```

I'm training on Windows 10 with PyTorch 1.3 and CUDA 10.

Thank you :)

aguazul commented 4 years ago

I figured it out. The get_tts_datasets function ignores any samples longer than tts_max_mel_len, which is set to 1250 in hparams.py. I increased this to 3348, one greater than the longest sample in my dataset. Now when I run --force_gta it includes all files, even the longer ones.
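For context, the filter in get_tts_datasets looks roughly like this (a sketch, not a verbatim copy of utils/dataset.py; the exact variable names may differ between versions):

```python
# Sketch of the length filter in get_tts_datasets (utils/dataset.py).
# dataset.pkl holds (item_id, mel_length) pairs written during preprocessing.
dataset_ids = []
mel_lengths = []
for item_id, mel_length in dataset:
    if mel_length <= hp.tts_max_mel_len:   # anything longer is silently dropped
        dataset_ids += [item_id]
        mel_lengths += [mel_length]
```

Since --force_gta only writes GTA mels for the ids that survive this filter, any longer sample never gets a gta/<item_id>.npy file, which is why the vocoder loader later fails to find it.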

What is the benefit of excluding the longer samples? What effect, if any, does this have on training time? And on the results?

Thanks!

oytunturk commented 4 years ago

Basic attention mechanisms are not very robust when training with long input/output sequences. This becomes especially problematic when the training phrases contain long pauses, which make the mapping between the input and output sequences harder for the network to learn.

The latest Google Tacotron paper (https://arxiv.org/abs/1910.10288) seems to offer solutions based on more sophisticated attention mechanisms.

gabriel-souza-omni commented 4 years ago

If you'd rather not change that hyperparameter and just ignore the longer samples, this could be useful:

https://www.gitmemory.com/issue/fatchord/WaveRNN/72/492741940
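For anyone who can't reach that link, one way to get the same effect is to skip ids whose GTA mels were never written. A minimal sketch (hypothetical, not necessarily what the linked comment does; gta_path stands in for your dataset's gta folder):

```python
from pathlib import Path

# Keep only items whose GTA mel actually exists on disk, so the vocoder
# DataLoader never hits a FileNotFoundError for samples --force_gta skipped.
gta_path = Path('JBDataset/gta')  # adjust to your dataset's gta folder
dataset_ids = [item_id for item_id in dataset_ids
               if (gta_path / f'{item_id}.npy').exists()]
```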

cschaefer26 commented 4 years ago

As @aguazul stated, the problem is that long samples are not filtered out when building the vocoder dataset. I fixed it for my dataset by changing line 40 in get_vocoder_datasets from:

dataset_ids = [x[0] for x in dataset]

to

dataset_ids = [x[0] for x in dataset if x[1] <= hp.tts_max_mel_len]
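In context, the change looks roughly like this (a sketch of the surrounding code in get_vocoder_datasets; details may differ between versions):

```python
# dataset is the list of (item_id, mel_length) pairs from dataset.pkl.
# Applying the same length cut-off that get_tts_datasets uses means the
# vocoder loader only asks for GTA mels that --force_gta actually produced.
dataset_ids = [x[0] for x in dataset if x[1] <= hp.tts_max_mel_len]
```

Either this filter or raising tts_max_mel_len works; the difference is whether the long samples end up in the vocoder training set at all.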

mindmapper15 commented 4 years ago

Besides the fact that attention is not very robust for long sentences, the maximum number of decoder RNN time steps is (max_mel_len // reduction_factor). Increasing the number of time steps in the RNN increases VRAM usage.

That is, if the input sentence is too long, your GPU may run out of memory because there are too many time steps in the decoder RNN. In that case, you either have to reduce the batch size or set tts_max_mel_len to a lower value.
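A rough back-of-the-envelope sketch of how the cap translates into decoder length (the reduction factor here is an assumed example value; in this repo r actually comes from the training schedule in hparams.py):

```python
# How tts_max_mel_len bounds the number of decoder RNN time steps.
max_mel_len = 3348       # the value tts_max_mel_len was raised to in this thread
reduction_factor = 2     # example r; the repo schedules r per training phase

max_decoder_steps = max_mel_len // reduction_factor
print(max_decoder_steps)  # 1674 steps the decoder must unroll (and backprop through)

# Activation memory grows roughly linearly with the longest mel in a batch,
# so the two knobs are a smaller batch size or a lower tts_max_mel_len.
```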