as-ideas / ForwardTacotron

⏩ Generating speech in a single forward pass without any attention!
https://as-ideas.github.io/ForwardTacotron/
MIT License

Training multiple models #38

Open Alexius08 opened 4 years ago

Alexius08 commented 4 years ago

I've trained a model on the LJSpeech dataset and found the results quite satisfactory after 25,000 steps in ForwardTacotron. I'm now preparing several other datasets on which new models will be based.

  1. How do I switch training between different models? I know I can specify the path of my target dataset when running the preprocess script, but can I do the same with the training scripts?
  2. When generating sentences, how do I select a specific model?
  3. Do the results of training previous models affect the training of new ones?
  4. If I add new audio samples to one of my datasets and preprocess it again, will training for that model start from the beginning, or can it pick up where it left off before the new samples were added?
Alexius08 commented 4 years ago

Editing wav_path in hparams.py turned out to be the key. However, I ran into another problem: the training scripts refuse to run at all on my new datasets. For the new ones, their respective train_dataset.pkl files contain only the following: €]”. No idea what is causing this.
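(Those bytes are actually telling: €]” is what a pickled empty list looks like when the binary file is opened as cp1252 text, so the preprocessing step apparently produced zero training items. A quick standard-library check, nothing ForwardTacotron-specific:)

```python
import pickle

# Pickling an empty list (protocol 4) yields b'\x80\x04\x95...]\x94.'
# Read as cp1252 text, the printable bytes 0x80, 0x5d and 0x94 render as
# '€', ']' and '”' -- exactly the "€]”" seen in the broken train_dataset.pkl.
data = pickle.dumps([], protocol=4)
print(data)

as_text = data.decode('cp1252')
print(as_text)                    # printable characters include '€', ']' and '”'
print(pickle.loads(data))         # -> [] (an empty dataset)
```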

Alexius08 commented 4 years ago

Got past that error message by converting all wav files to 16-bit, 22050 Hz, but then I hit another error:

Traceback (most recent call last):
  File "train_tacotron.py", line 192, in <module>
    trainer.train(model, optimizer)
  File "C:\Users\Alexius08\Documents\GitHub\ForwardTacotron\trainer\taco_trainer.py", line 37, in train
    self.train_session(model, optimizer, session)
  File "C:\Users\Alexius08\Documents\GitHub\ForwardTacotron\trainer\taco_trainer.py", line 57, in train_session
    for i, (x, m, ids, x_lens, mel_lens) in enumerate(session.train_set, 1):
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 435, in __next__
    data = self._next_data()
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 474, in _next_data
    index = self._next_index()  # may raise StopIteration
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 427, in _next_index
    return next(self._sampler_iter)  # may raise StopIteration
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\utils\data\sampler.py", line 227, in __iter__
    for idx in self.sampler:
  File "C:\Users\Alexius08\Documents\GitHub\ForwardTacotron\utils\dataset.py", line 268, in __iter__
    binned_idx = np.stack(bins).reshape(-1)
  File "<__array_function__ internals>", line 5, in stack
  File "C:\ProgramData\Anaconda3\lib\site-packages\numpy\core\shape_base.py", line 422, in stack
    raise ValueError('need at least one array to stack')
ValueError: need at least one array to stack

The generated train_dataset.pkl, val_dataset.pkl and text_dict.pkl files don't have line breaks at all.
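(For what it's worth, the last frame of the traceback reproduces in isolation: np.stack raises exactly this ValueError when handed an empty list, which suggests the sampler found zero training items to bin. A minimal sketch:)

```python
import numpy as np

# If the dataset yields no items, the sampler's list of bins stays empty,
# and stacking an empty list raises the same error as in the traceback above.
bins = []  # what the sampler ends up with when train_dataset.pkl is empty
try:
    np.stack(bins).reshape(-1)
except ValueError as e:
    print(e)  # need at least one array to stack
```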

cschaefer26 commented 4 years ago

Hi, did you also change the data path in hparams? Otherwise it would probably mix two datasets. The error message indicates that there are no training files to be loaded. I would double-check whether the wav file names match the ids in the metafile.csv (if you run preprocess.py, it should report how many files are used). The train_dataset.pkl is binary pickled; if you want to have a look at it, you need to load it with the unpickle_binary() function in utils — probably useful for debugging. You could also look in data/mel and see whether any files are there.
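(A minimal way to inspect the pickles without the repo's helper — this is a sketch of what utils' unpickle_binary() presumably does, assuming a plain pickle file; the path below is hypothetical and should point at your data_path from hparams.py:)

```python
import pickle
from pathlib import Path

def unpickle_binary(path):
    """Load a binary-pickled file such as train_dataset.pkl."""
    with open(path, 'rb') as f:
        return pickle.load(f)

# Hypothetical location -- adjust to the data_path set in hparams.py.
pkl = Path('data') / 'train_dataset.pkl'
if pkl.exists():
    train_set = unpickle_binary(pkl)
    print(len(train_set))   # 0 here would explain "need at least one array to stack"
    print(train_set[:5])    # expected: a list of (file_id, length) tuples
```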

As for the other questions:

When generating sentences, how do I select a specific model?

You can switch by using the --hp_file and --tts_weights flags for the corresponding models. If your models differ in hyperparams, you need to save the different hparams.py files somewhere. If the hparams are the same, just pointing --tts_weights to the ***_weights.pyt model should be enough.

Do the results of training previous models affect the training of new ones?

No.

If I add new audio samples to one of my datasets and preprocess it again, will training for that model start from the beginning, or can it pick up where it left off before the new samples were added?

If you don't change the tts_model_id in hparams.py, it is going to resume training the previous model; otherwise it creates a new directory under checkpoints with the new tts_model_id.

Alexius08 commented 4 years ago

Just checked the binary pickled files. The training data for my first custom dataset is an empty array, while the training data for my second custom dataset, as well as the validation dataset and text dictionary in both datasets, look normal: the validation data (and the normally generated training data) are arrays of tuples pairing a filename with a three-digit number, and the text dictionary is a large object pairing each filename with the IPA equivalent of its transcript. Meanwhile, the mel, quant, and raw_pitch folders each have one .npy file for every wav file in the dataset, while the phon_pitch folders for both datasets are empty.

cschaefer26 commented 4 years ago

In this case it seems to me that there is a mismatch between text ids and wav file names, because only matching files are taken into account. Did you check this? I.e. you could debug in the preprocess.py file and check how many files are filtered at line 86. The stemmed wav file names should match the ids in the metafile (e.g. 00001|some text. corresponds to 00001.wav).
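(The suggested check can also be scripted outside the repo; a sketch where the paths and the `|` metafile delimiter are assumptions to adjust to your setup:)

```python
from pathlib import Path

def check_id_match(wav_dir: Path, metafile: Path, sep: str = '|'):
    """Compare stemmed wav file names against the ids in the metafile.

    Returns (ids in metafile with no wav, wavs with no metafile entry).
    """
    wav_ids = {p.stem for p in wav_dir.glob('*.wav')}
    with open(metafile, encoding='utf-8') as f:
        meta_ids = {line.split(sep)[0].strip() for line in f if line.strip()}
    return sorted(meta_ids - wav_ids), sorted(wav_ids - meta_ids)

# Usage (hypothetical paths):
# missing_wavs, orphan_wavs = check_id_match(Path('my_data/wavs'),
#                                            Path('my_data/metadata.csv'))
```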

Alexius08 commented 4 years ago

When running preprocess.py, there's no mismatch: the number of files found equals the number of indexed files. However, when I added more clips to the two datasets, running train_tacotron.py went smoothly for one of the datasets (40 minutes split across 370 clips), while I still got an error with the other (29 minutes split across 250 clips). Perhaps dataset size has something to do with these errors.
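(If the error really is size-related, one plausible mechanism is the length-binned sampler producing zero bins when the dataset is smaller than a single bin. The bin_size logic below is an assumption for illustration, not the repo's actual code:)

```python
import numpy as np

def make_bins(num_items: int, batch_size: int, bin_factor: int = 3):
    """Group (length-sorted) indices into bins of batch_size * bin_factor items,
    dropping a final partial bin. With too few items, no bins survive."""
    bin_size = batch_size * bin_factor
    idx = np.argsort(np.random.rand(num_items))  # stand-in for length-sorted ids
    return [idx[i:i + bin_size] for i in range(0, num_items, bin_size)
            if i + bin_size <= num_items]

print(len(make_bins(250, 32)))  # 2 full bins of 96 items each
print(len(make_bins(50, 32)))   # 0 -> np.stack(bins) would raise ValueError
```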

Alexius08 commented 4 years ago

Also, I had to change lines 39 and 40 in my copy of train_tacotron.py to point it to my dataset's pickle files. Left unchanged, it kept trying to access the LJSpeech alignment files.

cschaefer26 commented 3 years ago

Good point, I will change the scripts to take the hparams settings into account. I honestly mostly leave the data naming the same and make copies of the dataset if I train a new model. Could you solve the issue with the smaller dataset? I'm not sure what you mean by adding clips to the dataset — you would have to preprocess the whole dataset again if you add clips (otherwise it won't generate the correct train_dataset.pkl file).