camjac251 opened this issue 4 years ago

I wanted to try to synthesize a short sample using a model I've been training before training further, but I think I'm running into some more issues :/

I ran `conda install -c conda-forge notebook` but then decided on `conda install -c conda-forge jupyterlab`, since it has both the new lab and the classic notebook. When opening `inference.ipynb`, I started to run the cells one by one. The first block gave this error: just a simple missing dependency, so I ignored it and moved on to see the rest of the code. And then I stopped it at Load models. Are these not supposed to show? It feels like I'm using the wrong project or something.

> just a simple missing dependency, so I ignored it and moved on to see the rest of the code.
The code stops at the missing dependency, so everything else in the first block has not been imported. The cell
```python
import librosa
import torch
from hparams import create_hparams
from model import Tacotron2, load_model
from waveglow.denoiser import Denoiser
from layers import TacotronSTFT
from data_utils import TextMelLoader, TextMelCollate
from text import cmudict, text_to_sequence
from mellotron_utils import get_data_from_musicxml
```
was never run.
I'd suggest installing any dependencies you can. Most are required to run the notebook.
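To spell out why: a cell aborts at the first uncaught exception, so nothing after the failing import executes. A minimal illustration (the module name is a stand-in for the missing dependency):

```python
# An uncaught ImportError aborts the cell; later imports never run.
try:
    import some_missing_dependency  # stand-in for the dependency that failed
    import librosa                  # never reached if the line above raises
except ImportError as err:
    print(f"cell execution stopped here: {err}")
```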
Ah, I see, that makes sense. I'll go ahead and install the missing dependency then. Hopefully it'll work with the latest version. Thank you :)
I opted to use version 0.25.3 of pandas since it dates from around the time this project was uploaded. I also had to pin the existing version of numpy, or else conda would update it to the latest and, I believe, cause issues: `conda install pandas=0.25.3 numpy=1.16.4`.
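A quick way to confirm the pins took effect inside the environment (versions are the ones chosen above):

```python
# Sanity check: the interpreter should see exactly the pinned versions.
import numpy
import pandas

assert pandas.__version__ == "0.25.3", pandas.__version__
assert numpy.__version__ == "1.16.4", numpy.__version__
print("pins OK")
```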
No errors anymore except for `IndentationError: unexpected indent`, which is harmless.
I guess I just rename `checkpoint_#####` to `mellotron_ljs.pt`, or is there a conversion process from checkpoints to the `.pt` extension for inference?
Going to attempt to train waveglow next before running the full inference code.
It's easier to rename `checkpoint_path = "models/mellotron_libritts.pt"` to `checkpoint_path = "outdir/checkpoint_XXXXXXX"` inside the notebook. I don't believe any conversion is required to test the checkpoint.
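For reference, a minimal sketch of what the notebook's loading cell looks like after that edit (it uses the repo's `create_hparams` and `load_model`; the checkpoint path is an example):

```python
import torch
from hparams import create_hparams
from model import load_model

hparams = create_hparams()
checkpoint_path = "outdir/checkpoint_XXXXXXX"  # your own checkpoint instead of models/mellotron_libritts.pt

# Training checkpoints store the weights under the 'state_dict' key.
mellotron = load_model(hparams).cuda().eval()
mellotron.load_state_dict(torch.load(checkpoint_path)['state_dict'])
```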
I must be missing something about the training procedure. I followed the waveglow readme's training instructions because I thought you use separate mellotron and waveglow models to synthesize results. But when I try to train I get:
```
(condaenv): python train.py -c config.json
Traceback (most recent call last):
  File "train.py", line 39, in <module>
    from mel2samp import Mel2Samp
  File "C:\Users\camja\Desktop\mellotron\waveglow\mel2samp.py", line 38, in <module>
    from tacotron2.layers import TacotronSTFT
ModuleNotFoundError: No module named 'tacotron2.layers'
```
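One possible workaround for this, sketched under the assumption that `waveglow/mel2samp.py` expects a `tacotron2` package (a git submodule in the standalone waveglow repo) while the mellotron repo keeps the equivalent `layers.py` at its root:

```python
# Hypothetical shim: alias mellotron's root layers.py as tacotron2.layers
# before waveglow imports it. Run from inside waveglow/.
import importlib
import sys
import types

sys.path.insert(0, "..")  # mellotron repo root, relative to waveglow/

tacotron2 = types.ModuleType("tacotron2")
tacotron2.layers = importlib.import_module("layers")  # mellotron's layers.py
sys.modules["tacotron2"] = tacotron2
sys.modules["tacotron2.layers"] = tacotron2.layers

from tacotron2.layers import TacotronSTFT  # now resolves
```

Initializing the git submodule (or cloning NVIDIA/tacotron2 into `waveglow/tacotron2`) should satisfy the import without any shim.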
If I try to run it with the waveglow model available on the readme, I get this warning:

```
C:\Users\camja\anaconda3\envs\mello\lib\site-packages\torch\serialization.py:593: SourceChangeWarning: source code of class 'torch.nn.modules.container.ModuleList' has changed. you can retrieve the original source code by accessing the object's source attribute or set `torch.nn.Module.dump_patches = True` and use the patch tool to revert the changes.
  warnings.warn(msg, SourceChangeWarning)
```

which might be breaking it, because this is my result and the audio it created: https://voca.ro/kFaOGGxbLAj
@camjac251 The Predicted Mel is from Mellotron, so Mellotron is the one acting up here.

@camjac251 Did you start your Mellotron model from scratch? The source rhythm should be a diagonal line, where each text input matches a part of the output timeline; however, this requires the model to be trained to a decent degree when it does not use pretrained weights as the starting point.
Yeah, I did. I started from nothing with the LJSpeech dataset. It trained initially but had to be restarted every now and then, so each time I would resume with `python train.py --output_directory=outdir --log_directory=logdir --checkpoint_path outdir/checkpoint_#####`. I did see another issue about quality loss when resuming (#30), so I changed `use_saved_learning_rate=True` but didn't touch `ignore_layers=['speaker_embedding.weight']`. I trained it up to 16,000 iterations, when it started to look good on the predicted mel.
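For clarity, the two flags being discussed live in a tacotron2-style `hparams.py`; a sketch of just those entries (surrounding defaults omitted):

```python
# Relevant entries from the create_hparams() defaults (context omitted):
hparams = dict(
    ignore_layers=['speaker_embedding.weight'],  # left unchanged here
    use_saved_learning_rate=True,                # changed, following issue #30
)
```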
@camjac251 Refer to the notebook: https://github.com/NVIDIA/mellotron/blob/master/inference.ipynb. You can see that there is a green/yellow line in the last graph. That's your alignment, i.e. how well the model has learned to link the text/f0 to the audio. Your tensorboard output shows the model is still learning alignment (top image in your comment).
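For anyone inspecting that graph outside the notebook, a sketch of how such an alignment map is typically plotted (mirrors tacotron2-style plotting utilities; the names are illustrative):

```python
# Sketch: draw an attention/alignment matrix; a well-trained model shows a
# roughly diagonal band from bottom-left to top-right.
import matplotlib.pyplot as plt

def plot_alignment(alignment):
    """alignment: 2-D array, (encoder/text steps) x (decoder/audio frames)."""
    fig, ax = plt.subplots(figsize=(6, 4))
    im = ax.imshow(alignment, aspect='auto', origin='lower', interpolation='none')
    ax.set_xlabel('Decoder timestep (audio frames)')
    ax.set_ylabel('Encoder timestep (text symbols)')
    fig.colorbar(im, ax=ax)
    plt.show()
```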
Those breaks are when my machine turned off during training; I think it's happened at least 20 times. I can only get 6 hours of uninterrupted training per day before short bursts of restarting it.
I let it run longer and tried today to get a result.
@rafaelvalle Is this OK to ignore with waveglow?
```
C:\Users\camja\anaconda3\envs\mello\lib\site-packages\torch\serialization.py:593: SourceChangeWarning: source code of class 'torch.nn.modules.container.ModuleList' has changed. you can retrieve the original source code by accessing the object's source attribute or set `torch.nn.Module.dump_patches = True` and use the patch tool to revert the changes.
  warnings.warn(msg, SourceChangeWarning)
```
I feel like it might be why my audio sounds like this: https://voca.ro/bT4CP3It1BV

Edit: If waveglow requires PyTorch 1.0, how come mellotron doesn't impose the same requirement in its readme when tacotron2 does?
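If the goal is just to silence that specific load-time warning while testing, a minimal sketch (assuming the warning is cosmetic, which the reply below supports):

```python
# Suppress only PyTorch's SourceChangeWarning raised during torch.load().
import warnings
from torch.serialization import SourceChangeWarning

warnings.filterwarnings("ignore", category=SourceChangeWarning)
```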
The issue comes from the mel-spectrogram you are producing. Your model hasn't learned to attend yet.
It's been a while since I last tried training, but that was after 2 weeks or so of constant training (at least that stretch was uninterrupted; there were other sessions before).
I thought maybe applying the suggested settings from #30 could help, but it might've hurt instead. In my hparams I changed `use_saved_learning_rate=True` and `ignore_layers=[]`.
Do commas and breaths mess up the alignment of data during training? I have quite a few samples where the speaker repeats half of a word before saying the full word; I tried to include those repetitions in the transcription, along with commas where a thought changes mid-sentence.
Can you share a screenshot of your tensorboard logs with the training and validation curves, the attention maps, and the predicted mel-spectrograms?
I'm not sure where to find the attention map. I went back and forth with the hparams settings, and I think I even started over with the LJSpeech model as the starter; this might be the result of that, I can't remember. I started with a warm start and that ran for a few weeks, I believe.
The validation loss is going up, showing evidence that your model is overfitting.
Does it just need more time and more training data?
Take a look at issues related to overfitting in the tacotron2 repo. https://github.com/NVIDIA/tacotron2
OK, thank you. I'll look for answers there. I've been able to generate audio that sounded like the voice I was training with, but some words in the sentence sounded a bit slurred or were missing in the generated audio. I was worried that it might've been my training set and that more training time wouldn't have helped.
Augment your data if you can.
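A minimal augmentation sketch with librosa (already used by the repo); the specific perturbations here are illustrative, not prescribed in this thread:

```python
# Create lightly perturbed copies of a training clip to enlarge the dataset.
import librosa
import soundfile as sf

y, sr = librosa.load("wavs/sample.wav", sr=22050)  # example input path

stretched = librosa.effects.time_stretch(y, rate=0.95)      # slightly slower tempo
shifted = librosa.effects.pitch_shift(y, sr=sr, n_steps=1)  # up one semitone

sf.write("wavs/sample_stretch.wav", stretched, sr)
sf.write("wavs/sample_shift.wav", shifted, sr)
```

Small perturbations are generally safer for a single-speaker set, since large pitch shifts change the voice itself.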