adfost closed this issue 4 years ago
Have you checked that the config of the MelGAN model you're trying to train matches that of the one you're fine-tuning from?
Well, I don't know where the 41 is coming from. I think the configuration is right, but I will check again.
@adfost can you share the command line? :))) The best way to help us is to provide code that reproduces the bug; you can fork the repo or make a Colab -_-.
CUDA_VISIBLE_DEVICES=0 python examples/melgan/train_melgan.py --train-dir ./dump2/train/ --dev-dir ./dump2/valid/ --outdir ./examples/melgan/exp/train.melgan.v1/ --config ./examples/melgan/conf/melgan.v1.yaml --use-norm 1 --generator_mixed_precision 0 --resume ""
where train_melgan is modified to load a pretrained generator and discriminator.
I used a subset of this dataset: https://www.caito.de/2019/01/the-m-ailabs-speech-dataset/, the male en-US voice.
And went through the steps to preprocess (it is intentionally in the same format) as recommended.
@adfost can you check whether everything is fine if you train from scratch without loading the pretrained weights? Also, please check whether the error comes from the generator or the discriminator :)).
The error comes from the discriminator
If I train from scratch, how many epochs will I need to see any improvement?
@adfost I will try to figure out the bug. But if you want to fine-tune, you only need to load the generator; the discriminator should be trained from scratch, so you won't face this error anymore :)). I recommend training mb-melgan first. You should train a melgan model for at least 1M steps to get good performance. At around 100k steps, the generated audio becomes understandable. ^^
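For illustration, the advice above could be sketched roughly like this. The helper name `init_for_finetuning`, the stand-in class `FakeModel`, and the checkpoint path are hypothetical; the only real API assumed is the Keras `load_weights` method that also appears in the traceback in this thread. With real Keras models the generator usually has to be built before weights can be loaded.

```python
class FakeModel:
    """Stand-in for a tf.keras.Model so the sketch runs without TensorFlow."""

    def __init__(self):
        self.loaded_from = None

    def load_weights(self, path):
        # A real Keras model would read the weights from `path` here.
        self.loaded_from = path


def init_for_finetuning(generator, discriminator, generator_ckpt):
    """Restore only the generator; leave the discriminator freshly initialised."""
    generator.load_weights(generator_ckpt)
    # Deliberately no discriminator.load_weights(...) call: the discriminator
    # starts from scratch, so its checkpoint shapes are never compared.
    return generator, discriminator


# Hypothetical checkpoint path, for demonstration only.
gen, disc = init_for_finetuning(FakeModel(), FakeModel(), "generator.h5")
print(gen.loaded_from, disc.loaded_from)
```

In a modified train_melgan.py this would amount to keeping the generator's `load_weights` call and dropping the discriminator's.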
Thank you I guess I will let it train longer. I only let it fine tune for about 1000 epochs and got practically static.
@adfost 1000 epochs? How many steps did you train?
I meant steps, misspoke
@adfost you should train melgan for around 1M steps :))).
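Since steps and epochs got mixed up above, here is a small sketch of how the two relate. The dataset size and batch size are hypothetical numbers picked for easy arithmetic, not values from this thread.

```python
import math


def steps_per_epoch(num_utterances, batch_size):
    """One step processes one batch; the last batch of an epoch may be partial."""
    return math.ceil(num_utterances / batch_size)


def epochs_for_steps(target_steps, num_utterances, batch_size):
    """How many passes over the data a given number of steps corresponds to."""
    return target_steps / steps_per_epoch(num_utterances, batch_size)


# Hypothetical: 10,000 utterances, batch size 16.
print(steps_per_epoch(10_000, 16))              # 625
print(epochs_for_steps(1_000, 10_000, 16))      # 1.6
print(epochs_for_steps(1_000_000, 10_000, 16))  # 1600.0
```

With these assumed numbers, 1000 steps is less than two passes over the data, while 1M steps is about 1600 epochs.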
Well, I did >35000 iters of FastSpeech training and >100000 iters of mel training, and there is no sound of any voice.
@adfost so you did something wrong in the preprocessing step. In this case, it is very hard to help since I do not know what you are doing before training. Btw, where does your duration file come from?
I used the tacotron2 duration script to get durations. To preprocess, I ran the following, as suggested:
tensorflow-tts-preprocess --rootdir ./datasets/ --outdir ./dump/ --conf preprocess/ljspeech_preprocess.yaml
tensorflow-tts-compute-statistics --rootdir ./dump/train/ --outdir ./dump --config preprocess/ljspeech_preprocess.yaml
tensorflow-tts-normalize --rootdir ./dump --outdir ./dump --stats ./dump/stats.npy --config preprocess/ljspeech_preprocess.yaml
Maybe you could check the format of https://www.caito.de/2019/01/the-m-ailabs-speech-dataset/, as I suspect it is a formatting issue.
@adfost did you retrain tacotron?
No
@adfost that is the problem: the pretrained ljspeech tacotron-2 can't be used to extract durations for another dataset.
Is there another way to extract duration?
Also just wondering about dataloader. Training tacotron2 now
You can use MFA to extract durations.
What is that?
@adfost see this PR https://github.com/TensorSpeech/TensorflowTTS/pull/147 or https://github.com/TensorSpeech/TensorflowTTS/pull/122 (not yet merged).
I'm a bit confused about that; I will try using the model I am training first. Also, I'm a bit confused by the generator function in this passage:
"First, you need to define a data loader based on the AbstractDataset class (see abstract_dataset.py). In this example, the dataloader reads the dataset from a path. I use the suffix to classify which file is a character, duration or mel-spectrogram file (see fastspeech_dataset.py). If you already have a preprocessed version of your target dataset, you don't need to use this example dataloader; you just need to refer to my dataloader and modify the generator function to adapt it to your case. Normally, a generator function should return [charactor_ids, duration, mel]. Please see the tacotron2 example to know how to extract durations (Extract Duration)."
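As a rough, non-authoritative sketch of what such a generator function might look like: it groups files by suffix and yields one [ids, duration, mel] triple per utterance. The suffix strings, the function names, and the optional `load` callback are assumptions made for illustration; check fastspeech_dataset.py for what the repo actually uses. In practice `load` would typically be `numpy.load`.

```python
import os
from collections import defaultdict

# Assumed file suffixes; adjust to whatever your preprocessing actually writes.
SUFFIXES = {
    "ids": "-ids.npy",
    "duration": "-durations.npy",
    "mel": "-norm-feats.npy",
}


def group_by_utterance(root_dir, suffixes=SUFFIXES):
    """Map utterance id -> {kind: path}, classifying each file by its suffix."""
    groups = defaultdict(dict)
    for dirpath, _, filenames in os.walk(root_dir):
        for fname in filenames:
            for kind, suffix in suffixes.items():
                if fname.endswith(suffix):
                    utt_id = fname[: -len(suffix)]
                    groups[utt_id][kind] = os.path.join(dirpath, fname)
    return groups


def generator(root_dir, load=None, suffixes=SUFFIXES):
    """Yield [character_ids, duration, mel] for every complete utterance.

    Without `load`, the file paths are yielded instead of the loaded arrays.
    """
    for utt_id, paths in sorted(group_by_utterance(root_dir, suffixes).items()):
        if set(paths) != set(suffixes):
            continue  # skip utterances that are missing one of the three files
        ordered = [paths["ids"], paths["duration"], paths["mel"]]
        yield [load(p) for p in ordered] if load else ordered
```

Usage would be something like `for ids, dur, mel in generator("./dump/train", load=numpy.load): ...`, assuming all three files for an utterance live somewhere under that directory.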
@adfost there are many ways to extract durations now. We can use tacotron or the MFA tool to extract durations. MFA is not yet merged to master. I am trying to make the preprocessing stage more abstract.
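Whichever tool produces the durations, a quick sanity check can catch misaligned files before training. The sketch below assumes the usual FastSpeech-style convention that there is one duration per input character and that the durations sum to the number of mel frames; the function name and example values are made up for illustration.

```python
def check_alignment(char_ids, durations, num_mel_frames):
    """Return a list of human-readable problems; an empty list means consistent."""
    problems = []
    if len(durations) != len(char_ids):
        problems.append(
            "expected %d durations (one per character), got %d"
            % (len(char_ids), len(durations))
        )
    if any(d < 0 for d in durations):
        problems.append("durations must be non-negative")
    if sum(durations) != num_mel_frames:
        problems.append(
            "durations sum to %d but the mel has %d frames"
            % (sum(durations), num_mel_frames)
        )
    return problems


# Made-up example: 3 characters, 6 mel frames.
print(check_alignment([5, 9, 2], [2, 3, 1], 6))  # []
print(check_alignment([5, 9, 2], [2, 3], 6))
```

Running a check like this over the whole dump directory should show quickly whether the durations match the mels for the new dataset.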
I think I can go for 10000 iterations of tacotron for a rudimentary model, then train the melgan and FastSpeech again to see if I can get something that isn't just unintelligible static
@adfost it would be good if you could make a fork of this repo and commit your code, your config, ... so I can help more easily.
@adfost Try MFA first, it's not that complicated: https://github.com/ZDisket/TensorflowTTS/tree/master/examples/fastspeech2/mfa
ok trying that
Calling tensorflow-tts-preprocess --rootdir ./datasets/ --outdir ./dump/ --conf preprocess/ljspeech_preprocess.yaml with the extra argument gives FileNotFoundError: [Errno 2] No such file or directory: './dump/train/wavs/poisoned_pen_04_f000129-wave.npy'. I thought that file wasn't supposed to exist yet.
@dathudeptrai Hi, when I try to fine-tune mb-melgan I also encountered ValueError: Shapes (4, 4, 64) and (41, 4, 64) are incompatible. How do I load the generator only and train the discriminator from scratch? I only know --resume modeler/checkpoints/ckpt-100000
Fixed :D.
After importing generator and discriminator models, I get the following error:
Traceback (most recent call last):
File "examples/melgan/train_melgan.py", line 505, in
main()
File "examples/melgan/train_melgan.py", line 477, in main
discriminator.load_weights("./examples/melgan/checkpoints/discriminator-1500000.h5")
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py", line 2211, in load_weights
hdf5_format.load_weights_from_hdf5_group(f, self.layers)
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/keras/saving/hdf5_format.py", line 708, in load_weights_from_hdf5_group
K.batch_set_value(weight_value_tuples)
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/util/dispatch.py", line 201, in wrapper
return target(*args, **kwargs)
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/keras/backend.py", line 3576, in batch_set_value
x.assign(np.asarray(value, dtype=dtype(x)))
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/ops/resource_variable_ops.py", line 858, in assign
self._shape.assert_is_compatible_with(value_tensor.shape)
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/tensor_shape.py", line 1134, in assert_is_compatible_with
raise ValueError("Shapes %s and %s are incompatible" % (self, other))
ValueError: Shapes (4, 4, 64) and (41, 4, 64) are incompatible
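To narrow down which weight triggers an error like this, the shapes the model expects can be compared with the shapes stored in the checkpoint before calling load_weights. The sketch below is only an illustration: the function name and the layer name are hypothetical, and it assumes you have already collected both sets of shapes into plain dicts (for example from `[w.shape for w in model.weights]` on the model side). In the message above, the first shape is the model variable and the second is the value from the file.

```python
def find_shape_mismatches(model_shapes, checkpoint_shapes):
    """Return (name, model_shape, checkpoint_shape) for every conflicting weight."""
    mismatches = []
    for name, expected in model_shapes.items():
        found = checkpoint_shapes.get(name)
        if found is not None and tuple(found) != tuple(expected):
            mismatches.append((name, tuple(expected), tuple(found)))
    return mismatches


# Shapes taken from the error message; the weight names are made up.
model_shapes = {
    "discriminator/conv_a/kernel": (4, 4, 64),
    "discriminator/conv_b/kernel": (5, 64, 64),
}
checkpoint_shapes = {
    "discriminator/conv_a/kernel": (41, 4, 64),
    "discriminator/conv_b/kernel": (5, 64, 64),
}

for name, expected, found in find_shape_mismatches(model_shapes, checkpoint_shapes):
    print("%s: model expects %s, checkpoint has %s" % (name, expected, found))
```

A mismatch in only the discriminator would be consistent with the earlier reply that the error comes from the discriminator and that its config may differ from the pretrained one.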