TensorSpeech / TensorFlowTTS

TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for TensorFlow 2 (supports English, French, Korean, Chinese, and German, and is easy to adapt to other languages)
https://tensorspeech.github.io/TensorFlowTTS/
Apache License 2.0

Error when fine tuning MelGAN #148

Closed adfost closed 4 years ago

adfost commented 4 years ago

After importing the generator and discriminator models, I get the following error:

    Traceback (most recent call last):
      File "examples/melgan/train_melgan.py", line 505, in <module>
        main()
      File "examples/melgan/train_melgan.py", line 477, in main
        discriminator.load_weights("./examples/melgan/checkpoints/discriminator-1500000.h5")
      File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py", line 2211, in load_weights
        hdf5_format.load_weights_from_hdf5_group(f, self.layers)
      File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/keras/saving/hdf5_format.py", line 708, in load_weights_from_hdf5_group
        K.batch_set_value(weight_value_tuples)
      File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/util/dispatch.py", line 201, in wrapper
        return target(*args, **kwargs)
      File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/keras/backend.py", line 3576, in batch_set_value
        x.assign(np.asarray(value, dtype=dtype(x)))
      File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/ops/resource_variable_ops.py", line 858, in assign
        self._shape.assert_is_compatible_with(value_tensor.shape)
      File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/tensor_shape.py", line 1134, in assert_is_compatible_with
        raise ValueError("Shapes %s and %s are incompatible" % (self, other))
    ValueError: Shapes (4, 4, 64) and (41, 4, 64) are incompatible

ZDisket commented 4 years ago

Have you checked that the config of the MelGAN model you're trying to train matches that of the one you're fine-tuning from?

adfost commented 4 years ago

Well, I don't know where the 41 is coming from. I think the configuration is right, but I will check again.
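
One way to see where a mismatching dimension such as 41 comes from is to compare, weight by weight, the shapes the freshly built model expects with the shapes stored in the checkpoint. The sketch below is only an illustration: `find_shape_mismatches` is a hypothetical helper, and the two lists of (name, shape) pairs are assumed to have been collected beforehand (for a Keras model, for example, with `[(w.name, tuple(w.shape)) for w in model.weights]`; how to read the checkpoint side depends on how the `.h5` file was written). The layer names in the usage lines are made up.

```python
from typing import List, Tuple

Shape = Tuple[int, ...]
NamedShape = Tuple[str, Shape]


def find_shape_mismatches(
    model_shapes: List[NamedShape], ckpt_shapes: List[NamedShape]
) -> List[Tuple[int, str, Shape, str, Shape]]:
    """Compare weights position by position and report differing shapes.

    The comparison is positional rather than by name, on the assumption that
    the weights were loaded in order (load_weights without by_name=True).
    zip() stops at the shorter list; a difference in list length is itself a
    hint that the two configs do not describe the same architecture.
    """
    mismatches = []
    for index, ((m_name, m_shape), (c_name, c_shape)) in enumerate(
        zip(model_shapes, ckpt_shapes)
    ):
        if tuple(m_shape) != tuple(c_shape):
            mismatches.append(
                (index, m_name, tuple(m_shape), c_name, tuple(c_shape))
            )
    return mismatches


# Usage with made-up layer names and the shapes from the error message.
model_shapes = [("disc/conv_0/kernel", (15, 1, 16)), ("disc/conv_1/kernel", (4, 4, 64))]
ckpt_shapes = [("disc/conv_0/kernel", (15, 1, 16)), ("disc/conv_1/kernel", (41, 4, 64))]
for index, m_name, m_shape, c_name, c_shape in find_shape_mismatches(
    model_shapes, ckpt_shapes
):
    print(f"#{index}: model {m_name} {m_shape} vs checkpoint {c_name} {c_shape}")
```

The first reported index points at the layer whose hyperparameters (kernel size, channels, groups) are worth comparing between the two configs.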

dathudeptrai commented 4 years ago

@adfost can you share the command line you ran? :))) The best way to help us is to provide code that reproduces the bug; you can fork the repo or make a Colab -_-.

adfost commented 4 years ago

CUDA_VISIBLE_DEVICES=0 python examples/melgan/train_melgan.py --train-dir ./dump2/train/ --dev-dir ./dump2/valid/ --outdir ./examples/melgan/exp/train.melgan.v1/ --config ./examples/melgan/conf/melgan.v1.yaml --use-norm 1 --generator_mixed_precision 0 --resume ""

where train_melgan is modified to load a pretrained generator and discriminator.

adfost commented 4 years ago

I used a subset of this dataset: https://www.caito.de/2019/01/the-m-ailabs-speech-dataset/, the male en-US voice.

adfost commented 4 years ago

I then went through the preprocessing steps as recommended (the data is intentionally in the same format).

dathudeptrai commented 4 years ago

@adfost can you check whether everything works if you train from scratch, without loading the pretrained weights? Also, please check whether the error comes from the generator or the discriminator :)).

adfost commented 4 years ago

The error comes from the discriminator

adfost commented 4 years ago

If I train from scratch, how many epochs will I need to see any improvement?

dathudeptrai commented 4 years ago

@adfost I will try to figure out the bug. But if you want to fine-tune, you only need to load the generator; the discriminator should be trained from scratch, so you won't face this error anymore :)). I recommend training mb-melgan first. You should train a melgan model for at least 1M steps to get good performance. At around 100k steps, the generated audio becomes understandable. ^^
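
The suggestion above (load only the generator, leave the discriminator at its initial weights) could be sketched roughly as follows. This is not the repository's code: `load_pretrained_weights` is a hypothetical helper, the checkpoint path in the usage comment is a placeholder, and both objects are assumed to be Keras models whose variables have already been created, since `load_weights` needs them to exist.

```python
from typing import Optional


def load_pretrained_weights(
    generator,
    discriminator,
    generator_ckpt: str,
    discriminator_ckpt: Optional[str] = None,
) -> None:
    """Load pretrained weights into the generator and, optionally, the discriminator.

    Assumption: both arguments expose Keras' `load_weights(path)` method and
    are already built.
    """
    generator.load_weights(generator_ckpt)
    if discriminator_ckpt is not None:
        discriminator.load_weights(discriminator_ckpt)
    # If no discriminator checkpoint is given, the discriminator keeps its
    # random initialisation and is trained from scratch.


# Hypothetical usage inside a modified training script:
# load_pretrained_weights(generator, discriminator, "path/to/generator.h5")
```

Skipping the discriminator checkpoint sidesteps the shape error, because the incompatible file is never read.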

adfost commented 4 years ago

Thank you. I guess I will let it train longer. I only let it fine-tune for about 1000 epochs and got practically static.

dathudeptrai commented 4 years ago

@adfost 1000 epochs? How many steps did you train for?

adfost commented 4 years ago

I meant steps, misspoke
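
Since epochs and steps are easy to mix up, here is a small worked conversion. The dataset size and batch size are made-up numbers, not values from this thread, and the helper names are hypothetical; the sketch assumes one step is one batch and that the last, possibly smaller, batch of an epoch is kept.

```python
import math


def steps_per_epoch(num_utterances: int, batch_size: int) -> int:
    """Number of optimizer steps needed to see every utterance once."""
    return math.ceil(num_utterances / batch_size)


def epochs_for_steps(target_steps: int, num_utterances: int, batch_size: int) -> int:
    """Number of (possibly partial) epochs needed to reach target_steps."""
    return math.ceil(target_steps / steps_per_epoch(num_utterances, batch_size))


# Hypothetical dataset: 10,000 utterances, batch size 16.
print(steps_per_epoch(10_000, 16))              # 625
print(epochs_for_steps(1_000, 10_000, 16))      # 2
print(epochs_for_steps(1_000_000, 10_000, 16))  # 1600
```

With these made-up numbers, 1000 steps is less than two passes over the data, while 1M steps is 1600 passes.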

dathudeptrai commented 4 years ago

@adfost you should train melgan for around 1M steps :))).

adfost commented 4 years ago

Well, I did >35000 iters of FastSpeech training and >100000 iters of mel training, and there is still no sound of any voice.

dathudeptrai commented 4 years ago

@adfost so you did something wrong in the preprocessing step. In this case, it is very hard to help since I do not know what you are doing before training. By the way, where does your duration file come from?

adfost commented 4 years ago

I used the tacotron2 duration script to get durations. Also, I used the following commands to preprocess, as suggested:

    tensorflow-tts-preprocess --rootdir ./datasets/ --outdir ./dump/ --conf preprocess/ljspeech_preprocess.yaml
    tensorflow-tts-compute-statistics --rootdir ./dump/train/ --outdir ./dump --config preprocess/ljspeech_preprocess.yaml
    tensorflow-tts-normalize --rootdir ./dump --outdir ./dump --stats ./dump/stats.npy --config preprocess/ljspeech_preprocess.yaml

Maybe you could check the format of https://www.caito.de/2019/01/the-m-ailabs-speech-dataset/, as I suspect that it is a formatting issue.

dathudeptrai commented 4 years ago

@adfost did you retrain tacotron?

adfost commented 4 years ago

No

dathudeptrai commented 4 years ago

@adfost that is the problem: a tacotron-2 model pretrained on ljspeech can't be used to extract durations for another dataset.
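
Whatever tool produces the durations, a FastSpeech-style model expects one duration per input character, and the durations of an utterance should add up to its number of mel frames. The sketch below is a basic consistency check under that assumption; `check_durations` is a hypothetical helper working on plain Python sequences. It only catches inconsistent lengths. It cannot tell whether the alignment itself is good, which is the concern with durations extracted by a model trained on a different dataset.

```python
from typing import List, Sequence


def check_durations(
    char_ids: Sequence[int], durations: Sequence[int], num_mel_frames: int
) -> List[str]:
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    if len(durations) != len(char_ids):
        problems.append(
            f"{len(durations)} durations for {len(char_ids)} characters"
        )
    if any(d < 0 for d in durations):
        problems.append("negative duration found")
    if sum(durations) != num_mel_frames:
        problems.append(
            f"durations sum to {sum(durations)}, mel has {num_mel_frames} frames"
        )
    return problems


# Toy utterance: 3 characters, 6 mel frames.
print(check_durations([5, 9, 2], [2, 3, 1], 6))  # []
print(check_durations([5, 9, 2], [2, 3], 6))
```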

adfost commented 4 years ago

Is there another way to extract duration?

adfost commented 4 years ago

Also, I'm just wondering about the dataloader. I'm training tacotron2 now.

dathudeptrai commented 4 years ago

"Is there another way to extract duration?"

You can use MFA to extract durations.

adfost commented 4 years ago

What is that?

dathudeptrai commented 4 years ago

@adfost see these PRs: https://github.com/TensorSpeech/TensorflowTTS/pull/147 or https://github.com/TensorSpeech/TensorflowTTS/pull/122 (not yet merged).

adfost commented 4 years ago

I'm a bit confused about that; I will try using the model I am training first. Also, I'm a bit confused by the generator function in these instructions:

"First, you need to define a data loader based on the AbstractDataset class (see abstract_dataset.py). In this example, a dataloader reads the dataset from a path. I use suffixes to classify which file is a character, duration, or mel-spectrogram file (see fastspeech_dataset.py). If you already have a preprocessed version of your target dataset, you don't need to use this example dataloader; you just need to refer to my dataloader and modify the generator function to adapt it to your case. Normally, a generator function should return [charactor_ids, duration, mel]. Please see the tacotron2 example to learn how to extract durations (Extract Duration)."
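
To make the idea of a suffix-based generator more concrete, here is a dependency-free sketch. It is not the code from fastspeech_dataset.py: `iter_utterances` is a hypothetical function, the three default suffixes are assumptions for illustration, and it yields file paths rather than loaded arrays. A real generator would load each file (for example with `numpy.load`) and return [charactor_ids, duration, mel].

```python
import os
from typing import Dict, Iterator


def iter_utterances(
    root_dir: str,
    charactor_suffix: str = "-ids.npy",      # assumed suffix
    duration_suffix: str = "-durations.npy",  # assumed suffix
    mel_suffix: str = "-norm-feats.npy",      # assumed suffix
) -> Iterator[Dict[str, str]]:
    """Yield one dict of file paths per utterance that has all three files."""
    suffixes = {
        "charactor_ids": charactor_suffix,
        "duration": duration_suffix,
        "mel": mel_suffix,
    }
    found: Dict[str, Dict[str, str]] = {}
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for filename in filenames:
            for key, suffix in suffixes.items():
                if filename.endswith(suffix):
                    # The utterance id is the file name without the suffix.
                    utt_id = filename[: -len(suffix)]
                    found.setdefault(utt_id, {})[key] = os.path.join(dirpath, filename)
    for utt_id in sorted(found):
        paths = found[utt_id]
        # Skip utterances for which one of the three files is missing.
        if len(paths) == len(suffixes):
            yield {"utt_id": utt_id, **paths}


# Prints nothing if the directory does not exist or holds no complete triples.
for item in iter_utterances("./dump/train"):
    print(item["utt_id"], item["charactor_ids"], item["duration"], item["mel"])
```

Grouping by utterance id before yielding means an utterance with a missing duration file is skipped instead of being paired with the wrong mel-spectrogram.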

dathudeptrai commented 4 years ago

@adfost there are many ways to extract durations now. We can use tacotron or the MFA tool. MFA is not yet merged into master. I am trying to make the preprocessing stage more abstract.

adfost commented 4 years ago

I think I can go for 10000 iterations of tacotron for a rudimentary model, then train the melgan and FastSpeech again to see if I can get something that isn't just unintelligible static

dathudeptrai commented 4 years ago

@adfost it would be good if you could fork this repo and commit your code, your config, etc., so I can help more easily.

ZDisket commented 4 years ago

@adfost Try MFA first, it's not that complicated: https://github.com/ZDisket/TensorflowTTS/tree/master/examples/fastspeech2/mfa

adfost commented 4 years ago

ok trying that

adfost commented 4 years ago

Calling tensorflow-tts-preprocess --rootdir ./datasets/ --outdir ./dump/ --conf preprocess/ljspeech_preprocess.yaml with the extra argument gives FileNotFoundError: [Errno 2] No such file or directory: './dump/train/wavs/poisoned_pen_04_f000129-wave.npy'. I thought that file wasn't supposed to exist yet.
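
When a step fails on a single missing file like this, it can help to list every utterance whose file is absent rather than stopping at the first one. The sketch below only mirrors the path pattern visible in the error message (`<dump>/train/wavs/<utt_id>-wave.npy`); `missing_feature_files` is a hypothetical helper, and the list of utterance ids is assumed to come from your own metadata.

```python
from pathlib import Path
from typing import Iterable, List


def missing_feature_files(
    dump_dir: str,
    utt_ids: Iterable[str],
    subdir: str = "wavs",       # directory name taken from the error message
    suffix: str = "-wave.npy",  # suffix taken from the error message
) -> List[Path]:
    """Return the expected feature files that do not exist on disk."""
    base = Path(dump_dir) / subdir
    expected = [base / f"{utt_id}{suffix}" for utt_id in utt_ids]
    return [path for path in expected if not path.is_file()]


# The utterance id below is the one from the error message.
for path in missing_feature_files("./dump/train", ["poisoned_pen_04_f000129"]):
    print("missing:", path)
```

If every utterance is reported missing, the earlier preprocessing step probably wrote nothing to that directory; if only a few are, the problem is more likely specific to those recordings.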

unparalleled-ysj commented 4 years ago

"@adfost I will try to figure out the bug. But if you want to fine-tune, you only need to load the generator; the discriminator should be trained from scratch, so you won't face this error anymore :)). I recommend training mb-melgan first. You should train a melgan model for at least 1M steps to get good performance. At around 100k steps, the generated audio becomes understandable. ^^"

@dathudeptrai Hi, when I try to fine-tune mb-melgan I also encounter ValueError: Shapes (4, 4, 64) and (41, 4, 64) are incompatible. How do I load only the generator and train the discriminator from scratch? I only know --resume modeler/checkpoints/ckpt-100000.

dathudeptrai commented 4 years ago

Fixed :D.