agkphysics / EVC-augmentation

Emotional voice conversion for data augmentation

Pre-trained HiFi-GAN model throws an error while fine-tuning #1

Closed (mnabenas closed this issue 1 year ago)

mnabenas commented 1 year ago

Are the pre-trained files for HiFi-GAN correct? I was trying to fine-tune them using ESD, and after creating the forward outputs I tried to run the fine-tuning commands:

cp do_00152000 fine_tuned_checkpoints/
cp g_00152000 fine_tuned_checkpoints/
python train.py --fine_tuning True --input_training_file ../data/ESD/en/train.txt --input_validation_file ../data/ESD/en/valid.txt --input_mels_dir ../conversion/out_ESD_en/fwd_mels --config config_v1_16k_0.05_0.0125.json --checkpoint_path fine_tuned_checkpoints/ --checkpoint_interval 2000 --validation_interval 1000 --summary_interval 200

but it throws the following error:

Loading 'fine_tuned_checkpoints/g_00152000'
Complete.
Loading 'fine_tuned_checkpoints/do_00152000'
Complete.
Traceback (most recent call last):
  File "/root/EVC-augmentation/hifi-gan/train.py", line 272, in <module>
    main()
  File "/root/EVC-augmentation/hifi-gan/train.py", line 268, in main
    train(0, a, h)
  File "/root/EVC-augmentation/hifi-gan/train.py", line 53, in train
    mpd.load_state_dict(state_dict_do['mpd'])
KeyError: 'mpd'

After checking the downloaded files, I noticed that both are exactly the same size. Is that correct? Thanks for any help.
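
Two files of equal size are not necessarily identical, so one way to confirm a suspected duplicate is to compare content hashes. The following is a minimal sketch using only the Python standard library; the function names `file_sha256` and `are_identical` are illustrative and are not part of the repository.

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large checkpoints are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def are_identical(path_a, path_b):
    """Return True if both files have the same size and the same SHA-256 digest."""
    a, b = Path(path_a), Path(path_b)
    if a.stat().st_size != b.stat().st_size:
        return False  # Different sizes can never be identical; skip hashing.
    return file_sha256(a) == file_sha256(b)
```

If `are_identical("g_00152000", "do_00152000")` returns `True`, the two downloads are the same file under different names.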

agkphysics commented 1 year ago

I seem to have duplicated the generator for some reason. I have uploaded the correct discriminator/optimiser to the folder: https://drive.google.com/drive/u/1/folders/121e6UgV1qtKGTbdEoGmQhp4BdAQqJUrB
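
For anyone who wants to verify a download before training, a rough sketch of a key check follows. It assumes the checkpoints are plain dictionaries saved with PyTorch. Only the `'mpd'` key is confirmed by the traceback above; the other key names are assumptions based on the upstream HiFi-GAN training script and may differ in this repository. The helper names are hypothetical.

```python
def missing_keys(state_dict, expected):
    """Return the expected top-level keys that are absent from a loaded checkpoint dict."""
    return [key for key in expected if key not in state_dict]


def load_checkpoint(path):
    """Load a checkpoint onto the CPU. Requires PyTorch, imported lazily here."""
    import torch

    return torch.load(path, map_location="cpu")


# Assumed key layout: only 'mpd' is confirmed by the KeyError in the traceback.
GENERATOR_KEYS = ["generator"]
DISCRIMINATOR_KEYS = ["mpd", "msd", "optim_g", "optim_d", "steps", "epoch"]
```

Under those assumptions, `missing_keys(load_checkpoint("fine_tuned_checkpoints/do_00152000"), DISCRIMINATOR_KEYS)` should return an empty list for a correct discriminator/optimiser file, and a list that includes `'mpd'` for a duplicated generator file.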

mnabenas commented 1 year ago

Thanks! With that, it is working just fine. One last question: during the conversion we use this command:

python convert_all.py \
    --checkpoint_path out_ft_IEMOCAP_en_4class/logdir/checkpoint_13651 \
    --input_list ../emotion/datasets/MSP-IMPROV/files_neutral.txt \
    --output_dir ../augmentation/datasets/MSP-IMPROV_aug/IEMOCAP_evc/hifi-gan_v1_ft_mel_vocoded/ \
    --wav \
    --hifi_gan_path ../hifi-gan/cp/v1_cv_10lang_ft_cv_10lang/g_00496000 \
    --hparams \""emo_list=[anger,happiness,neutral,sadness]"\",emo_embedding_dir=embeddings/IEMOCAP/,mel_mean_std=../data/IEMOCAP/mel_mean_std.npy,pretrain_n_speakers=1967,n_symbols=315

But the args also include a --neutral argument. Does it specify which index in my list of emotions is the neutral one? I cannot find any documentation for that argument, and it looks like it is required. Thanks again.

agkphysics commented 1 year ago

Yes, it represents the index of the neutral emotion in the list of embeddings. However, I think it doesn't actually matter when spemb_input == False, which is the setting used by the trained models linked in the README.
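
As an illustration, here is how the index could be worked out for the `emo_list` in the command above. This sketch assumes the index is zero-based and that the embeddings are ordered the same way as `emo_list`; neither is stated explicitly in this thread.

```python
# Emotion list copied from the --hparams value in the conversion command.
emo_list = ["anger", "happiness", "neutral", "sadness"]

# Assumption: --neutral expects a zero-based position in this list.
neutral_index = emo_list.index("neutral")
print(neutral_index)  # 2
```

Under those assumptions, the value to pass would be `--neutral 2`.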