BakerBunker / FreeV

[InterSpeech 24] FreeV: Free Lunch For Vocoders Through Pseudo Inversed Mel Filter
https://bakerbunker.github.io/FreeV/
MIT License

Finetune Only? #2

Open colstone opened 3 months ago

colstone commented 3 months ago

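Hi, I ran into a problem while trying to train FreeV. When I try to train a model with a 44.1 kHz sampling rate without using a pretrained model, the code prints the structure of `g` and then does not start training. Does the code currently support fine-tuning only? If training from scratch is possible, which parts should I modify so that it trains rather than fine-tunes? My config file is as follows: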

{
    "input_training_wav_list": "/public/home/acd6i9tg6y/fish-diffusion/vocoder_training_data/train",
    "input_validation_wav_list": "/public/home/acd6i9tg6y/fish-diffusion/vocoder_training_data/val",
    "test_input_wavs_dir":"/public/home/acd6i9tg6y/fish-diffusion/vocoder_training_data/test",
    "test_input_mels_dir":"./",
    "test_mel_load": 0,
    "test_output_dir": "/public/home/acd6i9tg6y/fish-diffusion/vocoder_training_data/test_out",

    "batch_size": 16,
    "learning_rate": 0.0002,
    "adam_b1": 0.8,
    "adam_b2": 0.99,
    "lr_decay": 0.999,
    "seed": 114514,
    "training_epochs": -1,
    "stdout_interval":20,
    "checkpoint_interval": 1000,
    "summary_interval": 100,
    "validation_interval": 1000,
    "checkpoint_path": "./ckpt/20240627-freev-44100",
    "checkpoint_file_load": "",

    "ASP_channel": 513,
    "ASP_resblock_kernel_sizes": [3,7,11],
    "ASP_resblock_dilation_sizes": [[1,3,5], [1,3,5], [1,3,5]],
    "ASP_input_conv_kernel_size": 7,
    "ASP_output_conv_kernel_size": 7,

    "PSP_channel": 512,
    "PSP_resblock_kernel_sizes": [3,7,11],
    "PSP_resblock_dilation_sizes": [[1,3,5], [1,3,5], [1,3,5]], 
    "PSP_input_conv_kernel_size": 7,
    "PSP_output_R_conv_kernel_size": 7,
    "PSP_output_I_conv_kernel_size": 7,

    "segment_size": 16384,
    "num_mels": 128,
    "n_fft": 2048,
    "hop_size": 512,
    "win_size": 2048,

    "sampling_rate": 44100,

    "fmin": 40,
    "fmax": 16000,
    "meloss":null,
    "num_workers": 4
}

The JSON file surely has some mistakes in it, so please bear with me.
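
A quick way to rule out basic inconsistencies in the audio settings is a generic sanity check. The sketch below is not part of FreeV: `check_audio_config` is a hypothetical helper, and the rule that `segment_size` should be a multiple of `hop_size` is a common vocoder-training convention assumed here, not a documented FreeV requirement.

```python
def check_audio_config(cfg):
    """Return a list of human-readable problems found in the audio settings."""
    problems = []
    nyquist = cfg["sampling_rate"] / 2
    # A mel filterbank cannot cover frequencies above the Nyquist frequency.
    if cfg["fmax"] > nyquist:
        problems.append(f"fmax {cfg['fmax']} exceeds Nyquist {nyquist}")
    if not 0 <= cfg["fmin"] < cfg["fmax"]:
        problems.append("fmin must satisfy 0 <= fmin < fmax")
    # An STFT window longer than the FFT size is not valid.
    if cfg["win_size"] > cfg["n_fft"]:
        problems.append("win_size must not exceed n_fft")
    # Assumed convention: training segments hold a whole number of hops.
    if cfg["segment_size"] % cfg["hop_size"] != 0:
        problems.append("segment_size is not a multiple of hop_size")
    return problems


# Audio-related values copied from the config posted above.
cfg = {
    "sampling_rate": 44100,
    "n_fft": 2048,
    "hop_size": 512,
    "win_size": 2048,
    "segment_size": 16384,
    "fmin": 40,
    "fmax": 16000,
}
print(check_audio_config(cfg))  # []
```

Under these checks the posted audio settings are self-consistent, which suggests looking at the non-audio fields of the config instead.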

BakerBunker commented 3 months ago

The problem is probably `training_epochs`: if it is set to -1, the training loop ends immediately.
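
A minimal sketch of why `-1` stops training right away, assuming the training script iterates over epochs with a HiFi-GAN-style `range(start, training_epochs)` loop. The loop shape is an assumption and is not copied from the FreeV source; `epochs_to_run` is a hypothetical helper used only for illustration.

```python
from __future__ import annotations


def epochs_to_run(last_epoch: int, training_epochs: int) -> list[int]:
    """Return the epoch indices a range-based training loop would visit.

    Assumption: the loop looks like
        for epoch in range(max(0, last_epoch), training_epochs): ...
    which is the HiFi-GAN convention; the FreeV script may differ in detail.
    """
    return list(range(max(0, last_epoch), training_epochs))


# Fresh run (no checkpoint, so last_epoch = -1) with training_epochs = -1.
print(epochs_to_run(last_epoch=-1, training_epochs=-1))  # []

# Same fresh run with a positive epoch budget.
print(len(epochs_to_run(last_epoch=-1, training_epochs=3)))  # 3
```

With an empty range the loop body never executes, which matches the symptom of the script printing the model structure and then exiting. Under this assumption, a positive `training_epochs` value is what lets training start.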

BakerBunker commented 3 months ago

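If your training produces results, could you share them? 😂 I'm also curious whether this method gives a larger improvement on singing voice than on speech. My personal feeling is that, if there is no need to modify f0, the pseudo-inverse amplitude spectrum is a stronger condition than f0.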
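
To make "pseudo-inverse amplitude spectrum" concrete: the idea named in the paper title is to map a mel spectrogram back to an approximate linear-frequency amplitude spectrum using the pseudo-inverse of the mel filterbank. Below is a minimal NumPy sketch, assuming a toy linearly spaced triangular filterbank in place of a real mel filterbank. `triangular_filterbank`, the matrix sizes, and the clamp floor are all illustrative choices, not taken from the FreeV code.

```python
import numpy as np


def triangular_filterbank(n_filters, n_bins):
    """Toy linearly spaced triangular filterbank, shape (n_filters, n_bins).

    Stand-in for a real mel filterbank (illustrative assumption).
    """
    centers = np.linspace(0, n_bins - 1, n_filters + 2)
    bins = np.arange(n_bins)
    fb = np.zeros((n_filters, n_bins))
    for i in range(n_filters):
        left, center, right = centers[i], centers[i + 1], centers[i + 2]
        rising = (bins - left) / (center - left)
        falling = (right - bins) / (right - center)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb


rng = np.random.default_rng(0)
n_filters, n_bins, n_frames = 16, 65, 10

fb = triangular_filterbank(n_filters, n_bins)
amplitude = rng.random((n_bins, n_frames))  # fake amplitude spectrogram
mel = fb @ amplitude                        # compress to (n_filters, n_frames)

fb_pinv = np.linalg.pinv(fb)                # pseudo-inverse, (n_bins, n_filters)
raw_estimate = fb_pinv @ mel                # least-norm estimate, may go negative
approx = np.clip(raw_estimate, 1e-5, None)  # clamp so a log could be taken later

print(approx.shape)  # (65, 10)
```

The estimate is not the original spectrum, since the filterbank discards information, but it is consistent with the mel input: projecting `raw_estimate` back through `fb` reproduces `mel`. That makes it a spectrum-shaped condition a vocoder can refine, instead of a generic feature vector.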

colstone commented 3 months ago

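> If your training produces results, could you share them? 😂 I'm also curious whether this method gives a larger improvement on singing voice than on speech. My personal feeling is that, if there is no need to modify f0, the pseudo-inverse amplitude spectrum is a stronger condition than f0.

Sure. That said, applying it to current singing voice synthesis would indeed still require `f0_emb`. Once training is finished I'll make the weights public ))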