fishaudio / fish-diffusion

An easy to understand TTS / SVS / SVC framework
https://diff.fish.audio
MIT License
603 stars, 75 forks

In TensorBoard's Audio panel, every gt (ground-truth) sample sounds hoarse #88

Closed. bfloat16 closed this issue 1 year ago

bfloat16 commented 1 year ago

Configuration file used: svc_content_vec_finetune.py

Pretrained model used: content-vec-pretrained-v1.ckpt

Vocoder: downloaded using download_nsf_hifigan.py

Command: python tools\diffusion\train.py --config configs\svc_content_vec_finetune.py --pretrained checkpoints\content-vec-pretrained-v1.ckpt --tensorboard

leng-yue commented 1 year ago

Check whether the f0 is correct, or try switching to CrepePitchExtractor.

bfloat16 commented 1 year ago

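I've already switched to CrepePitchExtractor and re-extracted, but the gt is still hoarse. (Switching to hifisinger fixes that, but then the inferred audio has an electrical buzz on the breath sounds.) Also, how do I check the f0? (I don't really understand it.)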

leng-yue commented 1 year ago

Could you download one of the gt audio clips? I'd like to look at its mel spectrogram.

bfloat16 commented 1 year ago

This one was extracted with PM and isn't too bad; some clips can't be rescued even by Crepe (completely hoarse): https://drive.google.com/file/d/1IXtXdLlj2dPi3ZiSVaIUPrsrgD9cd_7A/view?usp=share_link

leng-yue commented 1 year ago

The f0 is indeed broken. All you can do is try each of the pitch extractors (sad).
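For readers with the same "how do I check the f0" question, here is a minimal sketch of one way to sanity-check a pitch contour numerically. It assumes the f0 has already been extracted into a plain list of Hz values, one per frame, with 0 marking unvoiced frames. That layout, the function name `summarize_f0`, and the `jump_ratio` threshold are illustrative assumptions, not part of fish-diffusion.

```python
def summarize_f0(f0_hz, jump_ratio=1.8):
    """Summarize an f0 contour (Hz per frame, 0 = unvoiced). Hypothetical helper."""
    f0_hz = list(f0_hz)
    voiced = [f for f in f0_hz if f > 0]

    # Count adjacent voiced frames whose pitch differs by roughly an octave.
    # Many such jumps usually mean the extractor is flipping between harmonics.
    jumps = 0
    for prev, cur in zip(f0_hz, f0_hz[1:]):
        if prev > 0 and cur > 0:
            if max(prev, cur) / min(prev, cur) >= jump_ratio:
                jumps += 1

    return {
        "frames": len(f0_hz),
        "voiced_ratio": len(voiced) / len(f0_hz) if f0_hz else 0.0,
        "min_hz": min(voiced) if voiced else None,
        "max_hz": max(voiced) if voiced else None,
        "octave_jumps": jumps,
    }
```

A very low voiced ratio on sung or spoken material, a pitch range far outside what the singer can produce, or a large number of octave jumps are all hints that the extractor failed on that clip and that another extractor is worth trying.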

bfloat16 commented 1 year ago

I may have found the culprit: 36k and 24k files were mixed in with the 44.1k audio (annoyed).
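A mixed-rate dataset like this can be caught before training. The sketch below uses only Python's standard `wave` module to list files whose sample rate differs from the expected one. It assumes the dataset is a directory tree of uncompressed PCM `.wav` files (the `wave` module cannot read other encodings); the helper names and the 44100 default are illustrative, not part of fish-diffusion.

```python
import wave
from collections import defaultdict
from pathlib import Path


def group_by_sample_rate(root):
    """Map each sample rate (Hz) found under root to the WAV files that use it."""
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*.wav")):
        with wave.open(str(path), "rb") as wav:
            groups[wav.getframerate()].append(path)
    return dict(groups)


def find_mismatched(root, expected_rate=44100):
    """Return (path, rate) pairs for files whose rate differs from expected_rate."""
    return [
        (path, rate)
        for rate, paths in group_by_sample_rate(root).items()
        if rate != expected_rate
        for path in paths
    ]
```

Calling `find_mismatched("dataset")` on a hypothetical `dataset` folder returns the files that would need to be resampled or removed before re-extracting features.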

bfloat16 commented 1 year ago

The hoarseness is still there; the dataset quality may simply not be good enough (hifisinger, save me).