TensorBoard's Audio panel: all gt (ground-truth) samples sound hoarse

fishaudio / fish-diffusion

An easy to understand TTS / SVS / SVC framework

https://diff.fish.audio

TensorBoard's Audio panel: all gt (ground-truth) samples sound hoarse #88

Closed bfloat16 closed 1 year ago

bfloat16 commented 1 year ago

Configuration file used: svc_content_vec_finetune.py

Pretrained model used: content-vec-pretrained-v1.ckpt

Vocoder: downloaded using download_nsf_hifigan.py

Command: python tools\diffusion\train.py --config configs\svc_content_vec_finetune.py --pretrained checkpoints\content-vec-pretrained-v1.ckpt --tensorboard

leng-yue commented 1 year ago

Check whether the f0 is correct, or try switching to CrepePitchExtractor.

bfloat16 commented 1 year ago

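I've switched to CrepePitchExtractor and re-extracted, but the gt is still hoarse. (Switching to hifisinger fixes that, but then the inferred audio has an electrical buzz in the breath sounds.) Also, how do I inspect the f0? (I don't really understand it.)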

leng-yue commented 1 year ago

Could you download one of the gt audio clips? I'd like to look at its mel spectrogram.

bfloat16 commented 1 year ago

This one was extracted with PM and isn't too bad; some clips can't be rescued even by Crepe (completely hoarse). https://drive.google.com/file/d/1IXtXdLlj2dPi3ZiSVaIUPrsrgD9cd_7A/view?usp=share_link

leng-yue commented 1 year ago

The f0 is indeed broken. All you can do is try each of the extractors (sad).
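On the earlier question of how to inspect f0: one rough approach is to summarize the extracted pitch track and look for implausible values or sudden octave jumps. The sketch below is not part of fish-diffusion. `summarize_f0` is a hypothetical helper, and it assumes you already have the per-frame f0 values in Hz as a plain sequence, with 0 marking unvoiced frames (a common convention, but check what your extractor actually outputs).

```python
import math
from statistics import median


def summarize_f0(f0_hz, jump_semitones=12.0):
    """Summarize a per-frame f0 track in Hz; frames <= 0 count as unvoiced.

    Hypothetical helper for eyeballing pitch-extraction quality.
    """
    f0 = [float(x) for x in f0_hz]
    voiced = [f for f in f0 if f > 0]

    # Count jumps of at least `jump_semitones` between consecutive voiced
    # frames. Many such jumps often point to octave errors in the extractor.
    jumps = 0
    for prev, cur in zip(f0, f0[1:]):
        if prev > 0 and cur > 0:
            if abs(12.0 * math.log2(cur / prev)) >= jump_semitones:
                jumps += 1

    return {
        "frames": len(f0),
        "voiced_ratio": len(voiced) / len(f0) if f0 else 0.0,
        "min_hz": min(voiced) if voiced else None,
        "median_hz": median(voiced) if voiced else None,
        "max_hz": max(voiced) if voiced else None,
        "octave_jumps": jumps,
    }
```

A very low voiced ratio on a sung clip, a minimum or maximum far outside the singer's range, or many octave jumps would all be hints that the extractor is failing on that file.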

bfloat16 commented 1 year ago

~~I may have found the culprit: there are 36k and 24k files mixed in with the 44.1k audio (annoyed)~~
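A mixed-sample-rate dataset like this can be spotted with a quick scan of the WAV headers. This is a minimal sketch using only the Python standard library. `group_wavs_by_sample_rate` is a hypothetical helper, not a fish-diffusion tool, and it assumes the dataset consists of `.wav` files under one root directory. The `wave` module only reads PCM WAV, so files it cannot parse (for example, float-encoded WAV) are collected under the key `None` rather than skipped silently.

```python
import wave
from collections import defaultdict
from pathlib import Path


def group_wavs_by_sample_rate(root):
    """Map sample rate in Hz -> list of WAV paths found under `root`.

    Files that the stdlib `wave` module cannot read are grouped under None.
    """
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*.wav")):
        try:
            with wave.open(str(path), "rb") as wav:
                groups[wav.getframerate()].append(path)
        except (wave.Error, EOFError):
            groups[None].append(path)
    return dict(groups)
```

If the result has more than one key, the files under the unexpected rates need resampling (or removal) before features are extracted again.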

bfloat16 commented 1 year ago

The hoarseness is still there. The dataset quality may simply not be good enough (hifisinger to the rescue).