fishaudio / fish-diffusion

An easy to understand TTS / SVS / SVC framework
https://diff.fish.audio
MIT License

Hubert Denoiser - distorted output audio #131

Open A-2-H opened 1 month ago

A-2-H commented 1 month ago

As in the title: the denoiser script makes the audio very distorted, too loud, and clipped. Following the guide, I used this command:

python tools/diffusion/inference.py --config configs/denoiser_cn_hubert.py \
    --checkpoint checkpoints/denoiser/denoiser-cn-hubert-large-v1.ckpt \
    --input "input.wav" \
    --output "output.wav" \
    --sampler_interval 5 \
    --skip_steps 970

I tried changing --skip_steps to different values, but as I understand it, the lower the value, the more steps it runs, so it changes the audio completely. When I set it to a very low number like 30, the output doesn't clip, but the quality is bad and it doesn't sound like the original. According to the guide, 970 should be fine because it leaves only 30 steps, but it distorts the audio.
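
The step arithmetic behind that reading of --skip_steps can be sketched as follows. This is only an illustration, not code from fish-diffusion: `remaining_steps` and `sampler_calls` are hypothetical helpers, the total of 1000 timesteps is inferred from the 0-1000 range, and the idea that --sampler_interval strides over the remaining timesteps is an assumption about how the sampler behaves.

```python
def remaining_steps(skip_steps: int, total_steps: int = 1000) -> int:
    """Timesteps left to denoise after skipping the first `skip_steps`.

    Assumption: the schedule has `total_steps` timesteps in total.
    """
    if not 0 <= skip_steps <= total_steps:
        raise ValueError("skip_steps must be between 0 and total_steps")
    return total_steps - skip_steps


def sampler_calls(skip_steps: int, sampler_interval: int, total_steps: int = 1000) -> int:
    """Rough count of sampler iterations.

    Assumption: the sampler visits every `sampler_interval`-th remaining
    timestep; the real implementation may count differently.
    """
    if sampler_interval <= 0:
        raise ValueError("sampler_interval must be positive")
    remaining = remaining_steps(skip_steps, total_steps)
    # Ceiling division, so a partial final stride still counts as one call.
    return -(-remaining // sampler_interval)


if __name__ == "__main__":
    for skip in (970, 30):
        print(skip, remaining_steps(skip), sampler_calls(skip, 5))
```

Under these assumptions, --skip_steps 970 leaves 30 timesteps, while --skip_steps 30 leaves 970, which would fit the observation that low values change the audio far more.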

Any value between 0 and 1000 gives bad results in some way: the lower the number, the poorer the quality; the higher the number, the more distorted and clipped the output becomes.

I tried it on different samples rendered by different models, and I also tried to denoise a custom audio recording. The results are the same in every case.
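
To put numbers on "too loud and clipping", a check along these lines could be run on the input and output samples. It is a minimal sketch, not part of fish-diffusion: `clipping_report` is a hypothetical helper, and it assumes the audio has already been loaded as float samples normalized to the range [-1, 1].

```python
from typing import Dict, Sequence


def clipping_report(samples: Sequence[float], threshold: float = 0.999) -> Dict[str, float]:
    """Return the peak level and the fraction of samples at or above `threshold`.

    Assumption: samples are floats normalized to [-1, 1], so values at or
    near full scale indicate clipping.
    """
    if len(samples) == 0:
        raise ValueError("samples must not be empty")
    peak = max(abs(s) for s in samples)
    clipped = sum(1 for s in samples if abs(s) >= threshold)
    return {"peak": peak, "clipped_fraction": clipped / len(samples)}


if __name__ == "__main__":
    # Synthetic example: two of the four samples sit at full scale.
    print(clipping_report([0.0, 0.5, -1.0, 1.0]))
```

Comparing the report for input.wav with the one for output.wav would show whether the denoiser raises the peak level or pushes many samples to full scale.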