How does changing the input and output audio sample rates affect the output?

FunAudioLLM / CosyVoice

Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.

https://funaudiollm.github.io/

Apache License 2.0


Open LayBrick opened 1 month ago

LayBrick commented 1 month ago

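My prompt audio has a sample rate of 32 kHz. When I set the input sample rate to 32 kHz, the output voice became very blurry and sounded like a dialect. When I set the output sample rate to 44 kHz, the output audio played back much too fast. How should I set the input and output sample rates for the best result?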

aluminumbox commented 1 month ago

The code automatically loads the audio as a 16 kHz WAV; see https://github.com/FunAudioLLM/CosyVoice/blob/main/cosyvoice/utils/file_utils.py#L40. Do not change the output sample rate, because it is fixed at 22050 Hz.
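To make the "too fast" symptom concrete, here is a minimal arithmetic sketch. It assumes the output was relabeled as 44100 Hz without resampling (the question only says "44k", so the exact figure is an assumption), and the function names are hypothetical, not part of CosyVoice. The same samples played at a higher declared rate simply finish sooner.

```python
# Sketch only: illustrates why relabeling the sample rate changes playback speed.
# The function names are hypothetical and are not taken from the CosyVoice code.

MODEL_OUTPUT_RATE_HZ = 22050  # fixed output rate stated in the reply above


def duration_seconds(num_samples: int, rate_hz: int) -> float:
    """Playback duration of num_samples when the player assumes rate_hz."""
    return num_samples / rate_hz


def playback_speed_factor(true_rate_hz: int, labeled_rate_hz: int) -> float:
    """Speed-up when samples made at true_rate_hz are labeled as labeled_rate_hz."""
    return labeled_rate_hz / true_rate_hz


# Five seconds of speech generated at the model's native rate.
num_samples = 5 * MODEL_OUTPUT_RATE_HZ

print(duration_seconds(num_samples, 22050))   # 5.0 (correct header)
print(duration_seconds(num_samples, 44100))   # 2.5 (wrong header, assumed 44100 Hz)
print(playback_speed_factor(22050, 44100))    # 2.0 (twice as fast, pitch shifted up)
```

If a different output rate is needed, the saved file should be resampled afterwards rather than written with a different rate in its header.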

LayBrick commented 1 month ago

Thanks, understood!

HsiangLeekwok commented 1 month ago

@LayBrick


You can use ffmpeg to change a WAV file's sample rate to whatever you like (but do not change the train/test set):

ffmpeg -i src.wav -ar 44100 -b:a 768k -ac 2 -y out.wav

-ar sets the sample rate
-b:a sets the audio bitrate
-ac sets the number of channels (1 = mono, 2 = stereo)
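For batch use, the ffmpeg command above could be wrapped in a small script. This is a sketch under assumptions: `build_resample_cmd` and `resample_wav` are hypothetical helper names, ffmpeg is assumed to be on the PATH, and the `-b:a` option is left out for brevity. It only uses the `-i`, `-ar`, `-ac`, and `-y` flags shown in the thread.

```python
# Sketch only: helper names are hypothetical; assumes ffmpeg is installed on PATH.
import shlex
import subprocess
from typing import List


def build_resample_cmd(src: str, dst: str, rate_hz: int, channels: int) -> List[str]:
    """Build an ffmpeg argument list that resamples src into dst."""
    if rate_hz <= 0:
        raise ValueError("rate_hz must be positive")
    if channels not in (1, 2):
        raise ValueError("channels must be 1 (mono) or 2 (stereo)")
    return [
        "ffmpeg",
        "-y",                    # overwrite dst if it already exists
        "-i", src,               # input file
        "-ar", str(rate_hz),     # target sample rate in Hz
        "-ac", str(channels),    # number of output channels
        dst,
    ]


def resample_wav(src: str, dst: str, rate_hz: int, channels: int) -> None:
    """Run ffmpeg; raises CalledProcessError if ffmpeg exits with an error."""
    subprocess.run(build_resample_cmd(src, dst, rate_hz, channels), check=True)


# Show the command without running it.
print(shlex.join(build_resample_cmd("src.wav", "out.wav", 44100, 2)))
# ffmpeg -y -i src.wav -ar 44100 -ac 2 out.wav
```

Calling `resample_wav("src.wav", "out.wav", 44100, 2)` would then perform the conversion. Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.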