FunAudioLLM / CosyVoice

Multilingual large voice generation model, providing full-stack inference, training and deployment capabilities.
https://funaudiollm.github.io/
Apache License 2.0
4.72k stars, 475 forks

What effect does changing the sample rate of the input and output audio have on the result? #165

Open LayBrick opened 1 month ago

LayBrick commented 1 month ago

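My prompt audio has a sample rate of 32 kHz. After I set the input sample rate to 32k, the output sounded very muddy, almost like a dialect. After I set the output sample rate to 44k, the output audio played back very fast. How should I set the input and output sample rates to get the best results?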

aluminumbox commented 1 month ago

Well, the code automatically loads the input audio as a 16 kHz WAV; check https://github.com/FunAudioLLM/CosyVoice/blob/main/cosyvoice/utils/file_utils.py#L40. Do not change the output sample rate, because it is fixed at 22050 Hz.
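The "very fast" symptom is consistent with simple arithmetic, under one assumption: that changing the output sample rate only relabels the 22050 Hz samples (for example in the WAV header) instead of resampling them. A player then consumes the same samples faster by the ratio of the two rates, and the pitch rises by the same ratio. The sketch below only illustrates that arithmetic; `playback_stats` is a hypothetical helper, not part of CosyVoice.

```python
import math


def playback_stats(num_samples: int, true_sr: int, declared_sr: int) -> dict:
    """Describe audio generated at true_sr but played back as if it were declared_sr.

    Assumption: no resampling happens, so the samples are unchanged and only
    the time axis is stretched or squeezed.
    """
    true_duration = num_samples / true_sr        # seconds the model actually produced
    played_duration = num_samples / declared_sr  # seconds a player will take
    speed = declared_sr / true_sr                # >1 plays faster, <1 plays slower
    semitones = 12 * math.log2(speed)            # pitch shifts by the same ratio
    return {
        "true_duration_s": true_duration,
        "played_duration_s": played_duration,
        "speed_factor": speed,
        "pitch_shift_semitones": semitones,
    }


# Two seconds of 22050 Hz output, labeled as 44100 Hz
print(playback_stats(num_samples=44100, true_sr=22050, declared_sr=44100))
# → {'true_duration_s': 2.0, 'played_duration_s': 1.0, 'speed_factor': 2.0, 'pitch_shift_semitones': 12.0}
```

Labeling 22050 Hz audio as 44100 Hz halves its duration and raises it a full octave, which is why the output should stay at 22050 Hz and be resampled afterwards if another rate is needed.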

LayBrick commented 1 month ago

Thanks, understood!

HsiangLeekwok commented 1 month ago

@LayBrick

Thanks, understood!

You can use an ffmpeg command to change the WAV's sample rate to whatever you like (but leave the train/test set unchanged):

ffmpeg -i src.wav -ar 44100 -b:a 768k -ac 2 -y out.wav

-ar sets the sample rate
-b:a sets the audio bitrate
-ac sets the number of channels (1 = mono, 2 = stereo)
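For anyone who would rather stay in Python than call ffmpeg, here is a minimal standard-library sketch of the same idea: resample a generated 22050 Hz file to 44100 Hz and duplicate the mono channel into stereo. The helpers `resample_linear` and `convert_wav` are hypothetical and not part of CosyVoice. The sketch assumes 16-bit mono PCM input and uses plain linear interpolation with no anti-aliasing filter. That is tolerable for upsampling, but ffmpeg's resampler is the better choice for quality and for any downsampling.

```python
import array
import sys
import wave


def resample_linear(samples, src_sr: int, dst_sr: int) -> list:
    """Naive linear-interpolation resampler for one channel of integer samples.

    No low-pass filter is applied, so downsampling with this will alias.
    """
    if src_sr == dst_sr or len(samples) == 0:
        return list(samples)
    n_out = len(samples) * dst_sr // src_sr
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        # Position of output sample i on the input time axis, kept in integers.
        j = i * src_sr // dst_sr
        frac = (i * src_sr % dst_sr) / dst_sr
        a = samples[j]
        b = samples[min(j + 1, last)]
        out.append(int(round(a + (b - a) * frac)))
    return out


def convert_wav(src_path: str, dst_path: str, dst_sr: int = 44100, dst_channels: int = 2) -> None:
    """Resample a 16-bit mono PCM WAV and write it with dst_channels identical channels."""
    with wave.open(src_path, "rb") as r:
        if r.getsampwidth() != 2 or r.getnchannels() != 1:
            raise ValueError("this sketch only handles 16-bit mono PCM input")
        src_sr = r.getframerate()
        pcm = array.array("h", r.readframes(r.getnframes()))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV sample data is little-endian

    mono = resample_linear(pcm, src_sr, dst_sr)
    # Interleave: repeat each sample once per output channel.
    out = array.array("h", (s for s in mono for _ in range(dst_channels)))
    if sys.byteorder == "big":
        out.byteswap()

    with wave.open(dst_path, "wb") as w:
        w.setnchannels(dst_channels)
        w.setsampwidth(2)
        w.setframerate(dst_sr)
        w.writeframes(out.tobytes())
```

Calling `convert_wav("src.wav", "out.wav")` would write a 44.1 kHz stereo file. There is no bitrate argument because, for uncompressed 16-bit PCM, the bit rate follows directly from sample rate × bit depth × channels, e.g. 44100 × 16 × 2 = 1,411,200 bit/s. If the generated file is stored as 32-bit float rather than 16-bit PCM, this sketch will refuse it; ffmpeg handles that case directly.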