Open shanhaidexiamo opened 2 months ago
Hi, thanks for your repo. When I try to accelerate inference, I call llm.half() and cast the inputs to int32 or float16 (the type of lm_input is also float16), but the inference speed of the LLM has not improved. Do you have any suggestions?
Are you using GPU or CPU? Maybe the CPU does not support fp16 speedup.
All inference stages run on the GPU; I ran this code directly on an A10.
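For anyone comparing fp16 against fp32, here is a minimal benchmarking sketch (not from this repo; `gpt2` is only a placeholder model name, and the numbers depend on your hardware). Two points often explain a "no speedup" result: CUDA kernels launch asynchronously, so timings taken without `torch.cuda.synchronize()` are meaningless, and token IDs must stay integer (`torch.long`) for the embedding lookup; only the weights and activations should be fp16.

```python
# A hedged sketch, assuming a Hugging Face-style causal LM on a CUDA GPU.
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


def benchmark(model, input_ids, max_new_tokens=64, warmup=2, iters=5, pad_id=None):
    # Warm-up runs let CUDA compile/cache kernels before timing.
    for _ in range(warmup):
        model.generate(input_ids, max_new_tokens=max_new_tokens, pad_token_id=pad_id)
    # CUDA is asynchronous: synchronize before and after the timed region.
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        model.generate(input_ids, max_new_tokens=max_new_tokens, pad_token_id=pad_id)
    torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters


name = "gpt2"  # placeholder; substitute the actual model
tokenizer = AutoTokenizer.from_pretrained(name)
# Token IDs stay torch.long; do NOT cast them to float16.
input_ids = tokenizer("Hello, my name is", return_tensors="pt").input_ids.to("cuda")

model_fp32 = AutoModelForCausalLM.from_pretrained(name).to("cuda").eval()
model_fp16 = AutoModelForCausalLM.from_pretrained(
    name, torch_dtype=torch.float16  # weights loaded directly in fp16
).to("cuda").eval()

with torch.no_grad():
    t32 = benchmark(model_fp32, input_ids, pad_id=tokenizer.eos_token_id)
    t16 = benchmark(model_fp16, input_ids, pad_id=tokenizer.eos_token_id)
print(f"fp32: {t32:.3f} s/iter | fp16: {t16:.3f} s/iter")
```

If fp16 still shows no improvement, the run may be dominated by Python/launch overhead (small model, batch size 1) rather than GPU math; on an A10 (Ampere, with fp16 tensor cores) larger batches or longer sequences usually make the gap visible.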