Open shanhaidexiamo opened 2 months ago
Hi, thanks for your repo. When I try to accelerate inference, I call llm.half() and cast the inputs to int32 or float16 (the type of lm_input is also float16), but the inference speed of the LLM has not improved. Do you have any suggestions?
Are you using GPU or CPU? Maybe the CPU does not support fp16 speedup.
All inference stages run on the GPU; I ran this code directly on an A10.
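For anyone comparing fp16 against fp32, here is a minimal benchmarking sketch (not from this repo; `gpt2` is only a placeholder model name, and the numbers depend on your hardware). Two points often explain a "no speedup" result: CUDA kernels launch asynchronously, so timings taken without `torch.cuda.synchronize()` are meaningless, and token IDs must stay integer (`torch.long`) for the embedding lookup; only the weights and activations should be fp16.

```python
# A hedged sketch, assuming a Hugging Face-style causal LM on a CUDA GPU.
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


def benchmark(model, input_ids, max_new_tokens=64, warmup=2, iters=5, pad_id=None):
    # Warm-up runs let CUDA compile/cache kernels before timing.
    for _ in range(warmup):
        model.generate(input_ids, max_new_tokens=max_new_tokens, pad_token_id=pad_id)
    # CUDA is asynchronous: synchronize before and after the timed region.
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        model.generate(input_ids, max_new_tokens=max_new_tokens, pad_token_id=pad_id)
    torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters


name = "gpt2"  # placeholder; substitute the actual model
tokenizer = AutoTokenizer.from_pretrained(name)
# Token IDs stay torch.long; do NOT cast them to float16.
input_ids = tokenizer("Hello, my name is", return_tensors="pt").input_ids.to("cuda")

model_fp32 = AutoModelForCausalLM.from_pretrained(name).to("cuda").eval()
model_fp16 = AutoModelForCausalLM.from_pretrained(
    name, torch_dtype=torch.float16  # weights loaded directly in fp16
).to("cuda").eval()

with torch.no_grad():
    t32 = benchmark(model_fp32, input_ids, pad_id=tokenizer.eos_token_id)
    t16 = benchmark(model_fp16, input_ids, pad_id=tokenizer.eos_token_id)
print(f"fp32: {t32:.3f} s/iter | fp16: {t16:.3f} s/iter")
```

If fp16 still shows no improvement, the run may be dominated by Python/launch overhead (small model, batch size 1) rather than GPU math; on an A10 (Ampere, with fp16 tensor cores) larger batches or longer sequences usually make the gap visible.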