Open cmingxu opened 7 months ago
me too
There is my inference server project: https://github.com/synw/goinfer
@synw Sorry for the off-topic question, but you might have some experience:
I've been experimenting with Llama 2 models, and I've found inference extremely slow, especially when there is a bit of context in the prompt. At the beginning it ran at full CPU load, but now it's at around 10-15% and a prediction takes about 30 minutes. I assume it looks different on your side, right?
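One common cause of CPU utilization dropping like that with llama.cpp-based backends is running more inference threads than physical cores, so the threads fight over hyperthreaded siblings. A minimal sketch of a heuristic for picking a thread count (`suggested_threads` is a hypothetical helper, not part of goinfer or llama.cpp; the 2-threads-per-core assumption is a guess for typical x86 CPUs):

```python
import os

def suggested_threads() -> int:
    """Pick a thread count for llama.cpp-style CPU inference.

    Heuristic: use roughly the number of physical cores, assuming
    2 hardware threads per core (common on x86). Saturating all
    logical cores often slows token generation down instead of
    speeding it up.
    """
    logical = os.cpu_count() or 1
    return max(1, logical // 2)

print(suggested_threads())
```

You could then pass the result as the thread setting of whatever backend you use (e.g. the `-t` flag of the llama.cpp CLI) and compare tokens/sec against the default.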
I am wondering, are there any projects already using this project?