Hello, I tested the inference speed of longchat-13b-16k.
On the LongEval topics task, with an input of 9,600 tokens and an output of 12 tokens, it took 23 s.
Then on LongBench, with an input of 7,367 tokens and an output of 200 tokens, it took 8 minutes.
Is this speed normal?
How did you speed up inference for long contexts?
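For reference, this is roughly how I timed generation. It is a minimal sketch assuming the Hugging Face checkpoint lmsys/longchat-13b-16k, a single CUDA GPU, and fp16 weights; the prompt is left as a placeholder:

```python
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "lmsys/longchat-13b-16k"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

prompt = "..."  # placeholder: the actual long-context prompt (~9,600 tokens)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Synchronize so the timing covers the full prefill + decode, not just kernel launch.
torch.cuda.synchronize()
start = time.time()
outputs = model.generate(**inputs, max_new_tokens=12, do_sample=False)
torch.cuda.synchronize()
elapsed = time.time() - start

new_tokens = outputs.shape[1] - inputs["input_ids"].shape[1]
print(f"Generated {new_tokens} tokens in {elapsed:.1f}s")
```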