Hello, I tested the inference speed of longchat-13b-16k.
On the LongEval topics task, with an input of 9,600 tokens and an output of 12 tokens, it took 23 s.
Then on LongBench, with an input of 7,367 tokens and an output of 200 tokens, it took 8 minutes.
Is this speed normal?
How did you speed up inference for long contexts?
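For reference, this is roughly how I timed generation. It is a minimal sketch assuming the Hugging Face checkpoint lmsys/longchat-13b-16k, a single CUDA GPU, and fp16 weights; the prompt is left as a placeholder:

```python
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "lmsys/longchat-13b-16k"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

prompt = "..."  # placeholder: the actual long-context prompt (~9,600 tokens)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Synchronize so the timing covers the full prefill + decode, not just kernel launch.
torch.cuda.synchronize()
start = time.time()
outputs = model.generate(**inputs, max_new_tokens=12, do_sample=False)
torch.cuda.synchronize()
elapsed = time.time() - start

new_tokens = outputs.shape[1] - inputs["input_ids"].shape[1]
print(f"Generated {new_tokens} tokens in {elapsed:.1f}s")
```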