feifeibear / LLMSpeculativeSampling

Fast inference from large language models via speculative decoding

Does it support batch size > 1? #24

Open GuoYi0 opened 8 months ago

GuoYi0 commented 8 months ago

Does it support batch size > 1?

feifeibear commented 7 months ago

No, it doesn't. CodeLlama ran into the same problem. I think it is an open question for the community.
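
For context, here is a minimal sketch (not this repository's code) of why batching is awkward for speculative decoding: under the standard acceptance rule, each sequence in a batch may accept a different number of draft tokens, so after one verification step the batch becomes ragged and needs padding or realignment. The function and tensor names below are hypothetical.

```python
import torch

def accepted_lengths(draft_probs, target_probs):
    """Count, per sequence, how many of the K draft tokens are accepted.

    draft_probs, target_probs: (batch, K) probabilities the draft and
    target models assign to the K drafted tokens (hypothetical shapes).
    Acceptance rule: accept token i if u < min(1, p_target / p_draft).
    """
    batch, K = draft_probs.shape
    u = torch.rand(batch, K)
    accept = u < (target_probs / draft_probs).clamp(max=1.0)  # (batch, K)
    # Each sequence keeps draft tokens only up to its first rejection.
    first_reject = torch.where(
        accept.all(dim=1),
        torch.full((batch,), K),
        (~accept).float().argmax(dim=1),
    )
    return first_reject  # differs per sequence -> ragged batch

# Toy example with made-up probabilities.
draft_p = torch.rand(4, 5).clamp(min=0.1)
target_p = torch.rand(4, 5).clamp(min=0.1)
print(accepted_lengths(draft_p, target_p))
# e.g. tensor([5, 2, 0, 3]): the four sequences now have different
# lengths, which is what a naive batched implementation cannot handle.
```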

haiduo commented 3 months ago

Maybe this one supports batch size > 1: https://github.com/lucidrains/speculative-decoding