FMInference / FlexLLMGen

Running large language models on a single GPU for throughput-oriented scenarios.
Apache License 2.0

Why must the variable bls be less than 20? #132

Open · LHQUer opened this issue 7 months ago

LHQUer commented 7 months ago

The report says: "Typically, gbs is a multiple of 4, and bls is less than 20, so there are not too many choices." Could you explain how the limit "bls < 20" was determined?
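To make the quoted heuristic concrete, here is a minimal sketch that counts how many (gbs, bls) pairs a search enumerating both values would have to consider. The upper bound on gbs (64) and the function name `candidate_policies` are hypothetical and not taken from the report or the FlexLLMGen code. The sketch only shows that restricting gbs to multiples of 4 and bls to values below 20 keeps the grid to a few hundred points. It does not explain why 20 was chosen as the cutoff.

```python
# Hypothetical sketch: enumerate the (gbs, bls) grid implied by the quoted heuristic.
# MAX_GBS is an assumed upper bound for illustration only; it is not from the report.
from itertools import product

MAX_GBS = 64  # assumption: largest GPU batch size considered
MAX_BLS = 19  # "bls is less than 20"


def candidate_policies(max_gbs: int = MAX_GBS, max_bls: int = MAX_BLS):
    """Return all (gbs, bls) pairs with gbs a multiple of 4 and 1 <= bls <= max_bls."""
    gbs_choices = range(4, max_gbs + 1, 4)   # 4, 8, ..., max_gbs
    bls_choices = range(1, max_bls + 1)      # 1, 2, ..., max_bls
    return list(product(gbs_choices, bls_choices))


if __name__ == "__main__":
    pairs = candidate_policies()
    print(len(pairs))  # 16 gbs values * 19 bls values = 304
```

Under these assumptions, doubling the gbs bound or raising the bls cutoff only grows the grid linearly in each dimension, so the search stays small either way.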