IST-DASLab / QUIK

Repository for the QUIK project, enabling the use of 4bit kernels for generative inference
Apache License 2.0
167 stars 12 forks

[Question] Does QUIK support multi-batch inference? #11

Open hanrui1sensetime opened 9 months ago

hanrui1sensetime commented 9 months ago

It seems that MixedQLinear does not currently support multi-batch inference. Could this approach be extended to a multi-batch version, and do you plan to add that support?