IST-DASLab / marlin

FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.
Apache License 2.0

groupsize=64 is not supported #17

Open jameswu2014 opened 6 months ago

jameswu2014 commented 6 months ago

Hello, Marlin is great work! However, in my use case I found it still has some limitations. Specifically, when the GPTQ group size is set to 64, the quantized model performs very well; when it is set to 128, quality degrades. But Marlin currently does not support a group size of 64. So I would like to ask: how can I modify the source code to make Marlin support this setting?
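
For context, here is a minimal sketch of the constraint I am hitting. The helper below is hypothetical (not part of Marlin's API); it only encodes the documented fact that the released kernel accepts per-channel scales (groupsize = -1) or 128-element groups, so a GPTQ checkpoint quantized with groupsize = 64 is rejected before the kernel ever runs:

```python
# Hypothetical pre-conversion check, not part of Marlin itself.
# Marlin's released kernel only handles per-channel scales
# (groupsize = -1) or 128-element groups (per the README).
MARLIN_SUPPORTED_GROUPSIZES = (-1, 128)


def check_marlin_compatible(groupsize: int, in_features: int) -> None:
    """Raise if a GPTQ (groupsize, in_features) combination won't load in Marlin."""
    if groupsize not in MARLIN_SUPPORTED_GROUPSIZES:
        raise ValueError(
            f"Marlin supports groupsize in {MARLIN_SUPPORTED_GROUPSIZES}, "
            f"got {groupsize}; groupsize=64 would require kernel changes."
        )
    if groupsize != -1 and in_features % groupsize != 0:
        raise ValueError(
            f"in_features={in_features} must be divisible by groupsize={groupsize}."
        )


if __name__ == "__main__":
    check_marlin_compatible(128, 4096)  # OK: supported configuration
    check_marlin_compatible(64, 4096)   # raises: group size 64 unsupported
```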

zhink commented 1 month ago

Is this case supported yet?