Hi @muzi0111,
Thanks for your interest in our project.
About the assertion error, I'm assuming you are referring to L235 in quant.py. The quantization method applied to the KV-Cache is group quantization with per-head granularity, so this assertion ensures that the last dimension (which becomes the reduction dimension in group quantization) equals head_dim. The value 128 is the head_dim widely used in newly released models for efficiency reasons.
To resolve this error, I think replacing the hard-coded 128 with head_dim would be a good choice.
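
For illustration, here is a minimal sketch of the kind of per-head group quantization described above, with the assertion written against head_dim rather than a hard-coded 128. The function name and signature are hypothetical, not the actual code in quant.py:

```python
import torch

def quantize_kv_per_head(kv: torch.Tensor, head_dim: int, n_bits: int = 4):
    """Sketch of per-head group quantization: each head's feature
    dimension forms one quantization group (the reduction dim)."""
    # The assertion from the issue: the last dimension must equal
    # head_dim, since it is the reduction dimension of the group.
    # Using head_dim instead of a hard-coded 128 lifts the restriction
    # to models whose head dimension is not 128.
    assert kv.shape[-1] == head_dim, \
        f"expected last dim {head_dim}, got {kv.shape[-1]}"

    qmax = 2 ** (n_bits - 1) - 1
    # One scale per (head, token) group, reduced over head_dim.
    scale = kv.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / qmax
    q = torch.clamp(torch.round(kv / scale), -qmax - 1, qmax)
    return q, scale
```

With head_dim taken from the model config, the same check works unchanged across models with different head dimensions.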
I attempted the W4A4 operation on the OPT-350M model and was able to obtain the corresponding results. However, after switching to the 2.7B model, I encountered a mismatch error at line 238 in quant.py. Upon printing, I found the size to be [32, 2048, 160], whereas for the 350M model it was [16, 2048, 128]. How should I resolve this error?
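
For context, here is a hypothetical reproduction of that kind of shape mismatch, assuming the failing check compares the KV tensor's last dimension against an expected head size (the actual condition at line 238 of quant.py is an assumption here):

```python
import torch

# The 350M tensors pass a check written for a last dimension of 128,
# while the 2.7B tensors arrive with a last dimension of 160 and fail.
kv_350m = torch.randn(16, 2048, 128)
kv_2_7b = torch.randn(32, 2048, 160)

for name, kv in [("350M", kv_350m), ("2.7B", kv_2_7b)]:
    try:
        assert kv.shape[-1] == 128, f"size mismatch: got {tuple(kv.shape)}"
        print(f"{name}: check passed")
    except AssertionError as e:
        print(f"{name}: {e}")  # 2.7B prints: size mismatch: got (32, 2048, 160)
```

If line 238 still assumes a size of 128 (or a group size that does not divide 160), deriving the expected size from the model config, as suggested above for L235, would be the analogous fix.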