CryVeck / QuaRot

Code for the NeurIPS 2024 paper: QuaRot, end-to-end 4-bit inference of large language models.
https://arxiv.org/abs/2404.00456
Apache License 2.0

fixing token_wise rotation size when different from value size #3

Closed: CryVeck closed 1 day ago

CryVeck commented 1 day ago

Use the correct hidden size for the k projection, so that the token-wise rotation size is right when it differs from the value size.
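
The diff is not shown here, so the following is only a rough illustration of why the k projection width can differ from the model hidden size. It assumes a grouped-query attention layout in which the k projection produces one `head_dim` slice per key/value head. The `AttnConfig` class and `k_proj_out_features` helper are hypothetical names for this sketch and are not taken from this repository; the field names only mirror common Hugging Face config conventions.

```python
from dataclasses import dataclass


@dataclass
class AttnConfig:
    # Hypothetical config for illustration only; not this repository's API.
    hidden_size: int
    num_attention_heads: int
    num_key_value_heads: int

    @property
    def head_dim(self) -> int:
        # Per-head width, assuming hidden_size divides evenly across heads.
        return self.hidden_size // self.num_attention_heads


def k_proj_out_features(cfg: AttnConfig) -> int:
    """Output width of the k projection under the assumed layout:
    one head_dim slice per key/value head."""
    return cfg.num_key_value_heads * cfg.head_dim


# Grouped-query attention: fewer key/value heads than query heads,
# so the k projection is narrower than hidden_size.
gqa = AttnConfig(hidden_size=4096, num_attention_heads=32, num_key_value_heads=8)

# Standard multi-head attention: the two sizes coincide.
mha = AttnConfig(hidden_size=4096, num_attention_heads=32, num_key_value_heads=32)

print(k_proj_out_features(gqa), gqa.hidden_size)
print(k_proj_out_features(mha), mha.hidden_size)
```

Under this assumption, any size derived for the k projection should come from the projection's own width rather than from `hidden_size`, since the two agree only when the number of key/value heads equals the number of attention heads.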