Closed cokeshao closed 1 month ago
Hi @cokeshao ,
Thanks for your interest. We use the PyTorch API to allocate the correct amount of space for the INT4 tensor. Since PyTorch had no native INT4 dtype at the time, we allocate CUDA memory as uint8 and pack two INT4 values into each byte. Therefore, (hidden_dim - group_size) INT4 values occupy (hidden_dim - group_size) // 2 uint8 elements.
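A minimal sketch in plain Python of the packing described above (`pack_int4` is a hypothetical helper for illustration, not Atom's actual kernel code; the sizes are made up):

```python
def pack_int4(values):
    """Pack 4-bit unsigned values (0..15) into bytes, two per byte."""
    assert len(values) % 2 == 0
    out = []
    for lo, hi in zip(values[0::2], values[1::2]):
        out.append((hi << 4) | lo)  # second value goes in the high nibble
    return bytes(out)

# Illustrative dimensions only (not taken from the Atom repo).
hidden_dim, group_size = 4096, 128
n_int4 = hidden_dim - group_size          # INT4 values per row
row = [i % 16 for i in range(n_int4)]
packed = pack_int4(row)

# Two INT4 values per uint8 byte, hence the // 2 in the allocation.
assert len(packed) == n_int4 // 2
```

The same logic explains the notebook's `torch.empty(..., dtype=torch.uint8)` shape: the uint8 buffer only needs half as many elements as there are INT4 values.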
Thank you for getting back to me so quickly. I was overcomplicating the issue.
Thanks for your great work! I have a small question here.
Why is the matrix dimension (bs, (hidden_dim - group_size) // 2) rather than (bs, hidden_dim - group_size) here? What does the "// 2" mean? Is it some kind of hardware-acceleration method? Could you elaborate? Thank you. https://github.com/efeslab/Atom/blob/7e3618b1a7a7c86e1c93cc909b1510c046d76ac6/kernels/baselines/python-api.ipynb#L285-L292