Open CUHKSZzxy opened 6 months ago
Hi, I ran into the same problem. In my case, the tensors were not all on the same device. After I moved them to the same device (CPU or CUDA), the error went away. Maybe this helps in your case.
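A quick way to check for this kind of mismatch is to group the tensors by device right before the failing call. The sketch below is only an illustration: `group_by_device` and `assert_same_device` are hypothetical helper names, not part of KVQuant or PyTorch, and the only assumption is that each object exposes a `.device` attribute, as `torch.Tensor` does.

```python
from collections import defaultdict


def group_by_device(named_tensors):
    """Map each device (as a string) to the names of the tensors on it.

    `named_tensors` is a dict of name -> object with a `.device` attribute
    (for example a torch.Tensor). Hypothetical helper for debugging only.
    """
    groups = defaultdict(list)
    for name, tensor in named_tensors.items():
        groups[str(tensor.device)].append(name)
    return dict(groups)


def assert_same_device(named_tensors):
    """Raise RuntimeError if the tensors are spread over several devices."""
    groups = group_by_device(named_tensors)
    if len(groups) > 1:
        raise RuntimeError(f"tensors are on different devices: {groups}")
```

Calling `assert_same_device` on the arguments of the custom kernel just before it is launched would turn a silent device mismatch into a readable Python error instead of an illegal memory access.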
Thanks for your suggestions, I will give it a try!
Thank you for your excellent work!
Currently, I am trying to reproduce KVQuant but have encountered some errors. Your assistance with this matter would be appreciated.
1. Reproduce the bug
I followed the provided instructions and set up the environment for gradient/quant/deployment. The gradient and quantization processes performed well; I successfully computed the gradient and built the quantizer. However, when I tested the deployment code using the following instructions, I encountered the error message "CUDA error: an illegal memory access was encountered."
2. Error logs
The detailed error logs are shown as follows:
According to my understanding, it appears that the error is somehow related to CUDA kernel implementation "vecquant4appendvecKsparse," which modifies the variable "outliers_rescaled".
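Because CUDA kernels are launched asynchronously, the Python line shown in a traceback is not always the call that actually faulted. One common way to verify a guess like the one above is to force synchronous launches with the `CUDA_LAUNCH_BLOCKING` environment variable. The snippet below is a minimal sketch and not part of the original report: `enable_blocking_launches` is a hypothetical helper name, and it assumes the variable is set before CUDA is initialized (simplest: before `torch` is imported).

```python
import os


def enable_blocking_launches(env=os.environ):
    """Make CUDA kernel launches synchronous for debugging.

    Must run before the first CUDA call in the process; setting it at the
    very top of the script, before importing torch, is the safe choice.
    Hypothetical helper for illustration only.
    """
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env["CUDA_LAUNCH_BLOCKING"]


enable_blocking_launches()
# import torch and run the deployment script only after this point
```

With blocking launches the run is slower, but the error should be raised at the kernel that caused it, which helps confirm or rule out a specific kernel as the source.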
3. Environment
Due to hardware constraints, I intend to perform a quick test on the smaller model weights as indicated above. KVQuant is expected to work properly, as the smaller model differs from Llama-7B only in terms of weight size while sharing a similar architecture.
4. Related solutions that I have tried
As suggested in the discussion related to this CUDA error on https://github.com/pytorch/pytorch/issues/21819 , I have updated CUDA, torch, and other relevant components to the latest versions. However, I am still encountering the same error.
What could be causing this error, and how can I solve it?
Thanks in advance!