Closed sihouzi21c closed 3 weeks ago
@sihouzi21c Try with a group size of 128.
This is not caused by OOM but by a CUDA kernel error.
I have only tested the quantization kernel on the A100, so there may be an issue on the 4090.
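To make the group-size suggestion concrete, here is a minimal fake-quantization sketch of what `group_size` controls in weight-only quantization: with `group_size=-1` each output row shares a single scale, while `group_size=128` gives each 128-column group its own scale. This is an illustrative NumPy sketch only, not this repo's actual CUDA kernel, and the function name `fake_quant` is made up for the example.

```python
import numpy as np

def fake_quant(w, group_size=-1, bits=4):
    """Simulate symmetric weight-only quantization per group.

    group_size=-1 -> one scale per output row (per-channel);
    otherwise each row is split into groups of `group_size`
    columns, each with its own scale.
    Illustrative sketch only -- not the repo's kernel.
    """
    rows, cols = w.shape
    g = cols if group_size == -1 else group_size
    assert cols % g == 0, "columns must divide evenly into groups"
    qmax = 2 ** (bits - 1) - 1            # e.g. 7 for 4-bit symmetric
    grouped = w.reshape(rows, cols // g, g)
    scales = np.abs(grouped).max(axis=-1, keepdims=True) / qmax
    q = np.clip(np.round(grouped / scales), -qmax - 1, qmax)
    return (q * scales).reshape(rows, cols), scales.squeeze(-1)

w = np.random.randn(8, 4096).astype(np.float32)
_, s_channel = fake_quant(w, group_size=-1)   # one scale per row
_, s_group = fake_quant(w, group_size=128)    # 4096/128 = 32 scales per row
print(s_channel.shape, s_group.shape)         # (8, 1) (8, 32)
```

Smaller groups mean more scale parameters (slightly more memory) but finer-grained quantization, which is why 128 is a common default.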
Thanks for your reply and suggestion! I will try it later. : )
Hi, thanks for your great work! As reported in your paper, the memory requirement for Llama-7B with 4 bits is 7 GB. However, on a single 4090, when I run
bash examples/e2e_qp/Llama-2-7b/w4g-1-redpajama.sh
where "g-1" denotes group_size=-1, I encounter "RuntimeError: CUDA error: an illegal memory access was encountered". I searched for solutions and found that this indicates CUDA out of memory. Is there any suggestion for fixing this problem? Thanks a lot!