OpenGVLab / EfficientQAT

EfficientQAT: Efficient Quantization-Aware Training for Large Language Models

Is it possible to run e2e-qp process on a single 4090? #23

Closed sihouzi21c closed 3 weeks ago

sihouzi21c commented 3 weeks ago

Hi, thanks for your great work! As reported in your paper, the memory requirement for Llama-7B at 4 bits is 7 GB. However, on a single 4090, when I run `bash examples/e2e_qp/Llama-2-7b/w4g-1-redpajama.sh` (where "g-1" denotes group_size=-1), I encounter `RuntimeError: CUDA error: an illegal memory access was encountered`. I searched for solutions and found suggestions that this error can indicate CUDA running out of memory. Is there any suggestion to fix this problem? Thanks a lot!
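As a side note on debugging errors like this: CUDA kernel launches are asynchronous, so an illegal memory access is often reported at a later, unrelated op. A common first step (a general CUDA debugging technique, not something specific to this repo) is to re-run with synchronous launches so the traceback points at the kernel that actually faulted. A minimal sketch, reusing the script path from the report above:

```shell
# CUDA_LAUNCH_BLOCKING=1 forces each kernel launch to complete before the
# next one starts, so the Python traceback lands on the real failing call.
CUDA_LAUNCH_BLOCKING=1 bash examples/e2e_qp/Llama-2-7b/w4g-1-redpajama.sh
```

If the error still appears at the same quantization kernel, that points to a kernel bug rather than memory exhaustion; a true OOM would instead raise `torch.cuda.OutOfMemoryError` with an explicit "out of memory" message.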

(Screenshot of the error traceback, 2024-10-21)
ChenMnZ commented 3 weeks ago

@sihouzi21c Try with a group size of 128.

This is not caused by OOM, but by a CUDA kernel error.

I have only tested the quantization kernel on the A100, so there may be some problems on the 4090.
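The example scripts in the repo appear to follow a `w{bits}g{group}` naming pattern (the report above uses `w4g-1` for group_size=-1), so the group-size-128 suggestion would presumably look like the following. The exact script name is an assumption based on that pattern; check `examples/e2e_qp/Llama-2-7b/` for the actual file:

```shell
# Hypothetical: same 4-bit E2E-QP run, but with group_size=128 instead of
# per-channel quantization (group_size=-1). Script name assumed from the
# repo's w{bits}g{group} naming convention.
bash examples/e2e_qp/Llama-2-7b/w4g128-redpajama.sh
```

Group-wise quantization (g128) uses a different code path in the kernel than per-channel (g-1), which is why switching group size can sidestep a kernel bug that only triggers in one configuration.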

sihouzi21c commented 3 weeks ago

Thanks for your reply and suggestion! I will try it later. :)