LukeLIN-web opened this issue 4 days ago
Hi @LukeLIN-web, I was not able to reproduce this on an RTX 4090. That said, I would also expect it to work on a 2080 Ti, as that GPU is fully supported for 4bit quantization with bitsandbytes.

I suspect your stack trace is not giving the full picture, since we do not use `cublasGemmEx` in 4bit; the error may come from a PyTorch operation instead. You may get a clearer trace by setting `CUDA_LAUNCH_BLOCKING=1` in your environment.
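For reference, a minimal way to apply that suggestion from inside a script (the variable must be set before PyTorch initializes the CUDA context, so do it at the very top):

```python
import os

# Make CUDA kernel launches synchronous so the Python traceback points
# at the kernel that actually failed rather than at a later operation.
# This must be set before torch initializes CUDA.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch  # imported after setting the env var on purpose
```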
System Info
- CUDA: 12.2
- PyTorch: 2.1.0a0+29c30b1
- bitsandbytes: 0.43.3
- Python: 3.10
- Driver: 535.113.01
- GPU: NVIDIA GeForce RTX 2080 Ti
Reproduction
https://github.com/Vchitect/Latte/issues/125#issue-2529714919
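Since the actual reproduction lives in the linked Latte issue, here is a hypothetical, self-contained sketch that exercises the same bitsandbytes 4bit matmul path on the GPU; the layer sizes, dtype, and `quant_type` are illustrative assumptions, not values taken from that issue:

```python
import torch
import bitsandbytes as bnb

# Hypothetical standalone repro: a single NF4-quantized linear layer.
# Sizes and quant_type are assumptions chosen for illustration.
layer = bnb.nn.Linear4bit(
    64, 64, bias=False,
    compute_dtype=torch.float16,
    quant_type="nf4",
)
layer = layer.to("cuda")  # weights are quantized on the move to the GPU

x = torch.randn(1, 64, dtype=torch.float16, device="cuda")
out = layer(x)  # runs the 4bit matmul kernel
print(out.shape)
```

If a snippet like this runs cleanly on the 2080 Ti, the failure is more likely coming from a PyTorch op in the model rather than from bitsandbytes itself, which would be consistent with the comment above.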
Expected behavior
See https://huggingface.co/docs/bitsandbytes/v0.43.3/installation. What is the GPU requirement for 4bit quantization?
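For what it's worth, a quick way to confirm what PyTorch reports for the card (per the comment above, the RTX 2080 Ti, i.e. Turing / sm_75, is fully supported for 4bit):

```python
import torch

# Print the detected GPU and its compute capability.
# Per the maintainer's comment, sm_75 (RTX 2080 Ti) supports 4bit quantization.
major, minor = torch.cuda.get_device_capability(0)
print(f"{torch.cuda.get_device_name(0)}: sm_{major}{minor}")
```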