DavideRossi opened this issue 3 weeks ago
Hi @DavideRossi , I had similar errors, but 8 bit quantization is working for me on ROCm now. I have added a comment with steps I took in the bitsandbytes multi-backend-refactor discussion post with more details. Hope this helps.
Thanks @mohamedyassin1 what you describe is very similar to my own setup. Can I ask you to paste the output of
python -m bitsandbytes
from your system?
Sure:
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++ BUG REPORT INFORMATION ++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++ OTHER +++++++++++++++++++++++++++
CUDA specs: CUDASpecs(highest_compute_capability=(11, 0), cuda_version_string='60', cuda_version_tuple=(6, 0))
PyTorch settings found: CUDA_VERSION=60, Highest Compute Capability: (11, 0).
WARNING: CUDA versions lower than 11 are currently not supported for LLM.int8().
You will be only to use 8-bit optimizers and quantization routines!
To manually override the PyTorch CUDA version please see: https://github.com/TimDettmers/bitsandbytes/blob/main/docs/source/nonpytorchcuda.mdx
CUDA SETUP: WARNING! CUDA runtime files not found in any environmental path.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++ DEBUG INFO END ++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Checking that the library is importable and CUDA is callable...
SUCCESS!
Installation was successful!
That's interesting. Your output says highest_compute_capability=(11, 0)
whereas mine says highest_compute_capability=(9, 0)
. On NVIDIA hardware this depends entirely on the GPU model; on ROCm I have no idea whether it depends only on the hardware or also on the HIP/ROCm version...
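If it helps compare setups, one way to see what PyTorch itself reports (which is what bitsandbytes reads back) is a small probe like the sketch below. This is just an illustration, not anything from the thread; it assumes a PyTorch install, and on ROCm builds the torch.cuda API is backed by HIP:

```python
# Hedged probe: on both CUDA and ROCm builds of PyTorch, the
# torch.cuda API reports the (major, minor) capability pair that
# bitsandbytes echoes as highest_compute_capability.
def report_devices():
    try:
        import torch
    except ImportError:
        return ["torch is not installed"]
    if not torch.cuda.is_available():
        return ["no GPU visible to this PyTorch build"]
    return [
        (i, torch.cuda.get_device_name(i), torch.cuda.get_device_capability(i))
        for i in range(torch.cuda.device_count())
    ]

print(report_devices())
```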
System Info
An AMD EPYC system with 3 MI210 GPUs. Quite a complex setup: the system uses Slurm to schedule batch jobs, which usually run as apptainer run containers. The image I'm using has ROCm 6.0.2 on Ubuntu 22.04.
Reproduction
python -m bitsandbytes
Two issues here. First, CUDA_VERSION here is not 61 in any CUDA sense: that's the ROCm version (6.1), and the actual CUDA version is anyone's guess, since torch.version.cuda is None on ROCm. As a result, the "lower than 11" warning makes little sense in this case. Second issue:
https://github.com/TimDettmers/bitsandbytes/blob/main/docs/source/nonpytorchcuda.mdx
leads nowhere. That leaves me wondering whether 8-bit on ROCm is really supported or not.

OK, let's try to run some code then:
Result:
See #538. But now the question is: is it really the case that the existing 8-bit code is not supported on ROCm, or is it an architecture/libraries mismatch, and 8-bit could actually work?
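To illustrate why the "lower than 11" check above misfires, here is a minimal sketch (not the actual bitsandbytes code) of what happens when a CUDA-oriented version comparison is fed a ROCm version string: ROCm "6.0" parses to (6, 0), which is indeed "lower than 11", so the LLM.int8() warning fires even though no CUDA 6.0 is involved at all.

```python
# Hypothetical reconstruction of the version mixup seen in the
# diagnostic output above (cuda_version_string='60',
# cuda_version_tuple=(6, 0)); not the bitsandbytes implementation.

def parse_version(version: str) -> tuple:
    """Parse 'major.minor' into a (major, minor) tuple."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)

rocm_version = "6.0"                      # what a ROCm 6.0 build reports
version_tuple = parse_version(rocm_version)
version_string = f"{version_tuple[0]}{version_tuple[1]}"

print(version_tuple, version_string)      # (6, 0) 60
# A CUDA-only threshold then misclassifies the ROCm build:
print(version_tuple < (11, 0))            # True: the warning fires
```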
Expected behavior
This might be a bug, or it might not; I've not been able to find specific documentation on this. It seems possible that 8-bit quantization could actually work but the code that detects whether the architecture is supported has issues. Or it may be that I can simply forget about 8-bit on ROCm. Either way, at least I would know for sure.
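For reference when sanity-checking a backend's output, the core operation behind 8-bit quantization routines is absmax scaling to int8. The sketch below is a deliberately simplified, pure-Python illustration of that idea (it is not the bitsandbytes implementation, which works blockwise on tensors):

```python
# Minimal absmax int8 quantize/dequantize round trip, for
# illustration only.

def quantize_absmax(values):
    """Map floats to int8 range [-127, 127] by scaling with the abs-max."""
    absmax = max(abs(v) for v in values) or 1.0
    scale = 127.0 / absmax
    return [round(v * scale) for v in values], absmax

def dequantize_absmax(qvalues, absmax):
    """Invert the scaling; small rounding error is expected."""
    return [q * absmax / 127.0 for q in qvalues]

x = [0.5, -1.0, 0.25, 2.0]
q, amax = quantize_absmax(x)
print(q)                           # [32, -64, 16, 127]
print(dequantize_absmax(q, amax))  # close to x, up to rounding error
```

If a backend's 8-bit path is wired up correctly, a round trip like this should reproduce the input to within one quantization step (absmax / 127).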