broncotc / bitsandbytes-rocm


Makefile:10: WARNING: CUDA_VERSION not set #5

Open Jaohni opened 1 year ago

Jaohni commented 1 year ago

Apologies for bugging you with what is likely a silly error on my part, but hopefully something here will be useful to people doing similar things going forward:

Issue: after using cd to get to the bitsandbytes-rocm main directory and running the command "make hip", I got the following error (my personal path information is redacted for brevity and readability):

Makefile:10: WARNING: CUDA_VERSION not set. Call make with CUDA string, for example: make cuda11x CUDA_VERSION=115 or make cpuonly CUDA_VERSION=CPU
/usr/bin/hipcc -std=c++14 -c -fPIC --amdgpu-target=gfx1030 -I bitsandbytes-rocm/csrc -I bitsandbytes-rocm/include -o bitsandbytes-rocm/build/ops.o -D NO_CUBLASLT bitsandbytes-rocm/csrc/ops.cu
make: /usr/bin/hipcc: No such file or directory
make: *** [Makefile:107: hip] Error 127

To clarify, that was a raw "make" command, with no arguments passed. I did not adjust the Makefile, as I wasn't sure if I should specify a CUDA version to emulate, although I did attempt to use GFX 1030 as the "CUDA version" in the Makefile, though after it was unsuccessful I reverted the change. I'm not terribly familiar with makefiles so I may have had a syntax error.
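For anyone hitting the same "No such file or directory" / Error 127 message: it means make could not find an executable at /usr/bin/hipcc. The sketch below is one way to check where hipcc actually lives. The function name find_hipcc is hypothetical, and /opt/rocm/bin/hipcc is only an assumed candidate location (a common ROCm install prefix), not something confirmed for this system or this Makefile.

```shell
#!/bin/sh
# Hypothetical helper: print the location of hipcc if it can be found.
find_hipcc() {
    # First, check whether hipcc is reachable through PATH.
    if command -v hipcc >/dev/null 2>&1; then
        command -v hipcc
        return 0
    fi
    # Otherwise try a few candidate locations.
    # These paths are assumptions, not an exhaustive list.
    for candidate in /usr/bin/hipcc /opt/rocm/bin/hipcc; do
        if [ -x "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

if hipcc_path=$(find_hipcc); then
    echo "hipcc found at: $hipcc_path"
else
    echo "hipcc not found; a Makefile call to /usr/bin/hipcc would fail with Error 127"
fi
```

If this prints a location other than /usr/bin/hipcc, the compiler is installed but not where the Makefile expects it.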

System Hardware:
RX 6700XT (GFX 1030)
Ryzen 9 5900X
32 GB RAM
Relevant files loaded onto a PCIe gen 3.0 SSD

System information:
Garuda Linux (Arch Linux derivative), kernel: 6.2.2-zen1-1-zen (64-bit)
Python: 3.10.9

The error occurred in a venv with ROCm dependencies manually installed via pip (notably the ROCm variants of torch and torchvision, version 5.2). I attempted to install bitsandbytes-rocm after using the accelerate command to run a machine-learning workload. In addition, to deal with an issue related to the GFX version, I ran the command below to put the ROCm software into a functional state in this venv:

export HSA_OVERRIDE_GFX_VERSION=10.3.0

Other notable information: After installing the AUR-provided ROCm packages outside of this venv, my GPU is listed as gfx1031 in a fresh terminal. I first attempted the build with only the venv, then installed the official AUR packages after that failed, and ran into the same issue. I wrote some simple PyTorch code and confirmed that ROCm is functioning as intended.
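A note on the override: a variable set with export is visible only to the shell where it was set and to processes started from it, so a freshly opened terminal does not see it. The sketch below illustrates that scoping. The helper show_override is hypothetical, and env -u is used only to imitate a fresh terminal; this is a demonstration of shell behaviour, not a statement about how ROCm reads the variable.

```shell
#!/bin/sh
# Start from a clean state so the demonstration is deterministic.
unset HSA_OVERRIDE_GFX_VERSION

# Hypothetical helper: print the override as a child process sees it.
show_override() {
    sh -c 'echo "${HSA_OVERRIDE_GFX_VERSION:-unset}"'
}

before=$(show_override)

# Same command as in the post: applies to this shell and its children.
export HSA_OVERRIDE_GFX_VERSION=10.3.0
after=$(show_override)

# Imitate a fresh terminal by removing the variable for one child process.
fresh=$(env -u HSA_OVERRIDE_GFX_VERSION sh -c 'echo "${HSA_OVERRIDE_GFX_VERSION:-unset}"')

echo "before export:  $before"
echo "after export:   $after"
echo "fresh terminal: $fresh"
```

To make the setting survive new terminals, the export line would have to be run again in each one or placed in a shell startup file.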

Any assistance in this matter would be much appreciated, and I hope to create an ample data trail to help future users avoid the same issues. Thanks for your time.