ROCm / bitsandbytes

8-bit CUDA functions for PyTorch
MIT License

PyTorch for ROCm is overwritten by PyTorch for CUDA #37

Open ghost opened 3 months ago

ghost commented 3 months ago

System Info

If I follow the installation guide in the README, lion-pytorch is installed (see requirements-dev.txt). However, installing lion-pytorch uninstalls PyTorch for ROCm (e.g., 2.4.0.dev20240520+rocm6.1) and installs PyTorch for CUDA (e.g., 2.3.1+cu121). Is there a way to avoid the overwrite?

A workaround is to install lion-pytorch first, then re-install PyTorch for ROCm manually. However, this may confuse users. If there is no way to avoid the overwrite, I will open a PR to add an explanation to the README.

Reproduction

Expected behavior

Keep the original PyTorch for ROCm if possible. If not, we should at least add a note telling users to re-install PyTorch for ROCm.

pnunna93 commented 3 months ago

Please use the same pip/python version for the pytorch and lion-pytorch installations.

ghost commented 3 months ago

After it was replaced by PyTorch for CUDA, I needed to re-install PyTorch for ROCm like this:

pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm6.1/

It would be better to give users a heads-up about this in the README. When installing bitsandbytes for ROCm, I didn't notice that my PyTorch for ROCm had been replaced by PyTorch for CUDA. My model training code then stopped working, and I wasted a lot of time debugging it.

pnunna93 commented 3 months ago

@taka-nscc, please check that your environment doesn't have multiple python/pip versions. You can create a container from one of our pytorch dockers to be sure and install inside it.

docker pull rocm/pytorch:latest
docker run -it --device=/dev/kfd --device=/dev/dri --group-add video rocm/pytorch:latest

ghost commented 3 months ago

This is the procedure I used to set up my environment:

# Pull and run the latest Docker image of PyTorch for ROCm
docker pull rocm/pytorch:latest
docker run -itd -v /home/work:/root --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --device=/dev/kfd --device=/dev/dri --group-add video --ipc=host rocm/pytorch:latest

# Check PyTorch version of the container
python -c 'import torch; print(torch.__version__)' # -> 2.4.0.dev20240520+rocm6.0

# Install bitsandbytes
git clone --recurse https://github.com/ROCm/bitsandbytes
cd bitsandbytes
git checkout rocm_enabled
pip install -r requirements-dev.txt
cmake -DCOMPUTE_BACKEND=hip -S .
make
pip install .

# PyTorch for CUDA is installed after installing bitsandbytes as follows
python -c 'import torch; print(torch.__version__)' # -> 2.3.1+cu121

It would be user-friendly to add an explanation of this to the README so that users know they need to re-install PyTorch manually.
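The before/after version check in the procedure above could also be scripted, so that a swapped build is caught right after installing the requirements rather than during training. The following is only a minimal sketch: the helper names `torch_backend` and `installed_torch_backend` are hypothetical, and it assumes the build can be told apart by the suffix after `+` in `torch.__version__`, as in the two version strings reported in this thread.

```python
def torch_backend(version: str) -> str:
    """Classify a PyTorch version string by the label after '+'.

    Assumption: ROCm wheels carry a '+rocm...' suffix and CUDA wheels a
    '+cu...' suffix, as in the versions quoted in this issue.
    """
    _, _, local = version.partition("+")
    if local.startswith("rocm"):
        return "rocm"
    if local.startswith("cu"):
        return "cuda"
    if local.startswith("cpu"):
        return "cpu"
    return "unknown"


def installed_torch_backend() -> str:
    """Report the backend of the PyTorch build installed in this environment."""
    try:
        import torch  # imported lazily so the helper above works without it
    except ImportError:
        return "not installed"
    return torch_backend(torch.__version__)


# The two versions reported in this thread:
print(torch_backend("2.4.0.dev20240520+rocm6.0"))  # rocm
print(torch_backend("2.3.1+cu121"))                # cuda
```

Running `installed_torch_backend()` once before and once after `pip install -r requirements-dev.txt` would show whether the ROCm build is still in place.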