PyTorch for ROCm is overwritten by PyTorch for CUDA

ghost commented 3 months ago

System Info

If I follow the installation guide in the README, lion-pytorch is installed (see requirements-dev.txt). However, installing lion-pytorch causes PyTorch for ROCm (e.g., 2.4.0.dev20240520+rocm6.1) to be uninstalled and PyTorch for CUDA (e.g., 2.3.1+cu121) to be installed in its place. Is there a way to avoid this overwrite?

A workaround is to install lion-pytorch first and then reinstall PyTorch for ROCm manually. However, this may confuse users. If there is no way to avoid the overwrite, I will open a PR to add an explanation to the README.

Reproduction

Original PyTorch for ROCm version (e.g., 2.4.0.dev20240520+rocm6.1)
```
python -c 'import torch; print(torch.__version__)'
```

Install lion-pytorch

```
git clone --recurse https://github.com/ROCm/bitsandbytes
cd bitsandbytes
git checkout rocm_enabled
pip install -r requirements-dev.txt
```

PyTorch is replaced by the CUDA version (e.g., 2.3.1+cu121)
```
python -c 'import torch; print(torch.__version__)'
```
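The two version checks above differ only in the local version label after `+` (`rocm6.1` vs `cu121`). A minimal sketch of a POSIX shell helper that classifies a version string by that label; the function name `torch_build_kind` is hypothetical, and it assumes PyTorch wheels keep tagging their builds as `+rocm…` or `+cu…`. For a more direct check from Python, `torch.version.hip` is a string on ROCm builds and `None` otherwise.

```shell
# Hypothetical helper: classify a PyTorch version string by its local label.
# Assumes ROCm wheels are tagged "+rocmX.Y" and CUDA wheels "+cuXYZ".
torch_build_kind() {
    case "$1" in
        *+rocm*) echo rocm ;;
        *+cu*)   echo cuda ;;
        *+cpu*)  echo cpu ;;
        *)       echo unknown ;;  # no recognizable local label
    esac
}
```

Usage: `torch_build_kind "$(python -c 'import torch; print(torch.__version__)')"` should print `rocm` before and after `pip install -r requirements-dev.txt`; a `cuda` result after the install indicates the overwrite described in this issue.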

Expected behavior

Keep the original PyTorch for ROCm if possible. If not, we should at least add a note alerting users that they need to reinstall PyTorch for ROCm.

pnunna93 commented 3 months ago

Please use the same pip/Python version for both the PyTorch and lion-pytorch installations.

ghost commented 3 months ago

After PyTorch has been replaced by the CUDA build, I need to reinstall PyTorch for ROCm like this:

```
pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm6.1/
```

It would be better to give users a heads-up about this in the README. When installing this bitsandbytes for ROCm, I didn't notice that my PyTorch for ROCm had been replaced by PyTorch for CUDA, so my model training code stopped working, and I wasted a lot of time debugging it.

pnunna93 commented 3 months ago

@taka-nscc, please check that your environment doesn't have multiple Python/pip versions. To be sure, you can create a container from one of our PyTorch Docker images and install inside it:

```
docker pull rocm/pytorch:latest
docker run -it --device=/dev/kfd --device=/dev/dri --group-add video rocm/pytorch:latest
```
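The advice above amounts to making sure `pip` installs into the same interpreter that later imports `torch`. A minimal sketch, assuming a POSIX shell: the hypothetical `list_python_tools` function walks `$PATH` in search order and prints every `python`, `python3`, `pip` or `pip3` executable it finds, so a second installation stands out.

```shell
# Hypothetical helper: print each python/pip executable on $PATH, in the
# order the shell would search them.
list_python_tools() (
    # Subshell body: the IFS and globbing changes stay local to this call.
    IFS=:
    set -f
    for dir in $PATH; do
        for name in python python3 pip pip3; do
            # Skip directories, which also pass the -x test.
            if [ -x "$dir/$name" ] && [ ! -d "$dir/$name" ]; then
                printf '%s\n' "$dir/$name"
            fi
        done
    done
)
```

If more than one `python` or `pip` shows up, `python -m pip --version` reports which Python installation that pip belongs to, and running installs as `python -m pip install ...` ties them to that interpreter.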

ghost commented 3 months ago

This is the procedure to set up my environment.

```
# Pull and run the latest Docker image of PyTorch for ROCm
docker pull rocm/pytorch:latest
docker run -itd -v /home/work:/root --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --device=/dev/kfd --device=/dev/dri --group-add video --ipc=host rocm/pytorch:latest

# Check PyTorch version of the container
python -c 'import torch; print(torch.__version__)' # -> 2.4.0.dev20240520+rocm6.0

# Install bitsandbytes
git clone --recurse https://github.com/ROCm/bitsandbytes
cd bitsandbytes
git checkout rocm_enabled
pip install -r requirements-dev.txt
cmake -DCOMPUTE_BACKEND=hip -S .
make
pip install .

# PyTorch for CUDA is installed after installing bitsandbytes as follows
python -c 'import torch; print(torch.__version__)' # -> 2.3.1+cu121
```

It would be more user-friendly to add an explanation about this to the README so that users know they need to reinstall PyTorch manually.
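One possible way to address the original question of avoiding the overwrite, offered as an untested sketch rather than a confirmed fix: pin the currently installed torch packages in a pip constraints file before installing the development requirements. The helper name `torch_pin_from_freeze` and the file name `torch-constraints.txt` are hypothetical. The assumption is that, given an exact pin such as `torch==2.4.0.dev20240520+rocm6.1`, pip either keeps the ROCm build or stops with a resolution error instead of silently installing a CUDA wheel; this has not been checked against this repository's requirements-dev.txt.

```shell
# Hypothetical helper: read `pip list --format=freeze` output on stdin and
# keep only the exact pins for torch, torchvision and torchaudio.
torch_pin_from_freeze() {
    grep -iE '^(torch|torchvision|torchaudio)=='
}
```

Usage, from inside the bitsandbytes checkout: `python -m pip list --format=freeze | torch_pin_from_freeze > torch-constraints.txt`, then `python -m pip install -r requirements-dev.txt -c torch-constraints.txt`. `pip list --format=freeze` is used instead of `pip freeze` because the latter may print `torch @ file://...` for a torch installed from a local wheel, which is not a version pin.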
