Open ghost opened 3 months ago
Please use the same pip/python version for both the pytorch and lion-pytorch installations.
After it is replaced by the CUDA build of PyTorch, I need to re-install PyTorch for ROCm like this:
pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm6.1/
It would be better to give users a heads-up about this in the README. When installing bitsandbytes for ROCm, I didn't notice that my PyTorch for ROCm had been replaced by PyTorch for CUDA, so my model training code stopped working, and I wasted a lot of time debugging it.
@taka-nscc, please check that your environment doesn't have multiple python/pip versions. To be sure, you can create a container from one of our PyTorch Docker images and install inside it:
docker pull rocm/pytorch:latest
docker run -it --device=/dev/kfd --device=/dev/dri --group-add video rocm/pytorch:latest
This is the procedure I use to set up my environment:
# Pull and run the latest Docker image of PyTorch for ROCm
docker pull rocm/pytorch:latest
docker run -itd -v /home/work:/root --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --device=/dev/kfd --device=/dev/dri --group-add video --ipc=host rocm/pytorch:latest
# Check PyTorch version of the container
python -c 'import torch; print(torch.__version__)' # -> 2.4.0.dev20240520+rocm6.0
# Install bitsandbytes
git clone --recurse https://github.com/ROCm/bitsandbytes
cd bitsandbytes
git checkout rocm_enabled
pip install -r requirements-dev.txt
cmake -DCOMPUTE_BACKEND=hip -S .
make
pip install .
# After installing bitsandbytes, PyTorch for CUDA is installed instead, as shown here
python -c 'import torch; print(torch.__version__)' # -> 2.3.1+cu121
It would be user-friendly to add an explanation of this to the README so that users know they need to re-install PyTorch manually.
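The before/after version check in the procedure above can be turned into a small helper. This is only a sketch: `classify_torch_build` is a hypothetical name, and it assumes the build type can be read from the local tag after the `+` in `torch.__version__`, as in the two version strings quoted in this thread. Builds without such a tag are reported as unknown.

```python
import re


def classify_torch_build(version: str) -> str:
    """Classify a torch version string by its local build tag (the part after '+').

    Hypothetical helper; relies on tags such as '+rocm6.0' or '+cu121'.
    """
    # partition() returns an empty local part when there is no '+' in the string
    _, _, local = version.partition("+")
    if local.startswith("rocm"):
        return "rocm"
    if re.fullmatch(r"cu\d+", local):
        return "cuda"
    if local == "cpu":
        return "cpu"
    return "unknown"
```

Calling `classify_torch_build(torch.__version__)` before and after `pip install -r requirements-dev.txt` would make the switch from a ROCm build to a CUDA build visible right away.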
System Info
If I follow the installation guide in the README, lion-pytorch is installed (see requirements-dev.txt). However, installing lion-pytorch uninstalls PyTorch for ROCm (e.g., 2.4.0.dev20240520+rocm6.1) and installs PyTorch for CUDA (e.g., 2.3.1+cu121). Is there a way to avoid the overwrite?
A workaround is to install lion-pytorch first, then re-install PyTorch for ROCm manually. However, this may confuse users. If there is no way to avoid the overwrite, I will open a PR to add an explanation to the README.
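One possible way to avoid the overwrite, not confirmed anywhere in this thread, is to install lion-pytorch without letting pip resolve its dependencies, using pip's `--no-deps` flag. The sketch below only builds the command. `pip_install_no_deps_cmd` is a hypothetical helper name, and it is an assumption that skipping dependency resolution is acceptable here, since any other dependencies of lion-pytorch would then have to be installed separately.

```python
import sys
from typing import List


def pip_install_no_deps_cmd(package: str) -> List[str]:
    """Build a pip command that installs `package` without its dependencies.

    Hypothetical helper; `--no-deps` stops pip from installing or replacing
    dependencies such as torch.
    """
    # `sys.executable -m pip` ties pip to the interpreter running this script,
    # which avoids mixing different python/pip installations.
    return [sys.executable, "-m", "pip", "install", "--no-deps", package]
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. Checking `torch.__version__` afterwards would show whether the ROCm build was left in place.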
Reproduction
1. Start with the original PyTorch for ROCm (e.g., 2.4.0.dev20240520+rocm6.1).
2. Install lion-pytorch.
3. PyTorch is replaced by the CUDA version (e.g., 2.3.1+cu121).
Expected behavior
Keep the original PyTorch for ROCm if possible. If not, we should at least add a note to alert users that they need to reinstall PyTorch for ROCm.