bennmann opened 1 year ago
bitsandbytes-rocm is also very challenging to get up and running for 8-bit on regular transformers (in steps following the final steps of this guide).
It may be hardcoded for ROCm 5.3 at the time of this writing, which means this guide may be incompatible with bitsandbytes-rocm (that project's GitHub is not an official AMD one, so I won't link it here; it's easy to find, though).
Also, a lot of these issues may be resolved by purchasing AMD machine learning silicon (such as the MI210) instead of consumer cards, but where's the fun in that (also, ain't nobody got that kind of money).
How about wsl2?
WSL2 does not support AMD ROCm yet at the time of this guide. Please use dual-boot methods, or consider switching entirely to Linux (if you can get Proton working for gaming, etc.).
I would be happy to learn otherwise whenever this changes.
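If you're not sure whether you're accidentally inside WSL (where the ROCm steps below will not work), the kernel version string gives it away. A tiny sketch, with the string test factored out so it's checkable anywhere (the function names are mine, not from any library):

```python
from pathlib import Path

def looks_like_wsl(version_text):
    """WSL kernels embed 'microsoft' in their version string."""
    return "microsoft" in version_text.lower()

def running_in_wsl():
    # /proc/version only exists on Linux-like systems
    proc = Path("/proc/version")
    return proc.exists() and looks_like_wsl(proc.read_text())
```

Run this before spending time on the driver install; if it returns True, dual-boot instead.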
How is the performance?
People have reported that it's around RTX 3070 level (https://www.youtube.com/watch?v=HwGgzaz7ipQ). I am still interested in the performance, especially when using RWKV.
Performance is OK: 5 tokens per second output on a 6900 XT with the largest RWKV 14B model. Nvidia can do better with their tensor cores, but the 30x0 series has VRAM limitations (except the 3090).
There are reports of a 3090 getting more than 20 tokens per second with RWKV version 0.6.0 using specific settings.
However, the AMD 7x00 series should be better with its AI GEMM functionality; I expect 20 tokens per second on the 14B model with RWKV 0.6.0 whenever ROCm supports the 7900 series (best guess).
Untested on the AMD 7x00 gfx11 series.
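If you want to compare your card against the numbers above, raw tokens-per-second is easy to measure yourself. A minimal sketch; the `generate` callable is a stand-in for whatever model call you actually use (e.g. a ChatRWKV generation loop), not a real API:

```python
import time

def tokens_per_second(generate, n_tokens=100):
    """Time one generation call and return throughput in tokens/sec.

    `generate` is any callable that produces `n_tokens` tokens;
    it is a placeholder for your actual model invocation.
    """
    start = time.perf_counter()
    generate(n_tokens)
    elapsed = time.perf_counter() - start
    return n_tokens / elapsed
```

On the 6900 XT setup described above, with the 14B model, this would report roughly 5.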
Thanks
Update August 2023:
I kind of dislike containers and usually prefer bare metal, but the method below should work (untested):
Install AMD drivers (you may need to `chown _apt amd_driver.deb` to install), install vim or a text editor of choice, and enable Universe repositories (basic OS setup):

```shell
sudo apt-get update
sudo apt-get upgrade
```
Go through the Docker website installation steps (there are more than a few, and they must be followed perfectly).
Once you have Docker:

```shell
docker pull rocm/pytorch-nightly
sudo docker run -it --network=host --device=/dev/kfd --device=/dev/dri \
  --group-add=video --ipc=host --cap-add=SYS_PTRACE \
  --security-opt seccomp=unconfined rocm/pytorch-nightly
```
In the running image:

```shell
cd /home
export HSA_OVERRIDE_GFX_VERSION=10.3.0
```
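One gotcha worth making explicit: `HSA_OVERRIDE_GFX_VERSION` has to be in the environment before torch is first imported, since ROCm reads it when its libraries load. If you script the setup in Python rather than the shell, a sketch (the torch lines are commented out because they only work inside the ROCm container):

```python
import os

# Must be set before `import torch` - ROCm's runtime reads it at load time.
# 10.3.0 matches gfx1030 (RX 6800/6900 XT); adjust for your card.
os.environ.setdefault("HSA_OVERRIDE_GFX_VERSION", "10.3.0")

# import torch                      # only inside the ROCm container
# print(torch.cuda.is_available())  # True when the override matches your GPU
```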
```shell
git clone https://github.com/arlo-phoenix/bitsandbytes-rocm-5.6.git bitsandbytes
cd bitsandbytes
make hip ROCM_TARGET=gfx1030
pip install pip --upgrade
pip install .
```
```shell
cd ..
pip install --upgrade git+https://github.com/BlinkDL/ChatRWKV
```
```shell
echo "alias python3='rocm-smi --setfan 99%;python3' # AMD fan curve was not aggressive enough for my cooling" >> ~/.bashrc
cd ChatRWKV/v2
vim chat.py  # edit in your model location and other parameters you care about
python3 chat.py
```
EDIT 2023-03-22: I wiped and started over. This guide is no longer up to date for Ubuntu 22.04.1; working through a workaround now. Consider using a Colab, which can support the 14B model in the meantime using the rwkvstic package (check Discord).
Your mileage may vary:

```shell
/bin/software-properties-gtk
echo 'turn on via checkmark all repos in the first tab of the GTK GUI for software properties'
sudo apt-get update
echo 'Download AMD linux drivers for the 6900XT from their support website'
ll ~/Downloads/amdgpu-install_5.4.50401-1_all.deb
sudo chown _apt ~/Downloads/amdgpu-install_5.4.50401-1_all.deb
sudo apt-get install ~/Downloads/amdgpu-install_5.4.50401-1_all.deb
sudo chown ubuntu ~/Downloads/amdgpu-install_5.4.50401-1_all.deb
sudo apt-get install ~/Downloads/amdgpu-install_5.4.50401-1_all.deb
sudo apt-cache showpkg amdgpu-install
which -a amdgpu-install
sudo amdgpu-install --usecase=hiplibsdk,rocm,hip,dkms,hip-dev
sudo apt-get install perl liburi-encode-perl libfile-copy-recursive-perl libtinfo5 libncurses5
sudo apt-get install python3-pip
/bin/update-manager
echo 'update software in the ubuntu GUI as well'
rocm-smi
echo 'the above should display GPU information'
export HSA_OVERRIDE_GFX_VERSION=10.3.0 ; echo 'this is important later for pytorch'
sudo snap refresh firefox --stable ; echo 'only run if your firefox somehow breaks from the above process'
sudo shutdown -r now
echo 'restart often during this process'
echo 'the below 3 commands may be skipped (untested without skipping) - the linux username here is ubuntu, substitute your own'
sudo usermod -a -G render ubuntu
sudo usermod -a -G video ubuntu
sudo shutdown -r now ; echo 'restart often during this process'
mkdir /media/ubuntu/2TB_fast_nvme_Drive1/pip_cache
mkdir /media/ubuntu/2TB_fast_nvme_Drive1/pip_local_site-packages
echo 'the -t and --cache-dir flags with pip3 in the next command are only needed if your boot drive is not your Machine Learning drive'
pip3 install --user -t /media/ubuntu/2TB_fast_nvme_Drive1/pip_local_site-packages --cache-dir=/media/ubuntu/2TB_fast_nvme_Drive1/pip_cache torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/rocm5.2
echo 'the only one that matters for Natural Language Processing here is torch; the others may error and that is ok for this terminal history'
echo 'the next three exports are only needed if your boot drive is not your Machine Learning drive'
export PYTHONUSERBASE=/media/ubuntu/2TB_fast_nvme_Drive1/pip_local_site-packages
export TMPDIR=/media/ubuntu/2TB_fast_nvme_Drive1/pip_cache
export PYTHONPATH=/media/ubuntu/2TB_fast_nvme_Drive1/pip_local_site-packages
echo 'see you on the flip side, restart'
sudo shutdown -r now
echo 'make sure non-boot drives are mounted if /etc/fstab is not taking hold - open a file explorer and navigate to your NVMe manually after every reboot if fstab does not gracefully automount it at startup'
echo 'clean up a little, just in case'
sudo apt-get install --fix-broken
sudo apt-get upgrade
lspci | grep AMD
echo 'mine shows an entry like "03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 21 [Radeon RX 6800/6800 XT / 6900 XT] (rev c0)" - use the beginning of the lspci output to find the device folder'
sudo ls /sys/bus/pci/devices/0000:03:00.0
echo 'the output of the below should be 0, not -1'
sudo cat /sys/bus/pci/devices/0000:03:00.0/numa_node
echo 0 | sudo tee "/sys/bus/pci/devices/0000:03:00.0/numa_node"
pip3 install -t /media/ubuntu/2TB_fast_nvme_Drive1/pip_local_site-packages --cache-dir=/media/ubuntu/2TB_fast_nvme_Drive1/pip_cache --upgrade transformers accelerate bitsandbytes-rocm --extra-index-url https://download.pytorch.org/whl/rocm5.2
echo 'see you on the flip side, restart'
sudo shutdown -r now
# /opt/rocm is a symlink to the installed 5.4.x version
export LD_LIBRARY_PATH=/opt/rocm/lib:/opt/rocm/lib64
export PATH=$PATH:/opt/rocm/bin:/opt/rocm/opencl/bin
echo 'begin python3 torch and inference tests'
echo "alias python3='rocm-smi --setfan 99%;python3' # AMD fan curve was not aggressive enough for my cooling" >> ~/.bashrc
```