IAHispano / Applio

A simple, high-quality voice conversion tool focused on ease of use and performance.
https://applio.org
MIT License

AMD GPU on Applio powered by Zluda #503

Closed AznamirWoW closed 2 weeks ago

AznamirWoW commented 1 month ago

1) Install the prerequisites:
a) HIP SDK 6.1.2: https://www.amd.com/en/developer/resources/rocm-hub/hip-sdk.html
b) ZLUDA 3.8: https://github.com/lshqqytiger/ZLUDA/releases/tag/rel.86cdab3b14b556e95eafe370b8e8a1a80e8d093b
c) if you are using a GPU unsupported by the HIP SDK, such as the 6600/6700(XT), you need to find or build a custom set of libraries

2) Modify run-install.bat to pull the cu118 torch libraries instead of cu121:
pip install torch==2.1.1 torchvision==0.16.1 torchaudio==2.1.1 --index-url https://download.pytorch.org/whl/cu118

3) Execute run-install.bat to build the local environment

4) Unzip ZLUDA into the zluda folder, then patch the torch libraries using:
copy zluda\cublas.dll env\Lib\site-packages\torch\lib\cublas64_11.dll /y
copy zluda\cusparse.dll env\Lib\site-packages\torch\lib\cusparse64_11.dll /y
copy zluda\nvrtc.dll env\Lib\site-packages\torch\lib\nvrtc64_112_0.dll /y

5) Apply the patch

6) Start Applio using: zluda\zluda.exe -- env\python.exe app.py (the full command sequence is collected in the sketch below)
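For reference, here are steps 2-6 collected into a single sketch, assuming a cmd prompt inside the Applio folder and ZLUDA extracted into a zluda subfolder (step 5, applying the patch, is a manual code change and is not shown):

```
rem Step 2: run-install.bat should install the cu118 builds instead of cu121, i.e. it runs:
rem   pip install torch==2.1.1 torchvision==0.16.1 torchaudio==2.1.1 --index-url https://download.pytorch.org/whl/cu118

rem Step 3: build the local environment
run-install.bat

rem Step 4: overwrite the CUDA libraries bundled with torch with the ZLUDA versions
copy zluda\cublas.dll env\Lib\site-packages\torch\lib\cublas64_11.dll /y
copy zluda\cusparse.dll env\Lib\site-packages\torch\lib\cusparse64_11.dll /y
copy zluda\nvrtc.dll env\Lib\site-packages\torch\lib\nvrtc64_112_0.dll /y

rem Step 6: launch Applio through ZLUDA
zluda\zluda.exe -- env\python.exe app.py
```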

27kaive commented 1 month ago

how do u apply the patch? i get a lot of errors which say "patch does not apply"

AznamirWoW commented 1 month ago

how do u apply the patch? i get a lot of errors which say "patch does not apply"

I've created a proper pull request with the required changes. You can apply them manually to 3.2.1 version if you want.

tej1940 commented 1 month ago

I am not really good at coding, so where do I paste this, please?
copy zluda\cublas.dll env\Lib\site-packages\torch\lib\cublas64_11.dll /y
copy zluda\cusparse.dll env\Lib\site-packages\torch\lib\cusparse64_11.dll /y
copy zluda\nvrtc.dll env\Lib\site-packages\torch\lib\nvrtc64_112_0.dll /y

AznamirWoW commented 1 month ago

Note that the first time ZLUDA gets a task to process, it may take 10-20 minutes for it to compile the kernel code. During this time there is no visible output. Sit tight and wait until it is done.

fcastro97 commented 1 month ago

how do u apply the patch? i get a lot of errors which say "patch does not apply"

I've created a proper pull request with the required changes. You can apply them manually to 3.2.1 version if you want.

Is the version you made working?

AznamirWoW commented 1 month ago

how do u apply the patch? i get a lot of errors which say "patch does not apply"

I've created a proper pull request with the required changes. You can apply them manually to 3.2.1 version if you want.

Is the version you made working?

works fine with 3.2.1 and 3.2.2

blaisewf commented 1 month ago

I will leave this issue pinned so that users can share their feedback, and we will consider an implementation in the near future.

Minksh commented 1 month ago

I am not really good at coding, so where do I paste this, please?
copy zluda\cublas.dll env\Lib\site-packages\torch\lib\cublas64_11.dll /y
copy zluda\cusparse.dll env\Lib\site-packages\torch\lib\cusparse64_11.dll /y
copy zluda\nvrtc.dll env\Lib\site-packages\torch\lib\nvrtc64_112_0.dll /y

You use these commands in cmd ;)

But first you want to cd into your Applio directory.

If your Applio is on another drive, use cd /d, for example cd /d D:\Applio.
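Putting those two steps together, a minimal cmd sketch (the D:\Applio path is just an example; substitute your own install location):

```
rem Change to the Applio folder; /d also switches the drive if needed
cd /d D:\Applio

rem Copy the ZLUDA libraries over the CUDA DLLs bundled with torch
copy zluda\cublas.dll env\Lib\site-packages\torch\lib\cublas64_11.dll /y
copy zluda\cusparse.dll env\Lib\site-packages\torch\lib\cusparse64_11.dll /y
copy zluda\nvrtc.dll env\Lib\site-packages\torch\lib\nvrtc64_112_0.dll /y
```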

Osony commented 2 weeks ago

All of this is written for Windows; for Linux it is different, but I have not tested the lib copying and launching. The HIP SDK install is replaced by a ROCm installation (you need to find instructions for your Linux distro), then you modify run-install.sh and execute run-install.sh. Starting Applio (assuming you are inside the Applio folder and the zluda folder is there) would be: LD_LIBRARY_PATH="zluda:$LD_LIBRARY_PATH" env/python app.py. But which libraries do I need to copy? The folder env/lib/python3.10/site-packages/torch/lib/ exists. For the first (cublas) I have libcublas.so.11 and libcublasLt.so.11; for the third (nvrtc) I have libnvrtc-builtins.so.11.8, libnvrtc-672ee683.so.11.2 and libcaffe2_nvrtc.so; I don't have anything related to cusparse. What do I need to copy and/or replace? (Fedora Linux 40, Python 3.10, completed the first 3 steps from the original post)

AznamirWoW commented 2 weeks ago

All of this is written for Windows; for Linux it is different, but I have not tested the lib copying and launching. The HIP SDK install is replaced by a ROCm installation (you need to find instructions for your Linux distro), then you modify run-install.sh and execute run-install.sh. Starting Applio (assuming you are inside the Applio folder and the zluda folder is there) would be: LD_LIBRARY_PATH="zluda:$LD_LIBRARY_PATH" env/python app.py. But which libraries do I need to copy? The folder env/lib/python3.10/site-packages/torch/lib/ exists. For the first (cublas) I have libcublas.so.11 and libcublasLt.so.11; for the third (nvrtc) I have libnvrtc-builtins.so.11.8, libnvrtc-672ee683.so.11.2 and libcaffe2_nvrtc.so; I don't have anything related to cusparse. What do I need to copy and/or replace? (Fedora Linux 40, Python 3.10, completed the first 3 steps from the original post)

There's no easy way to do that on Linux. Most likely you need to build Zluda, then you need to build PyTorch with Zluda. https://github.com/pytorch/pytorch?tab=readme-ov-file#from-source and https://github.com/lshqqytiger/ZLUDA?tab=readme-ov-file#pytorch

Osony commented 2 weeks ago

All of this is written for Windows; for Linux it is different, but I have not tested the lib copying and launching. The HIP SDK install is replaced by a ROCm installation (you need to find instructions for your Linux distro), then you modify run-install.sh and execute run-install.sh. Starting Applio (assuming you are inside the Applio folder and the zluda folder is there) would be: LD_LIBRARY_PATH="zluda:$LD_LIBRARY_PATH" env/python app.py. But which libraries do I need to copy? The folder env/lib/python3.10/site-packages/torch/lib/ exists. For the first (cublas) I have libcublas.so.11 and libcublasLt.so.11; for the third (nvrtc) I have libnvrtc-builtins.so.11.8, libnvrtc-672ee683.so.11.2 and libcaffe2_nvrtc.so; I don't have anything related to cusparse. What do I need to copy and/or replace? (Fedora Linux 40, Python 3.10, completed the first 3 steps from the original post)

There's no easy way to do that on Linux. Most likely you need to build Zluda, then you need to build PyTorch with Zluda. https://github.com/pytorch/pytorch?tab=readme-ov-file#from-source and https://github.com/lshqqytiger/ZLUDA?tab=readme-ov-file#pytorch

What about using pip install torch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1 --index-url https://download.pytorch.org/whl/rocm6.0 instead of pip install torch==2.1.1 torchvision==0.16.1 torchaudio==2.1.1 --index-url https://download.pytorch.org/whl/cu118? Is there any other Nvidia/CUDA-related code aside from PyTorch? There is some code that looks for the right GPU in config.py, line 28; according to the PyTorch docs I need to change "cuda:0" to "cuda", but it's not helping: I get "RuntimeError: HIP error: invalid device function" and I don't know how to fix that.

AznamirWoW commented 2 weeks ago

All of this is written for Windows; for Linux it is different, but I have not tested the lib copying and launching. The HIP SDK install is replaced by a ROCm installation (you need to find instructions for your Linux distro), then you modify run-install.sh and execute run-install.sh. Starting Applio (assuming you are inside the Applio folder and the zluda folder is there) would be: LD_LIBRARY_PATH="zluda:$LD_LIBRARY_PATH" env/python app.py. But which libraries do I need to copy? The folder env/lib/python3.10/site-packages/torch/lib/ exists. For the first (cublas) I have libcublas.so.11 and libcublasLt.so.11; for the third (nvrtc) I have libnvrtc-builtins.so.11.8, libnvrtc-672ee683.so.11.2 and libcaffe2_nvrtc.so; I don't have anything related to cusparse. What do I need to copy and/or replace? (Fedora Linux 40, Python 3.10, completed the first 3 steps from the original post)

There's no easy way to do that on Linux. Most likely you need to build Zluda, then you need to build PyTorch with Zluda. https://github.com/pytorch/pytorch?tab=readme-ov-file#from-source and https://github.com/lshqqytiger/ZLUDA?tab=readme-ov-file#pytorch

What about using pip install torch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1 --index-url https://download.pytorch.org/whl/rocm6.0 instead of pip install torch==2.1.1 torchvision==0.16.1 torchaudio==2.1.1 --index-url https://download.pytorch.org/whl/cu118? Is there any other Nvidia/CUDA-related code aside from PyTorch? There is some code that looks for the right GPU in config.py, line 28; according to the PyTorch docs I need to change "cuda:0" to "cuda", but it's not helping: I get "RuntimeError: HIP error: invalid device function" and I don't know how to fix that.

If you're using an RX 7900 variant, you don't need ZLUDA; you can just use ROCm PyTorch.
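For cards with native ROCm support, a minimal sketch of that route on Linux (untested here; the package versions are the ones suggested in the previous comment, and the last line is just a quick sanity check):

```
# Install the ROCm builds of PyTorch instead of the cu118/cu121 ones
pip install torch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1 --index-url https://download.pytorch.org/whl/rocm6.0

# ROCm builds expose the GPU through the regular torch.cuda API
python -c "import torch; print(torch.cuda.is_available(), torch.cuda.get_device_name(0))"
```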

BSIdro commented 2 weeks ago

Does this work with Windows? or do you absolutely need Linux?

AznamirWoW commented 2 weeks ago

Does this work with Windows? or do you absolutely need Linux?

It is quite the opposite: making it work on Windows is pretty easy, while making it work on Linux is a huge P.I.T.A. Hopefully my PR will be included in the build soon :)

BSIdro commented 2 weeks ago

As for me, I'm a novice. I'm trying to get it to work with my RX 6800, but I don't understand how I should proceed.

AznamirWoW commented 2 weeks ago

As for me, I'm a novice. I'm trying to get it to work with my RX 6800, but I don't understand how I should proceed.

Right now you can make the changes manually, using the PR as an example:

https://github.com/IAHispano/Applio/pull/593/commits/2c18fc86ea4cb9aba1e2f4224dca1c88fb6ca25e

there's a readme file to follow as well

BSIdro commented 2 weeks ago

I don't understand; I tried to install all of this, but I got this error in my cmd when I wanted to train the .pth:

An error occurred processing 0_99.wav: CUDA error: invalid argument
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

Disabling CUDNN for traning with Zluda
Process Process-1:
Traceback (most recent call last):
  File "C:\ApplioV3.2.1\env\lib\multiprocessing\process.py", line 315, in _bootstrap
    self.run()
  File "C:\ApplioV3.2.1\env\lib\multiprocessing\process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "C:\ApplioV3.2.1\rvc\train\train.py", line 396, in run
    net_g = DDP(net_g, device_ids=[rank])
  File "C:\ApplioV3.2.1\env\lib\site-packages\torch\nn\parallel\distributed.py", line 797, in __init__
    _sync_module_states(
  File "C:\ApplioV3.2.1\env\lib\site-packages\torch\distributed\utils.py", line 292, in _sync_module_states
    _sync_params_and_buffers(process_group, module_states, broadcast_bucket_size, src)
  File "C:\ApplioV3.2.1\env\lib\site-packages\torch\distributed\utils.py", line 306, in _sync_params_and_buffers
    dist._broadcast_coalesced(
RuntimeError: CUDA error: operation not supported
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
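(As a side note, the log above suggests CUDA_LAUNCH_BLOCKING=1 for debugging. A minimal sketch of setting it for one cmd session before launching, using the launch command from the original post; it only makes error reporting synchronous so the stack trace points at the failing call, it does not fix anything by itself:)

```
set CUDA_LAUNCH_BLOCKING=1
zluda\zluda.exe -- env\python.exe app.py
```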

EDIT: it's good, I think it works. I had installed all of that incorrectly; I didn't take the time to run run-install.bat before applying everything else. However, sometimes not much happens: I stay at 0% for several minutes and then things happen. I guess it's because this is the first use, as mentioned above.

EDIT 2: It works, but it seems rather slow to me, and my GPU doesn't seem to be used much when I look in Task Manager. I would say between 3 and 4 minutes to train each epoch, with the GPU at 2% and the CPU at 35%.

AznamirWoW commented 2 weeks ago

EDIT 2: It works, but it seems rather slow to me, and my GPU doesn't seem to be used much when I look in Task Manager. I would say between 3 and 4 minutes to train each epoch, with the GPU at 2% and the CPU at 35%.

Task Manager does not always report the correct load; the Adrenalin control panel may show correct values. Depending on the size of the training data set, training is 4-5x faster than on the CPU.

As for inference, it is ~10x faster than the CPU.

BSIdro commented 2 weeks ago

Good! Indeed, my 6800 is fully functional. It seems to take quite a long time to train a model, but I think that's due to my 30-minute dataset; or, since it's the first time, do I just have to let it finish, and then the next training run will be faster?

4 minutes per epoch currently. Is this normal?

blaisewf commented 2 weeks ago

not the best speed but considering your dataset it’s okay

AznamirWoW commented 2 weeks ago

Good! Indeed, my 6800 is fully functional. It seems to take quite a long time to train a model, but I think that's due to my 30-minute dataset; or, since it's the first time, do I just have to let it finish, and then the next training run will be faster?

4 minutes per epoch currently. Is this normal?

With 23 min in sliced_audios, 10 epochs take ~30 minutes on my 6700XT; with 42 min in sliced_audios, 10 epochs take just under an hour.

Depending on the quality of the data / using a custom pretrained, you don't need that many epochs anyway. In some cases I had good results with just 20.

BSIdro commented 2 weeks ago

Indeed, I did a test with 30 epochs and the model was good enough! Thanks a lot for the help! I can now use my AMD GPU to train models :)

blaisewf commented 2 weeks ago

merged