Closed AznamirWoW closed 2 weeks ago
How do you apply the patch? I get a lot of errors that say "patch does not apply".
I've created a proper pull request with the required changes. You can apply them manually to 3.2.1 version if you want.
I am not really good at coding, so where do I paste this, please?
copy zluda\cublas.dll env\Lib\site-packages\torch\lib\cublas64_11.dll /y
copy zluda\cusparse.dll env\Lib\site-packages\torch\lib\cusparse64_11.dll /y
copy zluda\nvrtc.dll env\Lib\site-packages\torch\lib\nvrtc64_112_0.dll /y
Note that the first time Zluda gets a task to process, it may take 10-20 minutes to compile the kernel code. During this time there is no visible output. Sit tight and wait until it is done.
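Since the first run gives no visible output, a small wrapper that prints the elapsed time can make the wait easier to sit through. This is only a sketch: `run_with_heartbeat` and its parameters are hypothetical names, not part of Applio or Zluda, and it just wraps whatever blocking call you pass in.

```python
import threading
import time


def run_with_heartbeat(task, interval_s=30.0, report=print):
    """Run task() and call report() every interval_s seconds until it returns."""
    done = threading.Event()
    start = time.monotonic()

    def beat():
        # Event.wait returns False on timeout and True once the task has finished.
        while not done.wait(interval_s):
            report(f"still working... {time.monotonic() - start:.0f}s elapsed")

    ticker = threading.Thread(target=beat, daemon=True)
    ticker.start()
    try:
        return task()
    finally:
        # Stop the ticker whether the task returned or raised.
        done.set()
        ticker.join()
```

For example, `run_with_heartbeat(lambda: first_inference(), interval_s=60)` would print a line every minute, where `first_inference` stands for whatever call triggers the kernel compilation.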
Is the version you made working?
works fine with 3.2.1 and 3.2.2
I will leave this issue pinned so that users can share their feedback, and we will consider an implementation in the near future.
You run these commands in cmd ;)
But first you need to change into your Applio directory with cd yourappliodirectory.
If your Applio is on another drive, use cd /d, for example:
cd /d d:\Applio
All of this is written for Windows. Linux is different, and I have not tested copying the libs and launching there.
The HIP SDK is installed as part of the ROCm installation; you need to find instructions for your Linux distro.
Modify run-install.sh, then execute run-install.sh.
Starting Applio (assuming you are inside the Applio folder and the zluda folder is there) is:
LD_LIBRARY_PATH="zluda:$LD_LIBRARY_PATH" env/bin/python app.py
But which libraries do I need to copy? The folder "env/lib/python3.10/site-packages/torch/lib/" exists.
For the 1st copy command I have "libcublas.so.11" and "libcublasLt.so.11"; for the 3rd I have "libnvrtc-builtins.so.11.8", "libnvrtc-672ee683.so.11.2" and "libcaffe2_nvrtc.so". I don't have anything related to "cusparse". What do I need to copy and/or replace? (Fedora Linux 40, Python 3.10, completed 3 steps from the original post)
There's no easy way to do that on Linux. Most likely you need to build Zluda, then you need to build PyTorch with Zluda. https://github.com/pytorch/pytorch?tab=readme-ov-file#from-source and https://github.com/lshqqytiger/ZLUDA?tab=readme-ov-file#pytorch
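To see which CUDA-named libraries a given torch install actually ships, the folder can be listed by name. A minimal sketch, assuming the three name fragments from the Windows copy commands (cublas, cusparse, nvrtc) are the ones of interest; `find_cuda_libs` is a hypothetical helper, and it only reports file names, it does not replace anything.

```python
from pathlib import Path

# Name fragments taken from the three copy commands in this thread.
WANTED = ("cublas", "cusparse", "nvrtc")


def find_cuda_libs(lib_dir):
    """Return {fragment: sorted file names in lib_dir that contain the fragment}."""
    names = [p.name for p in Path(lib_dir).iterdir() if p.is_file()]
    return {frag: sorted(n for n in names if frag in n.lower()) for frag in WANTED}


if __name__ == "__main__":
    # Path as reported above; adjust it to your own environment.
    for frag, files in find_cuda_libs("env/lib/python3.10/site-packages/torch/lib").items():
        print(frag, "->", files or "nothing found")
```

An empty list for a fragment means the wheel does not ship a separate file with that name in this folder.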
What about replacing
pip install torch==2.1.1 torchvision==0.16.1 torchaudio==2.1.1 --index-url https://download.pytorch.org/whl/cu118
with
pip install torch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1 --index-url https://download.pytorch.org/whl/rocm6.0
?
Is there any other Nvidia/CUDA-related code besides PyTorch? There is some code that looks for the right GPU in config.py, line 28. According to the PyTorch docs, I need to change "cuda:0" to "cuda", but that does not help: I get the error "RuntimeError: HIP error: invalid device function" and I don't know how to fix it.
If you're using an RX 7900 variant, you don't need Zluda; you can just use ROCm PyTorch.
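When switching between the cu118 and rocm6.0 wheels, it helps to confirm which build is actually installed and whether it sees a GPU. A minimal sketch using standard PyTorch attributes (`torch.version.cuda`, `torch.version.hip`, `torch.cuda.is_available()`); the function name `describe_torch_backend` is hypothetical.

```python
def describe_torch_backend():
    """Summarise which GPU backend the installed PyTorch build targets."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed in this environment"

    # torch.version.hip is set on ROCm builds, torch.version.cuda on CUDA builds.
    hip = getattr(torch.version, "hip", None)
    cuda = getattr(torch.version, "cuda", None)
    build = f"ROCm/HIP {hip}" if hip else f"CUDA {cuda}" if cuda else "CPU-only"

    # ROCm builds also expose the GPU through the torch.cuda namespace.
    if torch.cuda.is_available():
        device = torch.cuda.get_device_name(0)
    else:
        device = "no GPU visible to PyTorch"
    return f"torch {torch.__version__}, build: {build}, device: {device}"


if __name__ == "__main__":
    print(describe_torch_backend())
```

Run it with the Python interpreter from the Applio environment so that it reports on the torch that Applio will actually load.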
Does this work with Windows? or do you absolutely need Linux?
It is quite the opposite. Making it work on Windows is pretty easy; making it work on Linux is a huge P.I.T.A. Hopefully my PR will be included in the build soon :)
As for me, I'm a novice. I'm trying to get it to work with my RX 6800, but I don't understand how I should proceed.
Right now you can make the changes manually, using the PR as an example:
https://github.com/IAHispano/Applio/pull/593/commits/2c18fc86ea4cb9aba1e2f4224dca1c88fb6ca25e
There's a readme file to follow as well.
I don't understand. I tried to install all this, but I got this error in cmd when I wanted to train the pth:
An error occurred processing 0_99.wav: CUDA error: invalid argument
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
Disabling CUDNN for traning with Zluda
Process Process-1:
Traceback (most recent call last):
  File "C:\ApplioV3.2.1\env\lib\multiprocessing\process.py", line 315, in _bootstrap
    self.run()
  File "C:\ApplioV3.2.1\env\lib\multiprocessing\process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "C:\ApplioV3.2.1\rvc\train\train.py", line 396, in run
    net_g = DDP(net_g, device_ids=[rank])
  File "C:\ApplioV3.2.1\env\lib\site-packages\torch\nn\parallel\distributed.py", line 797, in __init__
    _sync_module_states(
  File "C:\ApplioV3.2.1\env\lib\site-packages\torch\distributed\utils.py", line 292, in _sync_module_states
    _sync_params_and_buffers(process_group, module_states, broadcast_bucket_size, src)
  File "C:\ApplioV3.2.1\env\lib\site-packages\torch\distributed\utils.py", line 306, in _sync_params_and_buffers
    dist._broadcast_coalesced(
RuntimeError: CUDA error: operation not supported
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
EDIT: It's good, I think it works. I had installed everything incorrectly: I didn't take the time to run "run-install.bat" before applying everything else. However, sometimes not much happens: I stay at 0% for several minutes and then things start moving. I guess it's because this is the first use, as mentioned above.
EDIT 2: It works, but it seems rather slow to me, and my GPU doesn't seem to be used much when I look in Task Manager. I would say 3 to 4 minutes of training per epoch, with the GPU at 2% and the CPU at 35%.
Task Manager does not always report the correct load; the Adrenalin control panel may show correct values. Depending on the size of the training dataset, training is 4-5x faster than on CPU.
As for inference, it is ~10x faster than on CPU.
Good! Indeed, my 6800 works well. It seems to take quite long to train a model, but I think that is due to my 30-minute dataset. Or, since it's the first time, do I have to let it finish, and the next training run will be faster?
Currently 4 minutes per epoch. Is this normal?
not the best speed but considering your dataset it’s okay
With 23 min in sliced_audios, 10 epochs take ~30 minutes on my 6700XT; with 42 min in sliced_audios, 10 epochs take just under an hour.
Depending on the quality of the data and whether you use a custom pretrained model, you don't need that many epochs anyway. In some cases I had good results with just 20.
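The numbers above can be turned into a rough per-epoch estimate. This is a sketch only: it assumes epoch time scales linearly with the length of the dataset, which is an assumption rather than something measured here, and the reference point comes from a 6700XT, not a 6800.

```python
def minutes_per_epoch(total_minutes, epochs):
    """Average wall-clock minutes spent on one epoch."""
    return total_minutes / epochs


def estimate_minutes_per_epoch(ref_audio_min, ref_epoch_min, audio_min):
    """Scale a reference per-epoch time linearly with dataset length (an assumption)."""
    return ref_epoch_min * audio_min / ref_audio_min


# Reference point from the thread: 23 min of audio, 10 epochs in about 30 minutes.
ref = minutes_per_epoch(30, 10)
estimate = estimate_minutes_per_epoch(23, ref, 30)
print(f"{ref:.1f} min/epoch at 23 min of audio, about {estimate:.1f} min/epoch at 30 min")
# prints: 3.0 min/epoch at 23 min of audio, about 3.9 min/epoch at 30 min
```

Under that assumption, roughly 4 minutes per epoch for a 30-minute dataset is in line with the 6700XT figures.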
Indeed, I did a test on 30 epochs, the model was good enough! Thanks a lot for the help! I can now use my AMD GPU to train models :)
merged
1) Install pre-requisites:
   a) HIP SDK 6.1.2 https://www.amd.com/en/developer/resources/rocm-hub/hip-sdk.html
   b) Zluda 3.8 https://github.com/lshqqytiger/ZLUDA/releases/tag/rel.86cdab3b14b556e95eafe370b8e8a1a80e8d093b
   c) if using GPU unsupported by HIP SDK, such as 6600/6700(XT) need to find/build a custom set of libraries
2) Modify run-install.bat to pull cu118 torch libraries instead of cu121:
   pip install torch==2.1.1 torchvision==0.16.1 torchaudio==2.1.1 --index-url https://download.pytorch.org/whl/cu118
3) Execute run-install.bat to build local environment
4) Unzip Zluda into zluda folder, patch torch libraries using:
   copy zluda\cublas.dll env\Lib\site-packages\torch\lib\cublas64_11.dll /y
   copy zluda\cusparse.dll env\Lib\site-packages\torch\lib\cusparse64_11.dll /y
   copy zluda\nvrtc.dll env\Lib\site-packages\torch\lib\nvrtc64_112_0.dll /y
5) Apply the patch
6) Start Applio using:
   zluda\zluda.exe -- env\python.exe app.py
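For anyone who prefers a script over typing the three copy commands from step 4, the same copies can be sketched in Python. The file names and folder layout are taken from the steps above; the function name `patch_torch_libs` and the check for missing files are my own additions, and the sketch assumes it is given the Applio root folder.

```python
import shutil
from pathlib import Path

# Source DLL in the zluda folder -> name PyTorch expects, as listed in step 4.
DLL_MAP = {
    "cublas.dll": "cublas64_11.dll",
    "cusparse.dll": "cusparse64_11.dll",
    "nvrtc.dll": "nvrtc64_112_0.dll",
}


def patch_torch_libs(applio_dir):
    """Copy the Zluda DLLs over the matching torch libraries; return the targets."""
    root = Path(applio_dir)
    zluda_dir = root / "zluda"
    torch_lib = root / "env" / "Lib" / "site-packages" / "torch" / "lib"

    # Fail early if run-install.bat was not executed or Zluda was not unzipped.
    missing = [name for name in DLL_MAP if not (zluda_dir / name).is_file()]
    if missing or not torch_lib.is_dir():
        raise FileNotFoundError(f"missing: {missing or torch_lib}")

    written = []
    for src_name, dst_name in DLL_MAP.items():
        target = torch_lib / dst_name
        shutil.copyfile(zluda_dir / src_name, target)  # overwrites, like copy /y
        written.append(target)
    return written
```

Calling `patch_torch_libs(r"d:\Applio")` after step 3 would perform the same three overwrites as the copy commands.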