Closed: anastasiuspernat closed this issue 3 months ago
I was able to fix it by installing more recent versions of torch+cuda. But the question remains: it seems it doesn't use the GPU at all. CPU load is 100%, GPU usage never goes beyond 1-2%, and the process takes forever, around 30 minutes (running on an RTX 4090 / Intel i5-11400). How can I actually activate CUDA/GPU?
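For anyone else hitting this, a quick sanity check before blaming the model is to confirm that the installed torch wheel actually has CUDA support. This is a generic torch sketch, nothing MARS5-specific:

```python
import torch

# A CPU-only wheel reports a version without a "+cuXXX" suffix, and
# torch.cuda.is_available() returns False; in that case everything runs on the CPU.
print(torch.__version__)
print(torch.cuda.is_available())
if torch.cuda.is_available():
    # Name of the first visible GPU, e.g. "NVIDIA GeForce RTX 4090"
    print(torch.cuda.get_device_name(0))
```

If `is_available()` prints `False` here, no amount of `device="cuda"` in the model code will help; the wheel itself has to be replaced.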
It uses 100% GPU on my 3090TI and just over 8 GB out of the 24GB VRAM available. Uses about 3% CPU.
It takes ~30 seconds to run the demo code with the example .wav file input used to generate "The quick brown rat."
Are you using WSL? What is the output of nvidia-smi or nvidia-smi.exe?
My setup:
$ uname -a
Linux archlinux 6.8.9-arch1-1 #1 SMP PREEMPT_DYNAMIC Thu, 02 May 2024 17:49:46 +0000 x86_64 GNU/Linux
$ python -V
Python 3.12.3
$ pip freeze
audioread==3.0.1
black==24.4.2
certifi==2024.6.2
cffi==1.16.0
charset-normalizer==3.3.2
click==8.1.7
decorator==5.1.1
einops==0.8.0
encodec==0.1.1
filelock==3.15.4
fsspec==2024.6.0
hf_transfer==0.1.6
huggingface-hub==0.23.4
idna==3.7
isort==5.13.2
Jinja2==3.1.4
joblib==1.4.2
lazy_loader==0.4
librosa==0.10.2.post1
llvmlite==0.43.0
MarkupSafe==2.1.5
mpmath==1.3.0
msgpack==1.0.8
mypy-extensions==1.0.0
networkx==3.3
numba==0.60.0
numpy==1.26.4
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu12==2.20.5
nvidia-nvjitlink-cu12==12.5.40
nvidia-nvtx-cu12==12.1.105
packaging==24.1
pathspec==0.12.1
platformdirs==4.2.2
pooch==1.8.2
pycparser==2.22
PyYAML==6.0.1
regex==2024.5.15
requests==2.32.3
safetensors==0.4.3
scikit-learn==1.5.0
scipy==1.14.0
setuptools==70.1.1
six==1.16.0
soundfile==0.12.1
soxr==0.3.7
sympy==1.12.1
threadpoolctl==3.5.0
torch==2.3.1
torchaudio==2.3.1
tqdm==4.66.4
typing_extensions==4.12.2
urllib3==2.2.2
vocos==0.1.0
wheel==0.43.0
I'm also seeing pure CPU usage with the quick start guide... A quick start guide for CUDA inference would be excellent if someone could share one or publish it as a PR 👍
Not WSL, using the regular Windows Terminal. Does it support Windows at all, btw? The longer phrase took about 2 hrs and never finished; I had to cancel it.
nvidia-smi:
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.99 Driver Version: 555.99 CUDA Version: 12.5 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Driver-Model | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 4090 WDDM | 00000000:01:00.0 On | Off |
| 0% 38C P8 12W / 450W | 1132MiB / 24564MiB | 18% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
pip freeze:
audioread==3.0.1
certifi==2024.6.2
cffi==1.16.0
charset-normalizer==3.3.2
colorama==0.4.6
decorator==5.1.1
einops==0.8.0
encodec==0.1.1
filelock==3.13.1
fsspec==2024.2.0
huggingface-hub==0.23.4
idna==3.7
intel-openmp==2021.4.0
Jinja2==3.1.3
joblib==1.4.2
lazy_loader==0.4
librosa==0.10.2.post1
llvmlite==0.43.0
MarkupSafe==2.1.5
mkl==2021.4.0
mpmath==1.3.0
msgpack==1.0.8
networkx==3.2.1
numba==0.60.0
numpy==1.26.4
packaging==24.1
pillow==10.2.0
platformdirs==4.2.2
pooch==1.8.2
pycparser==2.22
PyYAML==6.0.1
regex==2024.5.15
requests==2.32.3
safetensors==0.4.3
scikit-learn==1.5.0
scipy==1.14.0
soundfile==0.12.1
soxr==0.3.7
sympy==1.12
tbb==2021.11.0
threadpoolctl==3.5.0
torch==2.3.1+cu118
torchaudio==2.3.1+cu118
torchvision==0.18.1+cu118
tqdm==4.66.4
typing_extensions==4.9.0
urllib3==2.2.2
vocos==0.1.0
I was about to ask the same thing lol @anastasiuspernat. Should I run linux instead of windows on my system to get this working? I have a 3090ti and am pretty used to using MacOS for development - thinking of switching to linux as it might be a lot easier for running this and other similar projects...
@anastasiuspernat @tedjames Our hosted demos all use CUDA, of course, and run on Linux-based systems, but we're not aware of any reason why it should not work on Windows with CUDA as well. Please double-check your configurations, e.g. @anastasiuspernat I see you have CUDA 12.5 installed system-wide (nvidia-smi output), but a version of torch for CUDA 11.8 (torch==2.3.1+cu118). Typically, torch would install its own versions of CUDA to use (see @ubergarm's pip freeze output), to prevent conflicts with the system-wide version, but it appears yours did not.
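When the versions are mismatched like that, reinstalling torch from the matching wheel index usually fixes it. A sketch assuming the cu121 wheels (a 12.5 driver can run them, since NVIDIA drivers are backward compatible with older CUDA runtimes):

```shell
# Remove the mismatched wheels first, then install the CUDA 12.1 builds
pip uninstall -y torch torchaudio torchvision
pip install torch==2.3.1 torchaudio==2.3.1 torchvision==0.18.1 \
    --index-url https://download.pytorch.org/whl/cu121
```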
@pieterscholtz
Thank you for the clarification! I did a lot of different installs and nothing helped. I have other repos, btw, that use torch, CUDA, stable diffusion, facefusion, comfyui etc., and they do work; only Mars doesn't. It does say that it's using CUDA, btw:
device = "cuda"
mars5, config_class = torch.hub.load('Camb-ai/mars5-tts', 'mars5_english', device=device, trust_repo=True)
print(f"Mars5 device: {mars5.device}")
It outputs "cuda". And I think my pip freeze is different because it's not Linux, so I don't need those nvidia-cublas-cu12 packages etc., right?
Are there any success stories on Windows? Can you provide their pip freeze? 🙏🏼
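For reference, the wheel's build configuration can also be inspected from Python; again a generic torch check, not MARS5-specific. The build version only needs to be compatible with (not equal to) the driver's reported CUDA version:

```python
import torch

# CUDA runtime this torch wheel was compiled against, e.g. "12.1" for a
# +cu121 wheel; None on CPU-only builds.
print(torch.version.cuda)
# cuDNN build version as an integer, e.g. 8902 for cuDNN 8.9.2; None without cuDNN.
print(torch.backends.cudnn.version())
```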
Scratch that! I reinstalled everything again, I think it worked:
audioread==3.0.1
certifi==2024.6.2
cffi==1.16.0
charset-normalizer==3.3.2
colorama==0.4.6
decorator==5.1.1
einops==0.8.0
encodec==0.1.1
filelock==3.13.1
fsspec==2024.2.0
huggingface-hub==0.23.4
idna==3.7
intel-openmp==2021.4.0
Jinja2==3.1.3
joblib==1.4.2
lazy_loader==0.4
librosa==0.10.2.post1
llvmlite==0.43.0
MarkupSafe==2.1.5
mkl==2021.4.0
mpmath==1.3.0
msgpack==1.0.8
networkx==3.2.1
numba==0.60.0
numpy==1.26.4
packaging==24.1
pillow==10.2.0
platformdirs==4.2.2
pooch==1.8.2
pycparser==2.22
PyYAML==6.0.1
regex==2024.5.15
requests==2.32.3
safetensors==0.4.3
scikit-learn==1.5.0
scipy==1.14.0
soundfile==0.12.1
soxr==0.3.7
sympy==1.12
tbb==2021.11.0
threadpoolctl==3.5.0
torch==2.3.1+cu121
torchaudio==2.3.1+cu121
torchvision==0.18.1+cu121
tqdm==4.66.4
typing_extensions==4.9.0
urllib3==2.2.2
vocos==0.1.0
It seems I needed that exact version of torch. Thanks for your help!
How fast can this project run inference on a 3090 Ti after successfully enabling CUDA, bro?
@hotdogarea Around 20 secs total on a 4090
I get the following error
TypeError: _DecoratorContextManager.__call__() got an unexpected keyword argument 'cfg'
on this line
ar_codes, wav_out = mars5.tts("The quick brown rat.", wav, ref_transcript, cfg=cfg)
Installed with the requirements on Windows: