Camb-ai / MARS5-TTS

MARS5 speech model (TTS) from CAMB.AI
https://www.camb.ai
GNU Affero General Public License v3.0

Does it run on CUDA? Getting cfg error #65

Closed anastasiuspernat closed 3 months ago

anastasiuspernat commented 3 months ago

I get the following error:

TypeError: _DecoratorContextManager.__call__() got an unexpected keyword argument 'cfg'

on this line

ar_codes, wav_out = mars5.tts("The quick brown rat.", wav, ref_transcript, cfg=cfg)

Installed with the requirements on Windows:

torch==2.0.1+cu117
torchvision==0.15.2+cu117
torchaudio==2.0.2+cu117
numpy==1.26.4
regex
librosa
vocos
encodec
safetensors
anastasiuspernat commented 3 months ago

I was able to fix it by installing more recent versions of torch + CUDA. But the question remains: it seems it doesn't use the GPU at all. CPU load is 100%, GPU usage never goes beyond 1-2%, and the process takes forever, around 30 minutes (running on an RTX 4090 / Intel i5-11400). How can I actually activate CUDA/GPU?
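A quick sanity check with plain torch APIs (nothing MARS5-specific) would be something like this; if `cuda_available` comes back False, everything silently runs on the CPU no matter what device you pass:

```python
import torch

def cuda_report() -> dict:
    """Gather basic facts about whether this torch install can use the GPU."""
    report = {
        "torch_version": torch.__version__,           # e.g. "2.3.1+cu121"
        "cuda_build": torch.version.cuda,             # CUDA version torch was built against; None on CPU-only builds
        "cuda_available": torch.cuda.is_available(),  # False => inference falls back to CPU
    }
    if report["cuda_available"]:
        report["device_name"] = torch.cuda.get_device_name(0)
    return report

print(cuda_report())
```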

ubergarm commented 3 months ago

It uses 100% GPU on my 3090 Ti and just over 8 GB of the 24 GB VRAM available, with about 3% CPU.

It takes ~30 seconds to run the demo code with the example .wav file input used to generate "The quick brown rat."

Are you using WSL? What is the output of nvidia-smi or nvidia-smi.exe?

My setup:

$ uname -a
Linux archlinux 6.8.9-arch1-1 #1 SMP PREEMPT_DYNAMIC Thu, 02 May 2024 17:49:46 +0000 x86_64 GNU/Linux

$ python -V
Python 3.12.3

$ pip freeze
audioread==3.0.1
black==24.4.2
certifi==2024.6.2
cffi==1.16.0
charset-normalizer==3.3.2
click==8.1.7
decorator==5.1.1
einops==0.8.0
encodec==0.1.1
filelock==3.15.4
fsspec==2024.6.0
hf_transfer==0.1.6
huggingface-hub==0.23.4
idna==3.7
isort==5.13.2
Jinja2==3.1.4
joblib==1.4.2
lazy_loader==0.4
librosa==0.10.2.post1
llvmlite==0.43.0
MarkupSafe==2.1.5
mpmath==1.3.0
msgpack==1.0.8
mypy-extensions==1.0.0
networkx==3.3
numba==0.60.0
numpy==1.26.4
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu12==2.20.5
nvidia-nvjitlink-cu12==12.5.40
nvidia-nvtx-cu12==12.1.105
packaging==24.1
pathspec==0.12.1
platformdirs==4.2.2
pooch==1.8.2
pycparser==2.22
PyYAML==6.0.1
regex==2024.5.15
requests==2.32.3
safetensors==0.4.3
scikit-learn==1.5.0
scipy==1.14.0
setuptools==70.1.1
six==1.16.0
soundfile==0.12.1
soxr==0.3.7
sympy==1.12.1
threadpoolctl==3.5.0
torch==2.3.1
torchaudio==2.3.1
tqdm==4.66.4
typing_extensions==4.12.2
urllib3==2.2.2
vocos==0.1.0
wheel==0.43.0
tedjames commented 3 months ago

I'm also seeing CPU-only usage when following the quick start guide... A quick start guide for CUDA inference would be excellent if someone could share one or publish it as a PR 👍

anastasiuspernat commented 3 months ago

> It uses 100% GPU on my 3090 Ti and just over 8 GB of the 24 GB VRAM available, with about 3% CPU.
>
> It takes ~30 seconds to run the demo code with the example .wav file input used to generate "The quick brown rat."
>
> Are you using WSL? What is the output of nvidia-smi or nvidia-smi.exe?

Not WSL, just the regular Terminal. Does it support Windows at all, btw? A longer phrase ran for about 2 hours and never finished; I had to cancel it.

nvidia-smi:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.99                 Driver Version: 555.99         CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                  Driver-Model | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4090      WDDM  |   00000000:01:00.0  On |                  Off |
|  0%   38C    P8             12W /  450W |    1132MiB /  24564MiB |     18%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

pip freeze:

audioread==3.0.1
certifi==2024.6.2
cffi==1.16.0
charset-normalizer==3.3.2
colorama==0.4.6
decorator==5.1.1
einops==0.8.0
encodec==0.1.1
filelock==3.13.1
fsspec==2024.2.0
huggingface-hub==0.23.4
idna==3.7
intel-openmp==2021.4.0
Jinja2==3.1.3
joblib==1.4.2
lazy_loader==0.4
librosa==0.10.2.post1
llvmlite==0.43.0
MarkupSafe==2.1.5
mkl==2021.4.0
mpmath==1.3.0
msgpack==1.0.8
networkx==3.2.1
numba==0.60.0
numpy==1.26.4
packaging==24.1
pillow==10.2.0
platformdirs==4.2.2
pooch==1.8.2
pycparser==2.22
PyYAML==6.0.1
regex==2024.5.15
requests==2.32.3
safetensors==0.4.3
scikit-learn==1.5.0
scipy==1.14.0
soundfile==0.12.1
soxr==0.3.7
sympy==1.12
tbb==2021.11.0
threadpoolctl==3.5.0
torch==2.3.1+cu118
torchaudio==2.3.1+cu118
torchvision==0.18.1+cu118
tqdm==4.66.4
typing_extensions==4.9.0
urllib3==2.2.2
vocos==0.1.0
tedjames commented 3 months ago

I was about to ask the same thing lol @anastasiuspernat. Should I run Linux instead of Windows on my system to get this working? I have a 3090 Ti and am pretty used to macOS for development; I'm thinking of switching to Linux, as it might be a lot easier for running this and other similar projects...

pieterscholtz commented 3 months ago

@anastasiuspernat @tedjames Our hosted demos all use CUDA, of course, and run on Linux-based systems, but we're not aware of any reason why it should not work on Windows with CUDA as well. Please double-check your configurations. For example, @anastasiuspernat, I see you have CUDA 12.5 installed system-wide (per your nvidia-smi output), but a version of torch built for CUDA 11.8 (torch==2.3.1+cu118). Typically, torch installs its own CUDA runtime libraries (see @ubergarm's pip freeze output) to prevent conflicts with the system-wide version, but it appears yours did not.

anastasiuspernat commented 3 months ago

@pieterscholtz

Thank you for the clarification! I tried a lot of different installations; nothing helped. Btw, I have other repos that use torch and CUDA (stable diffusion, facefusion, comfyui, etc.) and they all work; only MARS5 doesn't. It does say that it's using CUDA:

device = "cuda"
mars5, config_class = torch.hub.load('Camb-ai/mars5-tts', 'mars5_english', device=device, trust_repo=True)
print(f"Mars5 device: {mars5.device}")

It outputs "cuda". And I think my pip freeze is different because it's not Linux; I don't need those nvidia-cublas-cu12 packages etc., right?
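Note that printing a configured device only shows the target; to confirm the weights actually moved to the GPU, one can check where the parameters live. A sketch using standard torch APIs (assuming the model exposes ordinary torch.nn.Module parameters; nothing here is MARS5-specific):

```python
import torch

def parameter_devices(module: torch.nn.Module) -> set[str]:
    """Return the set of device types ('cpu'/'cuda') the module's parameters live on."""
    return {p.device.type for p in module.parameters()}

# Usage on any torch module, e.g. a toy layer:
layer = torch.nn.Linear(4, 4)
print(parameter_devices(layer))  # {'cpu'} until .to('cuda') is called
```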

Are there any success stories on Windows? Could you share their pip freeze? 🙏🏼

Scratch that! I reinstalled everything again, I think it worked:

audioread==3.0.1
certifi==2024.6.2
cffi==1.16.0
charset-normalizer==3.3.2
colorama==0.4.6
decorator==5.1.1
einops==0.8.0
encodec==0.1.1
filelock==3.13.1
fsspec==2024.2.0
huggingface-hub==0.23.4
idna==3.7
intel-openmp==2021.4.0
Jinja2==3.1.3
joblib==1.4.2
lazy_loader==0.4
librosa==0.10.2.post1
llvmlite==0.43.0
MarkupSafe==2.1.5
mkl==2021.4.0
mpmath==1.3.0
msgpack==1.0.8
networkx==3.2.1
numba==0.60.0
numpy==1.26.4
packaging==24.1
pillow==10.2.0
platformdirs==4.2.2
pooch==1.8.2
pycparser==2.22
PyYAML==6.0.1
regex==2024.5.15
requests==2.32.3
safetensors==0.4.3
scikit-learn==1.5.0
scipy==1.14.0
soundfile==0.12.1
soxr==0.3.7
sympy==1.12
tbb==2021.11.0
threadpoolctl==3.5.0
torch==2.3.1+cu121
torchaudio==2.3.1+cu121
torchvision==0.18.1+cu121
tqdm==4.66.4
typing_extensions==4.9.0
urllib3==2.2.2
vocos==0.1.0

It seems I needed that exact version of torch. Thanks for your help!

hotdogarea commented 3 months ago

> It seems I needed that exact version of torch. Thanks for your help!

How fast does this run on a 3090 Ti after successfully getting CUDA working, bro?

anastasiuspernat commented 3 months ago

@hotdogarea Around 20 seconds total on a 4090.