coqui-ai / TTS

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
http://coqui.ai
Mozilla Public License 2.0

[Bug/Question] Using any built-in models maxes out my CPU & how can I tune performance? #1982

Closed: f1yn closed this 2 years ago

f1yn commented 2 years ago

Describe the bug

I wrote a small batch script that takes a small paragraph of text and runs it through permutations of the different built-in models to produce speech. The CPU load spikes whenever more than one model is executed at a time (I've tried variations of 2, 3, 4, and 5).
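Something along these lines, as a rough sketch rather than the exact script (the model names are just illustrative built-ins, and it simply shells out to the tts CLI that ships with the package):

import subprocess

TEXT = "A small paragraph of text to synthesize."

# Illustrative built-in model names; the real script permutes several of them.
MODELS = [
    "tts_models/en/ljspeech/tacotron2-DDC",
    "tts_models/en/ljspeech/glow-tts",
    "tts_models/en/vctk/vits",
]

# Launch more than one model at a time; the CPU load spikes as soon as
# two or more of these run concurrently.
procs = [
    subprocess.Popen(
        ["tts", "--text", TEXT, "--model_name", name, "--out_path", f"out_{i}.wav"]
    )
    for i, name in enumerate(MODELS)
]
for p in procs:
    p.wait()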

This isn't a typical use case for sure, but I was under the impression after reading the docs that the compute-heavy part of this process was training models, not running inference with existing ones. I eventually want to run TTS within a container on embedded hardware, but I am concerned about the overhead, as running a single paragraph through the parser is sending my computer and its fans into the stratosphere. If it gets any hotter I might melt 🥵


I don't fully think this is a bug, technically. I feel as though the software is doing what it's intended to do, but I do have some questions that would help me figure out how to tune my use case and make embedding this easier.

To Reproduce

Using existing built-in models, generate waveforms with tts, running more than one model at a time.

Expected behavior

I expect my CPU not to glow with the burn of a thousand suns, as I'm just running inference with existing models (and not training new ones).

Logs

No response

Environment

/home/flynn/.local/lib/python3.8/site-packages/torch/cuda/__init__.py:146: UserWarning:
NVIDIA GeForce RTX 3060 with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
If you want to use the NVIDIA GeForce RTX 3060 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

  warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
{
    "CUDA": {
        "GPU": [
            "NVIDIA GeForce RTX 3060"
        ],
        "available": true,
        "version": "10.2"
    },
    "Packages": {
        "PyTorch_debug": false,
        "PyTorch_version": "1.12.1+cu102",
        "TTS": "0.8.0",
        "numpy": "1.21.6"
    },
    "System": {
        "OS": "Linux",
        "architecture": [
            "64bit",
            "ELF"
        ],
        "processor": "x86_64",
        "python": "3.8.10",
        "version": "#1 SMP Wed Mar 2 00:30:59 UTC 2022"
    }
}

Additional context

No response

f1yn commented 2 years ago

I've been spending some time today looking into the source code for synthesizing speech, and it's become apparent that the models are in fact running tensor computations when producing results, and when CUDA can't be enabled it falls back to CPU rendering.
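A quick way to confirm that fallback is to inspect the installed PyTorch wheel directly (nothing TTS-specific here, just stock torch introspection):

import torch

print(torch.version.cuda)                   # "10.2" for this install
print(torch.cuda.is_available())            # True: the driver and device are visible
print(torch.cuda.get_device_capability(0))  # (8, 6) for an RTX 3060
print(torch.cuda.get_arch_list())           # ['sm_37', 'sm_50', 'sm_60', 'sm_70'] - no sm_86

# The cu102 wheel ships no sm_86 kernels, so the RTX 3060 can't actually be
# used and synthesis ends up running on the CPU.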

And while I haven't identified where it does any pre-compute for the CPU rendering pathway (yet), based on the numbers alone it seems that a single execution will scale out to all available logical processors.

A workaround for my use case would probably be to use containers with limited CPU resources allocated to them. If anyone has the time to answer some or all of my questions, that would be appreciated.
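Beyond container CPU quotas, another knob worth trying (an assumption on my part, based on the thread counts I'm seeing rather than anything in the TTS docs) is capping PyTorch's intra-op parallelism before synthesis:

import os

# Cap the OpenMP thread pool before torch is imported; by default PyTorch
# fans intra-op work out across every logical processor.
os.environ.setdefault("OMP_NUM_THREADS", "2")

import torch

# Belt and braces: also set the intra-op thread count explicitly.
torch.set_num_threads(2)

On the container side this pairs naturally with a CPU quota (e.g. Docker's --cpus flag), which bounds the load even if the thread count isn't tuned.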

erogol commented 2 years ago

This is not a bug. I'm moving it to the discussions.