Closed f1yn closed 2 years ago
I've been spending some time today looking into the source code for synthesizing speech, and it's become apparent that the models are in fact using tensors when computing results, and that when CUDA can't be enabled, it falls back onto CPU rendering.
And while I haven't yet identified where it does any pre-compute for the CPU-rendering pathway, based on the numbers alone it seems that a single execution will scale out to all available logical processors when computing.
A workaround for my use case would probably be to use containers that have limited CPU resources allocated to them. If anyone has the time, though, to answer some or all of my questions, that would be appreciated.
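For reference, a quick check like the one below (assuming PyTorch is the backend, which is what the tensor code I was reading suggests) shows whether the CUDA pathway is available and how large PyTorch's CPU thread pool is by default, which would explain the full-CPU scaling I'm seeing:

```python
# Minimal sketch: confirm whether CUDA is available and how many CPU threads
# PyTorch will use when it falls back to CPU inference.
# Assumption: the library runs on PyTorch, as the tensor code suggests.
import torch

print("CUDA available:", torch.cuda.is_available())
# The intra-op pool is typically sized from the available processors, so on a
# CPU-only box a single synthesis run can fan out across every core.
print("Intra-op threads:", torch.get_num_threads())
```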
This is not a bug. I'm moving it to the Discussions.
Describe the bug
I wrote a small batch script that takes a small paragraph of text and runs it through permutations of the different models to produce speech, and the CPU flooring happens whenever more than one model is being executed (I've tried variations of 2, 3, 4, and 5).
This isn't a typical use case for sure, but I was under the impression after reading the docs that the compute-heavy part of this process was the creation of models, not the execution of existing ones. I eventually want to run TTS within a container on embedded hardware, but I am concerned about the overhead, as running a single paragraph through the parser is sending my computer and its fans into the stratosphere. If it gets any hotter I might melt 🥵
I don't fully think this is technically a bug. I feel as though the software is doing what it's intended to do, but I do have some questions that will help me figure out how to tune my use case and make embedding this easier.
Are certain models more computationally heavy than others? What makes them heavier? I don't need a highly technical answer to this, but if you could share a small blurb, maybe I could locate tooling that could reduce these models to make them preferable for an embedded system.
When a paragraph is sent to the TTS as the text argument, I notice that it does some sort of delimiter splitting and, based on what I'm seeing, appears to handle each of those sentences asynchronously. If this is true, can I tune this behavior, or use some configuration flag that will make the models execute sequentially instead?
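For what it's worth, this is roughly the sequential behavior I'm after. It's only a sketch: I'm assuming the TTS.api.TTS Python wrapper from recent releases, and the model name is just an example, not the one I'm actually running.

```python
# Rough sketch of the sequential behavior I'm after: split the paragraph
# myself and feed one sentence at a time to a single model instance.
# Assumptions: the TTS.api.TTS wrapper is available; the model name is only
# an example.
import re
from TTS.api import TTS

paragraph = "First sentence. Second sentence. Third sentence."
sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", paragraph) if s.strip()]

tts = TTS(model_name="tts_models/en/ljspeech/tacotron2-DDC", progress_bar=False, gpu=False)
for i, sentence in enumerate(sentences):
    # One synthesis call at a time, so only one sentence is ever in flight.
    tts.tts_to_file(text=sentence, file_path=f"out_{i}.wav")
```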
Is this software thread-safe? I've experienced similar CPU flooring in the past, specifically with Python-based software on Linux, where threads basically start in-fighting when more than one instance of a multi-threaded component runs. I can't remember exactly where I read about this issue, but if this is a known limitation of generating speech concurrently, that would help me narrow it down.
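In case it's the usual BLAS/OpenMP oversubscription, the mitigation I've used for similar flooring elsewhere is to cap the thread pools before the numeric libraries are imported. This is a sketch; the environment variables are the standard OpenMP/MKL ones, nothing TTS-specific, and the thread count of 2 is arbitrary.

```python
# Sketch of the thread-cap workaround: cap the OpenMP/MKL pools *before*
# numpy/torch are imported, then also cap PyTorch's own intra-op pool.
# The value 2 is an arbitrary example; nothing here is TTS-specific.
import os
os.environ.setdefault("OMP_NUM_THREADS", "2")
os.environ.setdefault("MKL_NUM_THREADS", "2")

import torch
torch.set_num_threads(2)

# ...then load and run the models as usual; each process should now stay
# within roughly two worker threads instead of fanning out to every core.
```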
I am using WSL 2 to execute the models. Is this a problem? Again, I was under the impression that computing the data into models was the heavy part, and that once the models are stable you just feed them inputs, which wouldn't require access to GPUs or AI acceleration hardware. I could be, and may well be, completely wrong about this.
Are there recommended ways to execute these models on optimized platforms? I am also considering having multiple containers sequentially handle computing TTS, but this performance ceiling issue is making me a bit worried about whether that is feasible.
To Reproduce
Using existing models, generate waveforms using tts, but run more than one at a time.
Expected behavior
I expect my CPU not to glow with the burn of a thousand suns, as I'm just executing existing models (and not creating new ones).
Logs
No response
Environment
Additional context
No response