saiful9379 opened this issue 3 months ago
Describe the bug
In the provided example, loading the model requires close to 5 GB of RAM, while VRAM usage is only 2.1 GB. How can I reduce RAM usage when loading the model at inference time? Basically, I am trying to figure out what causes the high RAM consumption. I found that simply initializing the GPT block already uses close to 5 GB of RAM; this is system memory, not GPU memory.
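For reference, a minimal sketch of one way to attribute the resident set size (RSS) to each loading step with `psutil`; the `nn.Sequential` below is only a stand-in for the repo's actual GPT block, not its real API:

```python
import gc
import psutil
import torch
import torch.nn as nn

def rss_mb() -> float:
    # Resident set size of the current process, in MB.
    return psutil.Process().memory_info().rss / 1024 ** 2

print(f"RSS before init    : {rss_mb():.1f} MB")

# Stand-in for the repo's GPT block; substitute the real class here.
model = nn.Sequential(*[nn.Linear(4096, 4096) for _ in range(8)])
print(f"RSS after init     : {rss_mb():.1f} MB")

model = model.to("cuda")
gc.collect()
print(f"RSS after .to(cuda): {rss_mb():.1f} MB")
```

If RSS jumps at construction time rather than at checkpoint loading, then it is the randomly initialized weights themselves that occupy host memory before being moved to the GPU.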
To Reproduce
Inference used RAM: 4634.79 MB (~4.6 GB)
Expected behavior
Low RAM usage during inference.
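For what it's worth, a common plain-PyTorch pattern that avoids materializing the weights in host RAM at all; whether it applies depends on how this repo wires up model construction, and `build_model` plus `checkpoint.pth` are placeholders, not the repo's real names:

```python
import gc
import torch
import torch.nn as nn

# Stand-in for the repo's GPT block; substitute the real class here.
def build_model() -> nn.Module:
    return nn.Sequential(*[nn.Linear(4096, 4096) for _ in range(8)])

# PyTorch >= 2.0: constructing the module inside a device context
# allocates its parameters directly on the GPU, so the randomly
# initialized weights never occupy host RAM.
with torch.device("cuda"):
    model = build_model()

# Map the checkpoint tensors straight onto the GPU as well, then drop
# the host-side reference so its buffers can be reclaimed.
state_dict = torch.load("checkpoint.pth", map_location="cuda")  # placeholder path
model.load_state_dict(state_dict)
del state_dict
gc.collect()
torch.cuda.empty_cache()
```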
Logs
No response
Environment
Additional context
No response