coqui-ai / TTS

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
http://coqui.ai
Mozilla Public License 2.0
35.13k stars 4.28k forks

[Bug] Memory Explosion with xtts HifiganGenerator #3463

Closed chaseaucoin closed 10 months ago

chaseaucoin commented 10 months ago

Describe the bug

When running XTTSv2 on an RTX 3090 under WSL2 Ubuntu 22.04 on Windows 11, I would intermittently get memory explosions during inference. It seems to happen when I have a Hugging Face transformers LLM loaded at the same time as XTTS. I traced it to the forward pass of HifiganGenerator, at the line o = self.conv_pre(x), where self.conv_pre is just weight_norm(Conv1d(in_channels, upsample_initial_channel, 7, 1, padding=3)). I couldn't identify anything further, but for some reason this call uses all available GPU memory. Before hitting this line the system is using 8 GB of VRAM; as soon as it hits it, usage jumps to 23.7+ GB and the system starts to freeze.
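For reference, a minimal sketch of the layer in question (names follow the report; the channel sizes below are illustrative placeholders, not XTTS's actual configuration):

```python
import torch
from torch import nn
from torch.nn.utils import weight_norm

# Illustrative channel sizes -- XTTS's real values differ.
in_channels, upsample_initial_channel = 8, 16

# The layer named in the report: a weight-normalized 1D convolution.
conv_pre = weight_norm(
    nn.Conv1d(in_channels, upsample_initial_channel, 7, 1, padding=3)
)

# kernel_size=7 with padding=3 preserves the time dimension.
x = torch.randn(1, in_channels, 100)
o = conv_pre(x)  # the call where VRAM reportedly spikes on GPU
```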

Any help would be awesome but it is a weird bug.

To Reproduce

I'm not able to reproduce this on any of the machines I lease. It only happens on my RTX 3090, but the steps seem to be:

1. Load the XTTS model
2. Load a Hugging Face LLM
3. Run inference via inference_stream

Expected behavior

Memory pressure may fluctuate a bit, but not by 16+ GB.

Logs

No response

Environment

Windows 11
WSL2 Ubuntu 22.04
Tried multiple versions of Python and PyTorch, and multiple versions of CUDA

Reproduced on PyTorch builds for CUDA 11.8 and 12.2

Additional context

No response

chaseaucoin commented 10 months ago

Okay well, shoot I actually found the issue.

This appears to be a known issue with Conv1d inference on the GPU:

https://github.com/pytorch/pytorch/issues/98688

I was able to address the issue with os.environ["TORCH_CUDNN_V8_API_DISABLED"] = "1"
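For anyone hitting the same thing, the workaround is just an environment variable. Note it has to be set before torch is imported, or the flag is ignored:

```python
import os

# Workaround from pytorch/pytorch#98688: disabling the cuDNN v8 frontend
# avoids the oversized workspace allocation seen during Conv1d inference.
# Set this BEFORE `import torch` (or export it in the shell instead).
os.environ["TORCH_CUDNN_V8_API_DISABLED"] = "1"
```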

I'll leave this bug in case someone runs into the same issue, but I'm going to go ahead and close it.