Closed: chaseaucoin closed this issue 10 months ago
Okay well, shoot, I actually found the issue.
This appears to be a known issue with Conv1d inference on the GPU:
https://github.com/pytorch/pytorch/issues/98688
I was able to address the issue with os.environ["TORCH_CUDNN_V8_API_DISABLED"] = "1"
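For reference, a minimal sketch of how the workaround can be applied; note that setting the variable before importing torch is my assumption about the safe ordering, not something I verified:

```python
import os

# Workaround for the cuDNN v8 Conv1d memory blow-up: disable the v8 API.
# Assumption: this should be set before torch initializes cuDNN, so do it
# before importing torch (or at least before any CUDA work happens).
os.environ["TORCH_CUDNN_V8_API_DISABLED"] = "1"

import torch  # noqa: E402  (import deliberately placed after the env var is set)

# ... load XTTS / the LLM and run inference as usual ...
```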
I'll leave this bug report up in case someone else runs into the same issue, but I'm going to go ahead and close it.
Describe the bug
When running XTTSv2 on an RTX 3090 under WSL2 (Ubuntu 22.04 on Windows 11), I intermittently get memory explosions during inference. It seems to happen when I have a Hugging Face transformers LLM loaded at the same time as XTTS. I traced it to the forward pass of HifiganGenerator, specifically o = self.conv_pre(x), where self.conv_pre is just weight_norm(Conv1d(in_channels, upsample_initial_channel, 7, 1, padding=3)). I couldn't narrow it down any further, but for some reason this call consumes all available GPU memory. Before hitting this line the system is using 8 GB of VRAM; as soon as it hits it, usage jumps to 23.7+ GB and the system starts to freeze.
Any help would be awesome but it is a weird bug.
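In case it helps anyone debug, here is a minimal sketch of the layer in isolation, with memory reported before and after the call. The channel sizes and input length are guesses on my part, not the actual XTTS config values:

```python
import torch
from torch.nn.utils import weight_norm

# Stand-in for HifiganGenerator.conv_pre; channel sizes below are assumptions.
in_channels = 1024
upsample_initial_channel = 512

conv_pre = weight_norm(
    torch.nn.Conv1d(in_channels, upsample_initial_channel, 7, 1, padding=3)
).cuda()

x = torch.randn(1, in_channels, 500, device="cuda")  # dummy latent sequence

print(f"before: {torch.cuda.memory_allocated() / 1e9:.2f} GB allocated")
with torch.no_grad():
    o = conv_pre(x)  # the call where VRAM usage exploded for me
torch.cuda.synchronize()
print(f"after:  {torch.cuda.memory_allocated() / 1e9:.2f} GB allocated")
print(f"peak:   {torch.cuda.max_memory_allocated() / 1e9:.2f} GB")
```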
To Reproduce
I'm not able to reproduce this on any of the leased machines I have; it only happens on my RTX 3090. The steps seem to be (sketched below):
1. Load the XTTS model
2. Load a Hugging Face LLM
3. Run inference via inference_stream
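Roughly, in code. This is a sketch only: the call signatures are from the coqui-tts and transformers docs as I remember them and may need adjusting, and the checkpoint path, reference audio, and LLM name are placeholders:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from TTS.tts.configs.xtts_config import XttsConfig
from TTS.tts.models.xtts import Xtts

# 1. Load the XTTS model (paths are placeholders).
config = XttsConfig()
config.load_json("/path/to/xtts/config.json")
xtts = Xtts.init_from_config(config)
xtts.load_checkpoint(config, checkpoint_dir="/path/to/xtts/", eval=True)
xtts.cuda()

# 2. Load a Hugging Face LLM onto the same GPU (model name is a placeholder).
tokenizer = AutoTokenizer.from_pretrained("some-llm")
llm = AutoModelForCausalLM.from_pretrained("some-llm", torch_dtype=torch.float16).cuda()

# 3. Run streaming inference; the blow-up happens inside the vocoder's conv_pre.
gpt_cond_latent, speaker_embedding = xtts.get_conditioning_latents(audio_path=["ref.wav"])
for chunk in xtts.inference_stream(
    "Hello there, this is a streaming test.",
    "en",
    gpt_cond_latent,
    speaker_embedding,
):
    pass  # consume the audio chunks
```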
Expected behavior
Memory pressure may fluctuate a bit, but not by 16+ GB.
Logs
No response
Environment
Additional context
No response