erew123 / alltalk_tts

AllTalk is based on the Coqui TTS engine, similar to the Coqui_tts extension for Text generation webUI, but it supports a variety of advanced features, such as a settings page, low VRAM support, DeepSpeed, narrator, model finetuning, custom models, and WAV file maintenance. It can also be used with third-party software via JSON calls.
GNU Affero General Public License v3.0

TTS Generator leads to BSOD and system crashes #355

Closed: Carlcmd closed this issue 2 hours ago

Carlcmd commented 2 hours ago

Attached: diagnostics.log

Desktop: I tried to convert a large text, first with AllTalk and then with the beta (screenshot provided: altalktextsample). AllTalk updated: [approx. date]. Custom Python environment: no. Text-generation-webUI updated: unknown, maybe never.

Additional context: What would be the best settings for chunk size and export split for very low VRAM, and can or should I use DeepSpeed and the Low VRAM setting together? My VRAM is only 6GB, I think. Should it still work, or can I only produce output from smaller texts?

erew123 commented 2 hours ago

I suspect you are running low on system RAM, which is probably related to your system crash, though it could be something else.

If your system is having a BSOD, then that is 99% likely to be related to how your system drivers are handling things, such as memory use. Very long generations in the TTS Generator will use quite a lot of RAM, because the browser keeps hold of the list of all the generated audio, and the amount used will also vary based on your browser.

My suggestion would be:

1) Update your Nvidia driver, as you are a few revisions behind: https://www.nvidia.com/en-us/geforce/drivers/
2) Generally, I would check that your Windows updates are up to date, and also whatever browser you are using.
3) Split your audio generation into smaller batches, e.g. if, as above, your text is 21,000 words creating 948 chunks, do 2x batches of 10,500 words, and therefore roughly 474 chunks per session.
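To follow suggestion 3 without cutting a sentence in half, the text can be pre-split before pasting each batch into the TTS Generator. Below is a minimal Python sketch, assuming that breaking after ".", "!" or "?" followed by whitespace is good enough for your text; the function name `split_into_batches` and the 10,500-word default are illustrative only and are not part of AllTalk.

```python
import re


def split_into_batches(text: str, max_words: int = 10_500) -> list[str]:
    """Split text into batches of at most max_words words, breaking at sentence ends.

    Hypothetical helper, not an AllTalk feature. A single sentence longer than
    max_words becomes its own (oversized) batch rather than being cut mid-sentence.
    """
    # Naive sentence split: break after ., ! or ? when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    batches: list[str] = []
    current: list[str] = []
    count = 0
    for sentence in sentences:
        if not sentence:
            continue  # skip the empty string produced by empty input
        n = len(sentence.split())
        # Start a new batch if adding this sentence would exceed the limit.
        if current and count + n > max_words:
            batches.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        batches.append(" ".join(current))
    return batches


if __name__ == "__main__":
    sample = "One two three. Four five six. Seven eight."
    for i, batch in enumerate(split_into_batches(sample, max_words=6), 1):
        print(i, batch)
```

Each returned string can then be generated as its own session, which keeps the number of chunks the browser has to hold at any one time smaller.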

You can re-join the split audio with something like Audacity when it's completed.
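As a scripted alternative to Audacity, WAV files that share the same format can be joined with Python's standard-library `wave` module. This is a minimal sketch, assuming every split file is an uncompressed WAV with identical channel count, sample width and sample rate; it refuses to join mismatched files rather than resampling them. `join_wavs` is a hypothetical helper, not part of AllTalk.

```python
import wave
from pathlib import Path


def join_wavs(inputs: list[Path], output: Path) -> None:
    """Concatenate WAV files back-to-back, in list order, into one output file.

    Hypothetical helper: all inputs must share channels, sample width and rate.
    """
    if not inputs:
        raise ValueError("no input files given")
    # Use the first file's format as the format for the combined output.
    with wave.open(str(inputs[0]), "rb") as first:
        params = first.getparams()
    expected = (params.nchannels, params.sampwidth, params.framerate)
    with wave.open(str(output), "wb") as out:
        out.setnchannels(params.nchannels)
        out.setsampwidth(params.sampwidth)
        out.setframerate(params.framerate)
        for path in inputs:
            with wave.open(str(path), "rb") as src:
                actual = (src.getnchannels(), src.getsampwidth(), src.getframerate())
                if actual != expected:
                    raise ValueError(f"{path} has a different audio format")
                # Copy the raw audio frames; the output header is fixed up on close.
                out.writeframes(src.readframes(src.getnframes()))
```

For example, `join_wavs([Path("part1.wav"), Path("part2.wav")], Path("combined.wav"))` writes the parts in the order given, so pass the files in the order they were generated.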

As long as nothing else major is using up your GPU VRAM (for example, browsing very graphically heavy web pages or playing a game), I would probably just use DeepSpeed, as that avoids shifting things around on your system and may help in your scenario.

If you continue to experience BSOD issues, then I would look for the error code for the crash in your Windows Event log and/or the memory dump to identify which driver or system issue caused the crash, which can then help identify if there is a driver issue, hardware issue etc.

Carlcmd commented 54 minutes ago

Just so I understand you correctly: should I input less text, or just change the chunk size, or both? And you suggest trying DeepSpeed without the Low VRAM setting, correct? Thank you for your help.

erew123 commented 33 minutes ago

Just so I understand you correctly, should I input less text or just change chunk size or both?

A smaller block of text, i.e. less text. The web browser has to keep/store a list of each individual TTS line/chunk generated. This can get heavy on system RAM use, and some browsers have limitations. As your system only has 16GB of system RAM, this could cause you a memory limit issue on very large blocks of text. So breaking a large block of text down into multiple smaller blocks of text/generations will help alleviate that situation.

And you suggest trying DeepSpeed without the Low VRAM setting, correct?

Correct. Low VRAM is beneficial when you have something else loaded into your VRAM, like a Large Language Model, but if you have nothing else loaded and VRAM to spare, it will not be beneficial.