erew123 / alltalk_tts

AllTalk is based on the Coqui TTS engine, similar to the Coqui_tts extension for Text generation webUI, but it supports a variety of advanced features, such as a settings page, low VRAM support, DeepSpeed, a narrator, model finetuning, custom models, and wav file maintenance. It can also be used with third-party software via JSON calls.
GNU Affero General Public License v3.0

Allow batch size 1 by default #241

Closed: Dolyfin closed this issue 3 months ago

Dolyfin commented 4 months ago

Is your feature request related to a problem? Please describe. Finetuning on an RTX 3060 12GB with 20-second clips only fits into VRAM with a batch size of 1. I'm using gradient accumulation of 16 to counteract this, and it now fits with 10.9GB of VRAM usage.

Describe the solution you'd like Just change the minimum batch size value in finetune.py so that a batch size of 1 is allowed.
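
For illustration, a minimal sketch of the kind of change being asked for, assuming the batch size in finetune.py is exposed through a Gradio slider (the widget name, label, and the other values here are assumptions, not the actual code):

```python
import gradio as gr

# Hypothetical excerpt: lowering the slider's minimum to 1 would let
# low-VRAM cards select a batch size of 1 without hand-editing the file
# after every install/update.
batch_size = gr.Slider(
    label="Batch Size",
    minimum=1,   # previously a higher minimum; 1 is what fits on a 12GB card with long clips
    maximum=512,
    step=1,
    value=4,
)
```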

Additional context I'm just too lazy to go edit the values after every install/update, and this would help more people with less VRAM to finetune. Also, add an explanation in 'info - batch size' to turn up gradient accumulation when the batch size is low, for optimal training.
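
As a rough illustration of the trade-off described above (not taken from the repo): gradient accumulation keeps the effective batch seen by the optimizer reasonable even when only one sample fits in VRAM per step, because gradients are summed over several forward/backward passes before each weight update.

```python
# Illustrative arithmetic only; names are not from finetune.py.
batch_size = 1          # per-step batch that fits in ~11GB VRAM with ~20s clips
grad_accum_steps = 16   # accumulate gradients over 16 steps before each update

effective_batch_size = batch_size * grad_accum_steps
print(effective_batch_size)  # 16 -> roughly equivalent to batch_size=16 without accumulation,
                             # while only one batch's activations sit in VRAM at a time
```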

erew123 commented 3 months ago

Hi @Dolyfin. Apologies for my very late reply. I was so deep in getting v2 out that I didn't break away from dealing with its code and issues.

I've yet to test out this PR I've been sent, https://github.com/erew123/alltalk_tts/pull/242, but it may resolve some of the issues with VRAM. I'll take on board your suggestion for an explanation, though. I'm guessing it's on Linux that you're having the issue.

I'll make a note of this in the Feature requests so that I don't lose track of it.

Thanks