erew123 / alltalk_tts

AllTalk is based on the Coqui TTS engine, similar to the Coqui_tts extension for Text generation webUI; however, it supports a variety of advanced features, such as a settings page, low VRAM support, DeepSpeed, a narrator, model finetuning, custom models, and wav file maintenance. It can also be used with third-party software via JSON calls.
GNU Affero General Public License v3.0

Add UI control for the learning rate on the finetuning page. #172

Closed: Artem-B closed this issue 4 months ago

Artem-B commented 4 months ago

Is your feature request related to a problem? Please describe.

The default learning rate does not always work for everyone. E.g. on my sample set, with the default lr=5e-6, the loss goes down for the first 4-5 epochs, then starts climbing and never goes down after that. Reducing the learning rate to 1e-6 results in a continuous decrease in the loss for at least 50 epochs (I didn't train longer than that). Being able to control the learning rate is quite useful for finetuning, since it will likely be done on limited inputs of questionable quality and will need manual tweaking.

Describe the solution you'd like

Expose the learning rate as a parameter on the finetuning page.
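A minimal sketch of what such a control could look like, assuming the finetuning page is built with Gradio; the function name `start_finetuning` and its signature are placeholders rather than AllTalk's actual code:

```python
import gradio as gr

# Hypothetical stand-in for the function in finetune.py that launches training.
# In the real script, the learning rate would be threaded into the trainer
# config instead of being hard-coded.
def start_finetuning(learning_rate, epochs):
    return f"Training started with lr={learning_rate} for {int(epochs)} epochs"

with gr.Blocks() as demo:
    lr_box = gr.Number(
        value=5e-6,
        label="Learning rate",
        info="Default 5e-6; try 1e-6 if the loss starts climbing after a few epochs",
    )
    epochs_box = gr.Number(value=10, precision=0, label="Epochs")
    start_button = gr.Button("Start finetuning")
    status_box = gr.Textbox(label="Status")
    start_button.click(start_finetuning, inputs=[lr_box, epochs_box], outputs=status_box)

demo.launch()
```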

Describe alternatives you've considered

The only other option is to manually edit the value in finetune.py. That's doable if one knows what to change, but it's not a good option for users who don't know what's under the hood and just want to throw a few .wav files at the page and get a tuned model out.
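For reference, the manual workaround above amounts to something like the following, assuming finetune.py builds its training configuration the way the Coqui XTTS GPT training recipe does; the exact import path and argument names in AllTalk's script may differ:

```python
# Sketch of the manual edit: lower the hard-coded learning rate in the trainer
# config. The import and class name follow the Coqui TTS XTTS GPT recipe and
# are assumptions about how finetune.py is structured.
from TTS.tts.layers.xtts.trainer.gpt_trainer import GPTTrainerConfig

config = GPTTrainerConfig(
    # ... other training options as already set in finetune.py ...
    lr=1e-6,  # lowered from the default 5e-6 so the loss keeps decreasing
)
```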

erew123 commented 4 months ago

Hi @Artem-B

Thanks for the suggestion. I'm currently attempting to rework various things across the whole interface, along with other bits of code. I'll be happy to add this in as/when I reach that area of the code.

I'll add it to the Feature Requests list.

Thanks

erew123 commented 4 months ago

Hi @Artem-B

This should be in there now.

Thanks