erew123 / alltalk_tts

AllTalk is based on the Coqui TTS engine, similar to the Coqui_tts extension for Text generation webUI; however, it supports a variety of advanced features, such as a settings page, low VRAM support, DeepSpeed, a narrator, model finetuning, custom models, and wav file maintenance. It can also be used with third-party software via JSON calls.
GNU Affero General Public License v3.0

Fine Tuning Enhancements #242

Closed bmalaski closed 3 months ago

bmalaski commented 4 months ago

Hello,

I've added some tweaks to how the training works. I know you are in the middle of 2.0, so I will most likely need to fix these when that is released. Anyway, for your consideration:

Take a look and let me know what you think.

[screenshots: estimated completion time, UI]

erew123 commented 4 months ago

Hey @bmalaski

This looks really awesome! Apologies for my very slow reply to you, but I have been cramming hard to get the BETA out, which it now is: AllTalk v2 BETA Download Details & Discussion

I will take a proper look at what you've done, and I'll be happy to import it into the V1 build of Finetuning.

Good news/bad news section. I have updated the V2 build of Finetuning, however it's mostly visual, plus a couple of improvements with file locations/handling etc. So it's probably about 90-95% the same code base. This means it's more than likely a reasonably easy copy/paste to get this new code over from what you have done here. I'm happy to do it, or if you're very keen and wish to try the BETA, you're welcome to. I'm hoping to take a day or so off from coding, and then I will get back and look through/test/import your PR properly!

Thanks so much though. It's great when other people help out and do something like you've done!

I will get back to you shortly!

bmalaski commented 4 months ago

Hey man, yeah, take some time to relax. I will look at the v2 beta and look at creating a new PR for that branch.

erew123 commented 3 months ago

Hi @bmalaski, I've just managed to sit down and properly take a look at this. It's great, really great!

The only thing I did note is that if you set the epochs lower than 2, training never finishes, which may come down to the "Estimated Completion Time" never being able to complete. It does actually train a model; it just never gets to saying that step finished. Not much of an issue, just something I noted.
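For what it's worth, that edge case is the classic hazard in ETA code: extrapolating from too few completed steps. A minimal sketch of a guarded estimator (hypothetical function, not the PR's actual code) would be:

```python
import time


def estimate_completion(start_time, current_step, total_steps):
    """Estimate remaining seconds from progress so far.

    Returns None rather than dividing by zero or reporting a bogus
    finish when there is no usable progress yet, e.g. when a very low
    epoch count leaves total_steps at 0 or 1 and the final step is
    never reported.
    """
    if current_step <= 0 or total_steps <= 0:
        return None  # not enough progress to extrapolate from
    elapsed = time.time() - start_time
    seconds_per_step = elapsed / current_step
    remaining_steps = max(total_steps - current_step, 0)
    return seconds_per_step * remaining_steps
```

Returning a sentinel (here `None`) lets the UI show "estimating..." instead of silently hanging on a completion message that never arrives.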

I didn't see any complaints about imports. I know Matlab, but I've not used it in Python. I'm guessing there are no extra imports other than "pip install trainer" required?

Finetune on v2

So if you are willing to give v2 of finetuning a go, I should probably mention the changes I've made to that code:

1) It's now running Gradio 4.xx rather than 3.52, which helps with what can be done in the interface.
2) Tidied up the Gradio interface generally, which you may find beneficial as it leaves plenty of space on screen for checkboxes etc. I have also moved most of the explainers/guides onto separate tabs.
3) As the models folder shifted down one level in the models folder, I changed the whole process for finding/selecting models, as well as the compaction/move-model-to-folder step at the end of training.
4) I have done a thing with the step 1 whisper model to force a maximum length on wav files. The reason for this is that whisper was sometimes pretty bad at splitting up audio: you could get 2+ minute long wav files, or with short input audio it sometimes just didn't split them down.
5) Moved to 0.24.1 of the Coqui TTS engine/scripts... Err, yes, the last one Coqui wrote/published was 0.22.0, but I found someone who was working on updating the engine/requirements, so I've decided to move up to that and also send in a few PRs there.
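Regarding point 4, capping wav length can be done with nothing but the standard-library `wave` module. This is an illustrative sketch, not AllTalk's actual implementation, and the 12-second cap is an assumed default:

```python
import wave


def split_wav(path, out_prefix, max_seconds=12.0):
    """Split a wav file into chunks no longer than max_seconds.

    Whisper occasionally emits very long segments; capping each output
    clip keeps training samples a manageable length. out_prefix and the
    12-second default are hypothetical, for illustration only.
    """
    out_paths = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frames_per_chunk = int(src.getframerate() * max_seconds)
        total_frames = src.getnframes()
        index = 0
        while src.tell() < total_frames:
            frames = src.readframes(frames_per_chunk)
            out_path = f"{out_prefix}_{index:03d}.wav"
            with wave.open(out_path, "wb") as dst:
                # Copy channel count, sample width, and rate; the wave
                # module patches the frame count in the header on close.
                dst.setparams(params)
                dst.writeframes(frames)
            out_paths.append(out_path)
            index += 1
    return out_paths
```

The last chunk simply ends up shorter than the cap; a real pipeline would ideally split on silence near the boundary rather than mid-word, which is presumably why doing it at the whisper stage is preferable.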

I don't think any code you've made here will impact/interfere with anything I've changed, or vice versa, so I'm guessing it shouldn't be much more than a copy/paste job. If you get time to give it a shot, that would be awesome!

Let me know about the import requirements for the version you have done and I'll get it imported :)

Thanks so much!