KdaiP / StableTTS

Next-generation TTS model using flow-matching and DiT, inspired by Stable Diffusion 3
MIT License
342 stars, 38 forks

Gradio All-in-One: Preprocess, Train, TensorBoard, and Interface #23

Open lpscr opened 1 week ago

lpscr commented 1 week ago

Hi, @KdaiP

I’m not sure where to post this, so I’ll share it here.

After a lot of testing, I’m working on a quick Gradio all-in-one solution. It includes preprocessing, training, TensorBoard, and the interface.

I’m still working on it and haven’t finished yet. I need to fix some issues, but I hope you’ll like it once it’s ready. I’ll send it for testing as soon as it’s done. Since I’m new to GitHub, I don’t know how to share code with your repo. How should I do that?

Here’s a quick preview of what it looks like so far. The video is muted by default, so please unmute it to hear the interface in action.

Let me know what you think!

https://github.com/user-attachments/assets/fd3cf287-4952-4e6f-906b-d84616dbc272

KdaiP commented 1 week ago

Hi, this sounds really cool! Once you've finished your modifications, you can create a pull request to share the code.

Also, it seems like the issue of audio cutouts has improved in this new model.

Maybe in the future, we can add components for data annotation and data cleaning (which I'm currently organizing), similar to the one-click packages for TTS and GPT-SoVITS, to create an out-of-the-box workflow.

Looking forward to seeing your progress!

juntaosun commented 1 week ago

@lpscr You can create a project branch on GitHub, upload your updated code, and then push it to the main branch to synchronize the code.

lpscr commented 1 week ago

Thank you all!

I've finished a new version, and I hope you like it. I plan to upload it tomorrow and make a pull request as you suggested.

I have a quick question: Can I create a fork, add the files, and then submit a pull request? I'm still new to this process, so I just want to make sure.

In this version, I added:

- The ability to start, stop, and resume training, with a save and config option so you don't lose progress.
- The ability to start and stop TensorBoard.
- A random sample selector for training data to make it easier to use, as well as a reference to compare results.
- A seed option, allowing you to use a random seed or fix it if needed.
- Automatic model downloads on the first run if the models aren't found locally.
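
For readers following along, the seed option and the save-and-config behaviour described above could be sketched roughly as follows. This is only an illustration, not the code from the video: `TrainSettings`, `resolve_seed`, `save_settings`, `load_settings`, the `-1` convention for "random seed", and the `checkpoints` directory are all hypothetical names and values chosen for the example.

```python
import json
import random
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class TrainSettings:
    """Hypothetical UI settings; not StableTTS's actual config class."""
    seed: int = -1                       # -1 means "choose a random seed"
    last_step: int = 0                   # last completed training step
    checkpoint_dir: str = "checkpoints"  # assumed location, for illustration


def resolve_seed(seed: int) -> int:
    """Return the fixed seed, or draw a random 32-bit seed when seed < 0."""
    if seed < 0:
        return random.randint(0, 2**32 - 1)
    return seed


def save_settings(settings: TrainSettings, path: str) -> None:
    """Write the settings to a JSON file so a later run can resume from them."""
    Path(path).write_text(json.dumps(asdict(settings), indent=2), encoding="utf-8")


def load_settings(path: str) -> TrainSettings:
    """Load saved settings, falling back to defaults when no file exists yet."""
    file = Path(path)
    if not file.exists():
        return TrainSettings()
    return TrainSettings(**json.loads(file.read_text(encoding="utf-8")))
```

A resume would then amount to calling `load_settings(...)`, passing `resolve_seed(settings.seed)` to whatever seeding call the training loop uses, and continuing from `settings.last_step`. Logging the resolved seed makes a "random" run reproducible afterwards.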

I have a question about the sample in the progress tab: Do we need to resample the files?

Let me know what you think of the new version. I ran some tests, and it's working great! You can easily fine-tune voices, and everything is well-organized. I didn't touch the core code, so there shouldn't be any conflicts. I just created a separate Python file and imported code from your repo, or copied some parts. I also added comments, so it's easy to understand and follow what I did.

https://github.com/user-attachments/assets/16800e16-3e95-42b2-a6c3-51aa31fd0c95

KdaiP commented 1 week ago

Hi, thank you for your hard work on this! Your implementation is both elegant and well-organized!

In addition, resampling is not required, because the training target of StableTTS (the mel spectrogram) is already extracted from resampled audio.
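
To illustrate the point about resampling, here is a minimal pure-Python sketch of the general idea: the audio is resampled in memory during preprocessing, features are computed from that resampled signal, and the files on disk stay untouched. It is not StableTTS's preprocessing code. `resample_linear` and `num_mel_frames` are hypothetical helpers, linear interpolation stands in for a proper resampler, and the sample rates and hop length are made-up example values.

```python
from typing import List, Sequence


def resample_linear(samples: Sequence[float], src_rate: int, dst_rate: int) -> List[float]:
    """Resample by linear interpolation (a simple stand-in for a real resampler)."""
    # Nothing to do when the rates match; too-short signals are returned unchanged.
    if src_rate == dst_rate or len(samples) < 2:
        return list(samples)
    n_out = max(1, round(len(samples) * dst_rate / src_rate))
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate  # position in the source signal
        lo = min(int(pos), last)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out


def num_mel_frames(num_samples: int, hop_length: int) -> int:
    """Frame count of a centered STFT, which a mel spectrogram inherits."""
    return 1 + num_samples // hop_length


# Example values only: one second of silence recorded at 48 kHz,
# resampled in memory to an assumed target rate of 24 kHz.
source = [0.0] * 48000
resampled = resample_linear(source, src_rate=48000, dst_rate=24000)
frames = num_mel_frames(len(resampled), hop_length=256)
```

Because the mel frames are derived from `resampled`, the training target already reflects the target sample rate, whatever rate the source recording used.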