daswer123 / xtts-webui

Webui for using XTTS and for finetuning it
MIT License

Enhancement Suggestions #42

Open GalenMarek14 opened 9 months ago

GalenMarek14 commented 9 months ago

First, thank you for always working on this project, and sorry for bothering you again.

I wanted to suggest new enhancements.

The first one is for Whisper translation: can it align its output? That is, can it automatically sync each newly generated translated audio segment to the matching part of the original speech, so that this can be used as an auto-dubbing tool?
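To make the alignment idea concrete, here is a minimal sketch, not the project's implementation. It assumes each transcription segment is a dict with `start` and `end` times in seconds, and that the duration of each synthesized translated clip is already known. The names `Placement`, `fit_clip`, `plan_dub` and the `max_speed` cap are hypothetical. The sketch only plans where each clip goes and how much it would need to be sped up; the actual time-stretching and mixing are left out.

```python
from dataclasses import dataclass


@dataclass
class Placement:
    start: float     # where the dubbed clip begins on the output timeline (seconds)
    speed: float     # playback-rate factor for the clip (1.0 = unchanged)
    duration: float  # clip duration after the speed change (seconds)


def fit_clip(slot_start, slot_end, clip_duration, max_speed=1.3):
    """Place one translated clip inside the time slot of the original speech."""
    slot = slot_end - slot_start
    if slot <= 0 or clip_duration <= 0:
        raise ValueError("slot and clip must have positive duration")
    # Speed up only when the clip overruns the slot, and never beyond max_speed.
    speed = min(max(clip_duration / slot, 1.0), max_speed)
    return Placement(start=slot_start, speed=speed, duration=clip_duration / speed)


def plan_dub(segments, clip_durations, max_speed=1.3):
    """Plan every clip; a clip is pushed back if the previous one still overruns."""
    plan, cursor = [], 0.0
    for seg, dur in zip(segments, clip_durations):
        p = fit_clip(seg["start"], seg["end"], dur, max_speed)
        p.start = max(p.start, cursor)  # avoid overlapping the previous clip
        cursor = p.start + p.duration
        plan.append(p)
    return plan


# Hypothetical example: the first translated clip is longer than its slot.
segments = [{"start": 0.0, "end": 2.0}, {"start": 2.0, "end": 4.0}]
plan = plan_dub(segments, clip_durations=[3.0, 1.0])
```

Capping the speed-up keeps the dubbed voice intelligible, at the cost of letting a long clip spill into the next slot; whether that trade-off is acceptable depends on the material.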

Secondly, "Add the ability to customize speakers when batch processing" is already on the to-do list. Would it be possible to add simple inline prompts to the default input text window (not the batch process), such as speaker or advanced-setting prompts placed before lines:

{Adam, temp:0.75} How are you?
{Daniel, temp:0.5} Fine.

This would be a kind of live batch process that does not require creating separate text files, and it would be a wonderful QoL upgrade. Yes, we can already do this by manually splitting every paragraph into a separate text file, but it would be much easier to add {speaker} before the relevant parts.
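One way the per-line prompts could be parsed is sketched below. This is only an illustration of the suggested syntax, not code from xtts-webui: `parse_line` and `parse_script` are hypothetical names, setting values are assumed to be numeric, and a line without a `{...}` prefix is assumed to keep the previous speaker while its settings reset to the defaults.

```python
import re

# Matches an optional leading "{...}" directive followed by the text to speak.
LINE_RE = re.compile(r"^\s*\{([^{}]*)\}\s*(.*)$")


def parse_line(line, default_speaker=None):
    """Split '{Adam, temp:0.75} How are you?' into (speaker, settings, text)."""
    match = LINE_RE.match(line)
    if not match:
        return default_speaker, {}, line.strip()
    speaker, settings = default_speaker, {}
    for part in match.group(1).split(","):
        part = part.strip()
        if not part:
            continue
        if ":" in part:
            # "key:value" entries are treated as numeric settings (assumption).
            key, value = part.split(":", 1)
            settings[key.strip()] = float(value)
        else:
            # A bare word is treated as the speaker name.
            speaker = part
    return speaker, settings, match.group(2).strip()


def parse_script(text, default_speaker=None):
    """Parse a multi-line script into one synthesis job per non-empty line."""
    jobs, speaker = [], default_speaker
    for line in text.splitlines():
        if not line.strip():
            continue
        speaker, settings, spoken = parse_line(line, speaker)
        jobs.append({"speaker": speaker, "settings": settings, "text": spoken})
    return jobs


jobs = parse_script("{Adam, temp:0.75} How are you?\n{Daniel, temp:0.5} Fine.")
```

Each resulting job could then be passed to the synthesis step with its own speaker and settings, which is what gives the "live batch" behaviour without separate text files.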

It would also be great to have these:
- Ability to add silences with prompts like {0.5s}
- Ability to split the output with prompts in the input text window, like {split}
- A post-process audio edit page to merge batch parts, with settings like silence generation
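The silence and split prompts could be handled along these lines. This is a rough sketch under stated assumptions, not the project's code: the function names are hypothetical, only the exact forms `{<seconds>s}` and `{split}` are recognized (any other braces stay in the text), and the 24000 Hz default sample rate is an assumption that should be replaced with the model's real output rate.

```python
import re

# "{0.5s}" -> pause of 0.5 seconds, "{split}" -> start a new output file.
TOKEN_RE = re.compile(r"\{(\d+(?:\.\d+)?)s\}|\{(split)\}")


def tokenize(text):
    """Turn input text into ('text', str), ('silence', float) and ('split',) events."""
    events, pos = [], 0
    for match in TOKEN_RE.finditer(text):
        chunk = text[pos:match.start()].strip()
        if chunk:
            events.append(("text", chunk))
        if match.group(1) is not None:
            events.append(("silence", float(match.group(1))))
        else:
            events.append(("split",))
        pos = match.end()
    tail = text[pos:].strip()
    if tail:
        events.append(("text", tail))
    return events


def group_outputs(events):
    """Group events into one list per output file, cutting at every ('split',)."""
    outputs, current = [], []
    for event in events:
        if event[0] == "split":
            if current:
                outputs.append(current)
            current = []
        else:
            current.append(event)
    if current:
        outputs.append(current)
    return outputs


def silence_samples(seconds, sample_rate=24000):
    """Number of zero-valued samples needed for a pause of the given length."""
    return round(seconds * sample_rate)


events = tokenize("Hello there. {0.5s} How are you? {split} Second file.")
files = group_outputs(events)
```

A merge step for batch parts could reuse `silence_samples` to insert a fixed gap of zeros between consecutive clips before writing the combined file.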

Thank you so much for your great work!

daswer123 commented 9 months ago

Hi, yes, I'm already working on the first point.

The second point is also interesting and something I already had in mind. Adding different speakers, at least, is quite possible. For pauses, I have some work in progress that still needs to be tested.

If there is progress on any of these items, I will let you know :)