daswer123 / xtts-api-server

A simple FastAPI Server to run XTTSv2

xtts-webui sounds different than xtts-api-server #36

Open AgentScrubbles opened 6 months ago

AgentScrubbles commented 6 months ago

(Sorry for the second issue; it was unrelated, and I assume this is a stupid user moment.)

So I have xtts-api-server up and running with your Docker container, all hooked up and running great.

To fine-tune, I set up the xtts-webui Colab and batch-uploaded a bunch of wav files, and the result sounds literally amazing. It's 1:1, it sounds perfect; I was honestly shocked at how accurate it was.

I thought copying the webui's samples/<<name>>.wav into the API's samples/<<name>>.wav would be enough, but on the self-hosted API server it sounds like a completely different person. There is maybe a hint that it's the same person, but the difference is very large.

What is the proper way to "export" the fine-tuned model from the webui and add it to the API server? If it really is just copying the wav file, is there something else I'm missing on my API server? Everything is generic, nothing customized. The copy I'm describing is sketched below.
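For reference, the "copy" above is literally just a file move along these lines. A minimal sketch, assuming my own folder layout; the paths and the voice name are placeholders, not the project's documented defaults:

```python
# Sketch of the copy described above: take the reference wav produced by the
# xtts-webui fine-tune and drop it into the samples folder the API server
# reads voices from. All paths here are assumptions from my own setup.
from pathlib import Path
import shutil

voice = "<name>"  # placeholder voice name

webui_sample = Path("xtts-webui") / "samples" / f"{voice}.wav"   # assumed location of the fine-tune's wav
api_samples = Path("xtts-api-server") / "samples"                # assumed folder the API server loads speakers from

api_samples.mkdir(parents=True, exist_ok=True)
shutil.copy(webui_sample, api_samples / f"{voice}.wav")
```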

Edit: Also, the downloaded wav is just the first wav file, even though I uploaded a batch of... 15 or so and had it clean them up and do all of the processing. So I assume that's really the problem: is there a "combined" wav or model that I should download instead?

Thanks for building the tools!

AgentScrubbles commented 6 months ago

Update: I found that I needed to download the entire directory, so the speaker/...wav files are all downloaded after being cleaned up, and I moved them over. However, playing on my API server it still sounds like a completely different voice. It's cleaner, less tinny and robotic than with just the one file, but it still sounds nothing like the original voice. (The voice I hear is American; the voice I uploaded is British.)
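Roughly what that second attempt looked like, again as a sketch under my own assumed layout (the fine-tune directory name and the speaker/ subfolder are just what I see in my download, not documented paths):

```python
# Sketch of the second attempt: copy every cleaned-up wav from the fine-tune's
# speaker/ subfolder into the API server's samples folder, instead of only the
# single reference wav. Paths are placeholders for my own setup.
from pathlib import Path
import shutil

finetune_dir = Path("xtts-webui") / "finetuned" / "<name>"   # whole directory downloaded from the Colab (assumed)
api_samples = Path("xtts-api-server") / "samples"            # assumed speakers folder of the API server

api_samples.mkdir(parents=True, exist_ok=True)
for wav in (finetune_dir / "speaker").glob("*.wav"):
    shutil.copy(wav, api_samples / wav.name)
```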

daswer123 commented 6 months ago

Hi, I'll try to figure it out after the holidays; I don't have time right now.