DrewThomasson / ebook2audiobookXTTS

Generates an audiobook with chapters and ebook metadata using Calibre and XTTS from Coqui TTS, with optional voice cloning and support for multiple languages
MIT License

Update on getting StyleTTS2, piper-tts, and XTTS to all work in one install #33

Open DrewThomasson opened 1 day ago

DrewThomasson commented 1 day ago

@ROBERT-MCDOWELL

I don't know if you'll find this helpful or not, but I've managed to get Coqui TTS, piper-tts, and StyleTTS2 all working from one requirements file.

Google Colab of it working: Testing_all_tts_services.ipynb.zip

Hugging Face Space showing them all working together: https://huggingface.co/spaces/drewThomasson/testing-all-tts-models-together

ROBERT-MCDOWELL commented 1 day ago

@DrewThomasson The more choices we have, the better, indeed, since AI projects are like the startup era.... I'm still working on the refactoring, cleaning the code, and optimizing. A lot of work! :)

DrewThomasson commented 1 day ago

Here's what I've found so far while trying to get Calibre's ebook-convert function and FFmpeg bundled into the pip install.

For Calibre

I was also looking at getting Calibre to work via a pip install instead and found this:

https://github.com/gutenbergtools/ebookconverter

but it doesn't work on Windows :(

I think we might be able to locate the ebook-convert.exe binary inside an existing Calibre install on Windows and use that directly. Info on ebook-convert for Windows
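A rough sketch of how that lookup could work (the fallback paths are just Calibre's default Windows install folders, nothing in the repo does this yet):

import shutil
from pathlib import Path

def find_ebook_convert():
    # Prefer whatever is already on PATH (Linux/macOS, or a custom Windows setup)
    found = shutil.which("ebook-convert")
    if found:
        return found
    # Fall back to Calibre's default install locations on Windows (assumed paths)
    for candidate in (
        Path(r"C:\Program Files\Calibre2\ebook-convert.exe"),
        Path(r"C:\Program Files (x86)\Calibre2\ebook-convert.exe"),
    ):
        if candidate.exists():
            return str(candidate)
    return None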

FFmpeg

there's also this as a potential way to include FFmpeg as a static binary 👀 https://github.com/eugeneware/ffmpeg-static
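On the Python side, one possible route (an assumption on my part, not something wired in yet) is the imageio-ffmpeg package, which ships a static FFmpeg build per platform:

import subprocess
import imageio_ffmpeg  # pip install imageio-ffmpeg; bundles a static ffmpeg binary

# Path to the bundled static ffmpeg executable, no system install required
ffmpeg_exe = imageio_ffmpeg.get_ffmpeg_exe()

# Hypothetical example: re-encode a generated chapter wav to AAC
subprocess.run(
    [ffmpeg_exe, "-y", "-i", "chapter_1.wav", "-c:a", "aac", "chapter_1.m4a"],
    check=True,
)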

ROBERT-MCDOWELL commented 1 day ago

I already explored what you just found, and I found the only solution I'm working on. Don't worry about Calibre and FFmpeg, I found a way not to break the native use. FYI, if you use ChatGPT or other AI to help you code, be aware that copy/paste can sometimes generate a big mess at the end :o)

btw faster-whisper and whisperX are good engines for now.

DrewThomasson commented 1 day ago

oh dang! kk👍

lol yeah, I honestly never expected this to blow up, so my code was a pretty rushed job using ChatGPT to cut corners ngl

ROBERT-MCDOWELL commented 1 day ago

oops, btw faster-whisper and whisperX are more STT than TTS :o\

DrewThomasson commented 1 day ago

Oh, faster-whisper/whisperX... for the XTTS fine-tuning script?

DrewThomasson commented 1 day ago

lol yeah, I was confused when you mentioned it

ROBERT-MCDOWELL commented 1 day ago

with Python, expect things to blow up at any time from a little glitch in the matrix :D Anyhow, I'm happy with the tests I'm doing, but you may be shocked by the refactoring ;)

ROBERT-MCDOWELL commented 1 day ago

piper and StyleTTS2 are nice indeed: great community, active repos. We should maybe create a new option, --model_engine... later.... OK, back to work, see ya
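Just to sketch the idea, something like this could work (the flag name and engine list are only what we're discussing here, not an implemented CLI):

import argparse

parser = argparse.ArgumentParser(description="ebook2audiobook")
parser.add_argument(
    "--model_engine",
    choices=["xtts", "piper", "styletts2", "bark"],
    default="xtts",
    help="which TTS engine to use for narration",
)
args = parser.parse_args()
print(f"selected engine: {args.model_engine}")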

DrewThomasson commented 1 day ago

kk👍

lol exactly, it's already in my upcoming plans :) ----> https://github.com/DrewThomasson/ebook2audiobookXTTS/issues/32#issue-2582309136

ROBERT-MCDOWELL commented 1 day ago

Bark is also a nice, fun engine: https://github.com/suno-ai/bark

DrewThomasson commented 1 day ago

Tru

It's supposed to be built into Coqui TTS, but I ran into issues trying to run it through their API.

I'll look further into it because I do quite like the model

What's unique about it is that not only does it clone the voice, it also carries over the speaking style.

So, for instance, you might have

"Once upon a time"

and if you're using a voice whose reference sample says "like" a lot, it might come out as

"So like once upon like a time"

Very cool

https://docs.coqui.ai/en/latest/models/bark.html
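
For context, the high-level API call I was trying looks roughly like this (model name taken from those docs; getting it to actually download and run cleanly is exactly where I hit issues):

from TTS.api import TTS

# Bark as registered in Coqui TTS's model manager (per the docs linked above)
tts = TTS("tts_models/multilingual/multi-dataset/bark")

# Clone from a reference folder containing speaker.wav / speaker.npz
tts.tts_to_file(
    text="Once upon a time",
    file_path="bark_out.wav",
    voice_dir="bark_voices/",
    speaker="ljspeech",
)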

DrewThomasson commented 1 day ago

I'll try it with the new updated repo??? 👀

https://github.com/idiap/coqui-ai-TTS

IDK HOW I'M ONLY JUST FINDING OUT ABOUT THIS

FINALLY A FORK WHERE UPDATES ARE BEING APPLIED

DrewThomasson commented 1 day ago

I'll update you when I get a result from it lol

ROBERT-MCDOWELL commented 1 day ago

didn't know about it either!

DrewThomasson commented 1 day ago

Looks like it works on my end!

At the moment I've gotten the random-speaker mode working with this snippet from the docs:

from TTS.tts.configs.bark_config import BarkConfig
from TTS.tts.models.bark import Bark

text = "Hello, my name is Manmay , how are you?"

# load the Bark model from a local checkpoint directory
config = BarkConfig()
model = Bark.init_from_config(config)
model.load_checkpoint(config, checkpoint_dir="path/to/model/dir/", eval=True)

# with random speaker
output_dict = model.synthesize(text, config, speaker_id="random", voice_dirs=None)

# cloning a speaker
# assumes a speaker file in `bark_voices/speaker_n/speaker.wav` or `bark_voices/speaker_n/speaker.npz`
output_dict = model.synthesize(text, config, speaker_id="ljspeech", voice_dirs="bark_voices/")

Here is the test output file

output.wav.zip

These tests were run on my M1 Pro 16 GB Mac laptop in a Python 3.10 env lol

DrewThomasson commented 1 day ago

lol I once got this working in a beta version of VoxNovel in Google Colab, so I know I'll be able to get it working here later

adding it to the list tho lol

DrewThomasson commented 1 day ago

ooo, also gonna add to the plans a way to use DeepFilterNet2 to denoise any reference input audio files

deepfilternet2

Gradio space I made as a demo using it lol
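
Rough idea of what that cleanup step could look like with the DeepFilterNet Python package (pip install deepfilternet; file names are placeholders, nothing is wired into the repo yet):

from df.enhance import enhance, init_df, load_audio, save_audio

# load the default DeepFilterNet model and its state
model, df_state, _ = init_df()

# denoise a reference voice sample before handing it to the TTS engine
audio, _ = load_audio("reference_voice.wav", sr=df_state.sr())
enhanced = enhance(model, df_state, audio)
save_audio("reference_voice_clean.wav", enhanced, df_state.sr())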

ROBERT-MCDOWELL commented 1 day ago

excellent Drew! the denoiser is amazing too!