possibility to run this tool locally?

duplaja / epub-to-audiobook-hf

Epub to M4B Audiobook, with StyleTTS2 via HuggingFace Spaces API

Apache License 2.0


possibility to run this tool locally? #2

Open danielw97 opened 10 months ago

danielw97 commented 10 months ago

Hi there, thanks very much for your work on this project; it seems quite interesting. I'm just wondering if there is any way to run this locally for those of us who have good GPUs? It could possibly leverage the StyleTTS2 REST API or importable script at https://github.com/NeuralVox/StyleTTS2. As someone who isn't a developer, I realize this is probably a lot more complicated than I'm making it out to be, but this is great work so far nonetheless.

duplaja commented 10 months ago

Hey there, you are welcome! It should be very doable to modify this to run locally. Unfortunately, I don't have the hardware to test or build it out myself (which is what led to this).

One would essentially need to replace the code in the convert_chapter function ( https://github.com/duplaja/epub-to-audiobook-hf/blob/a67d799ccca8f2b147499ec153778892dc366e8d/epub-to-audiobook-hf.py#L87 ) so that it either calls a local API in that version of StyleTTS 2 or passes the chapter text in by whatever method is desired, and gets the resulting WAV files back. (One would also want to strip out the code that spins the HF Space up and down.)
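A rough sketch of what a local replacement could look like, in case it helps anyone attempting a fork. This is not the code from this repo: the function names `synthesize_local` and `convert_chapter_local`, the URL `http://localhost:5000/tts`, the JSON field `text`, and the assumption that the server answers with raw WAV bytes are all hypothetical and would need to be matched to whatever local StyleTTS 2 server is actually used.

```python
import json
import urllib.request
from pathlib import Path

# Assumption: a local StyleTTS 2 server is listening at this address.
# The path "/tts" and the JSON field "text" are hypothetical placeholders.
LOCAL_TTS_URL = "http://localhost:5000/tts"


def synthesize_local(text, url=LOCAL_TTS_URL, timeout=600):
    """POST the chapter text to the local server and return the response body.

    Assumes (hypothetically) that the body is the WAV audio itself.
    """
    payload = json.dumps({"text": text}).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


def convert_chapter_local(chapter_text, out_path, synthesize=synthesize_local):
    """Turn one chapter's text into a WAV file on disk and return its path.

    `synthesize` is injectable so the HTTP call can be swapped for an
    imported model call, or for a stub when testing without a GPU.
    """
    audio_bytes = synthesize(chapter_text)
    out_path = Path(out_path)
    out_path.write_bytes(audio_bytes)
    return out_path
```

Keeping the synthesis step as a separate, swappable function means the rest of the audiobook logic does not need to know whether the audio comes from a REST call or from an imported script.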

You can look at some of what is being done on the StyleTTS 2 side, by looking at app.py in the HF Space's code: https://huggingface.co/spaces/Dupaja/styletts2-public/blob/main/app.py

I'd certainly give my blessing if anyone wanted to try and make a local fork!

nixolas1 commented 10 months ago

I did a quick and dirty local fork, but it works! Check it out: https://github.com/nixolas1/epub-to-audiobook-local

danielw97 commented 10 months ago

Thanks very much for your work on this. Unfortunately, when I tried to test this on WSL, it appeared to try to process the book but didn't actually produce any speech. I'm running the Docker container on port 5000, and ffmpeg and the requirements are installed.
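For anyone hitting the same symptom, one way to narrow it down is to check whether the per-chapter WAV files actually contain audio. The helper below is only a diagnostic sketch using Python's standard `wave` module; the function names, the `*.wav` glob, and the 0.1 second threshold are assumptions, not part of either project.

```python
import wave
from pathlib import Path


def wav_duration_seconds(path):
    """Return the duration of a WAV file in seconds (0.0 if it has no frames)."""
    with wave.open(str(path), "rb") as wav_file:
        frames = wav_file.getnframes()
        rate = wav_file.getframerate()
    return frames / rate if rate else 0.0


def find_empty_wavs(directory, min_seconds=0.1):
    """List WAV files that are unreadable or shorter than `min_seconds`.

    The threshold is an arbitrary assumption; adjust it as needed.
    """
    suspicious = []
    for path in sorted(Path(directory).glob("*.wav")):
        try:
            duration = wav_duration_seconds(path)
        except (wave.Error, EOFError):
            # Not a valid WAV file, or a truncated one.
            suspicious.append(path)
            continue
        if duration < min_seconds:
            suspicious.append(path)
    return suspicious
```

If every chapter file shows up in the returned list, the problem is more likely on the TTS server side than in the EPUB parsing or the ffmpeg step.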

nixolas1 commented 10 months ago

Strange. No errors in the script or in the Docker logs? (I usually check the container's logs in Docker Desktop.) I'll try doing a fresh install later; maybe I forgot some local changes.

I remember having to fix some crashes in the Docker image (turning on audio normalization or something). But I don't have the time to set this up properly yet. A more ideal setup would be a single image with the audiobook logic alongside StyleTTS, rather than relying on such an obscure TTS image.