DrewThomasson / ebook2audiobookXTTS

Generates an audiobook with chapters and ebook metadata using Calibre and XTTS from Coqui TTS, with optional voice cloning and support for multiple languages
MIT License

Conversion runs fine on Docker for about 30 minutes and then stops, Gradio says "Error", with localhost saying "Connection refused" if checking connection #31

Open jdclark73 opened 1 week ago

jdclark73 commented 1 week ago

The terminal just stops, with no errors or anything. You see the successful output plugging away, and it produces multiple .wav files successfully if you check the files in Docker. The terminal keeps showing the fragments that are being made, and then you notice after a while that no new fragments are being produced. Eventually the terminal just goes back to the regular prompt ("PS C:\Users\username>"). Then Gradio says "Error" with no other info, and if you try to open localhost:7860 in another tab it says "Connection refused." I managed to convert a small test excerpt with 5 minutes of audio, but if the process takes more than 30 minutes to run, this connection issue seems to happen.

The wrinkle is that it works perfectly fine on another computer I tried, also Windows 11. The firewall is set to allow Docker on both, and I also set the computer to not go to sleep. I did a fresh install of Docker to see if that helped, but still no luck.

Not sure how to capture network logs, but maybe that would give an idea. It might be a Docker configuration issue and nothing to do with ebook2audiobookXTTS, no idea. Any troubleshooting suggestions? Or are there other logs I can capture?

DrewThomasson commented 1 week ago

How much RAM are you allowing Docker to use?

jdclark73 commented 1 week ago

When it runs it uses about 8 GB (the PC has 16 GB total, and Docker does not seem to allow it over 8). Intel i7, no GPU.

ROBERT-MCDOWELL commented 1 week ago

Maybe you have installed some network component that is interrupting the flow? 30 min = 1800 sec, which is a common default timeout after which an open connection is marked as not responding. Depending on the settings, a connection is sometimes considered stalled when a background process is running while the web page stays open.

DrewThomasson commented 1 week ago

You could also try running the Docker container in headless mode to check whether the network really is the only issue.

jdclark73 commented 1 week ago

I've tried running it in headless mode, but I seem to be going wrong somewhere. In Windows I did the following, as listed in the instructions, and then added my input files there: mkdir input-folder && mkdir Audiobooks

Then I take the command from the instructions and do the following (sorry, the line breaks seem to be getting scrambled on GitHub):

docker run -it --rm --platform=linux/amd64 \ -v $(pwd)/input-folder:/home/user/app/input_folder \ -v $(pwd)/Audiobooks:/home/user/app/Audiobooks \ athomasson2/ebook2audiobookxtts:huggingface \ python app.py --headless True --ebook /home/user/app/input_folder/ch4.docx --voice "/home/user/app/input_folder/bk_adbl_023722_sample.mp3"

And I get the following error (both when I put the equal sign between platform and linux and also if I don't. It stops reading at --p):

ParserError:
Line |
   4 |  --platform linux/amd64 \
     |  ~
     | Missing expression after unary operator '--'.

So then I tried separating it, but I'm guessing this just runs two containers?

docker run -it --rm --platform=linux/amd64 -v $(pwd)/input-folder:/home/user/app/input_folder \ -v $(pwd)/Audiobooks:/home/user/app/Audiobooks

docker run -it --rm --platform=linux/amd64 athomasson2/ebook2audiobookxtts:huggingface \ python app.py --headless True --ebook /home/user/app/input_folder/ch4.docx --voice "/home/user/app/input_folder/bk_adbl_023722_sample.mp3"

ROBERT-MCDOWELL commented 1 week ago

No equals sign is needed: --platform linux/amd64

jdclark73 commented 1 week ago

I tried with and without the equals sign but had the same result. I guess PowerShell stops reading after --p anyway.

As for trying to edit the network window, would I do that in Docker? Or in Windows somewhere? I don't have any special network software per se, just a work VPN.

jdclark73 commented 1 week ago

Here is the code I just tried and got the same error:

docker run -it --rm \ -v $(pwd)/input-folder:/home/user/app/input_folder \ -v $(pwd)/Audiobooks:/home/user/app/Audiobooks \ --platform linux/amd64 \ athomasson2/ebook2audiobookxtts:huggingface \ python app.py --headless True --ebook /home/user/app/input_folder/ch4.docx --voice /home/user/app/input_folder/bk_adbl_023722_sample.mp3

DrewThomasson commented 1 week ago

Oh, backslashes ("\") don't work as line continuations in PowerShell.

Try it in one line like this

docker run -it --rm --platform linux/amd64 -v ${pwd}/input-folder:/home/user/app/input_folder -v ${pwd}/Audiobooks:/home/user/app/Audiobooks athomasson2/ebook2audiobookxtts:huggingface python app.py --headless True --ebook /home/user/app/input_folder/ch4.docx --voice /home/user/app/input_folder/bk_adbl_023722_sample.mp3

jdclark73 commented 1 week ago

I'll test that command later. In the meantime I've found a workaround: I've gotten it to run on my computer without Docker. I was having dependency issues, which were mostly solved by installing MS C++ Build Tools, especially the "Desktop development with C++" workload, including the Windows 11 SDK (https://visualstudio.microsoft.com/visual-cpp-build-tools). I think TTS also wasn't working with Python 3.12, so I had to install 3.9. I'd need to double-check that though.

Once I got all the other dependencies installed, such as ffmpeg, calibre, etc., then I was able to run the app in headless mode. It crashed a few times, sometimes after multiple hours, BUT the already produced audio files remained on the machine instead of being wiped, like what happens when the Docker image stops. So I then went on my local machine to ...\ebook2audiobookXTTS-main\Working_files\temp and ...\ebook2audiobookXTTS-main\Chapter_wav_files and copied the files. Then I looked through the app and took out the wav file combining section and adapted it to take inputs (file path of folder to process, file name, etc) and merged the long lists of wav files into a single file.
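For anyone wanting to try the same merge step, here is a minimal sketch of a folder-based WAV combiner. It is not the attached wav_file_combiner script or the app's own code; the function names `natural_key` and `combine_wavs` are made up for illustration. It uses only Python's standard `wave` module, and it assumes every fragment is uncompressed PCM with the same channel count, sample width, and sample rate, and that the fragment filenames sort correctly by their embedded numbers.

```python
import re
import wave
from pathlib import Path


def natural_key(path):
    """Sort key that orders 'part_2.wav' before 'part_10.wav'."""
    # re.split with a capture group alternates text and digit runs,
    # so every odd-indexed token is a run of digits.
    tokens = re.split(r"(\d+)", path.name)
    return [int(t) if i % 2 else t.lower() for i, t in enumerate(tokens)]


def combine_wavs(folder, output_path):
    """Concatenate every .wav file in `folder` into one file.

    Hypothetical helper: assumes PCM WAV fragments that all share
    the same channel count, sample width, and sample rate.
    Returns the number of fragments that were merged.
    """
    out_path = Path(output_path).resolve()
    files = sorted(
        (p for p in Path(folder).glob("*.wav") if p.resolve() != out_path),
        key=natural_key,
    )
    if not files:
        raise FileNotFoundError(f"no .wav files found in {folder}")

    # Use the first fragment as the reference audio format.
    with wave.open(str(files[0]), "rb") as first:
        fmt = first.getparams()[:3]  # (nchannels, sampwidth, framerate)

    with wave.open(str(out_path), "wb") as out:
        out.setnchannels(fmt[0])
        out.setsampwidth(fmt[1])
        out.setframerate(fmt[2])
        for path in files:
            with wave.open(str(path), "rb") as src:
                if src.getparams()[:3] != fmt:
                    raise ValueError(f"{path.name} has a different audio format")
                out.writeframes(src.readframes(src.getnframes()))
    return len(files)


# Example call (paths are placeholders):
# combine_wavs("Chapter_wav_files", "chapter_4_combined.wav")
```

Because the fragments are only read and the output is written separately, the originals stay on disk, so a failed merge can simply be rerun.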

I then went back to the input document/epub and deleted the already converted part, and then ran ebook2audiobookXTTS again, basically picking up where the last pass ended. One 22,000 word file took about 3-4 rounds of this. To simplify things I had already taken the original epub and divided it into chapters and saved as .docx.

For my own purposes, after conversion, I opened the chapter .wav file in Audacity, sped it up to 140%, and then converted it to .mp3, shrinking a 500 MB file to about 60 MB. Then I loaded it into my podcast app.
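The speed-up and MP3 conversion could probably also be scripted with ffmpeg, which is already one of the dependencies mentioned above. This is only a sketch of that idea, not what was actually done in Audacity. The function names and the 64k bitrate are assumptions, and ffmpeg's `atempo` filter changes tempo while keeping pitch, which may not match the Audacity effect that was used. MP3 output also requires an ffmpeg build with an MP3 encoder.

```python
import subprocess


def build_ffmpeg_command(wav_path, mp3_path, speed=1.4, bitrate="64k"):
    """Return the ffmpeg argument list; nothing is executed here.

    Hypothetical helper: the default bitrate is an assumption,
    not a value taken from the thread.
    """
    if not 0.5 <= speed <= 2.0:
        # Older ffmpeg builds limit a single atempo filter to this range.
        raise ValueError("speed must be between 0.5 and 2.0")
    return [
        "ffmpeg",
        "-i", str(wav_path),
        "-filter:a", f"atempo={speed}",  # 1.4 means 140% tempo, pitch preserved
        "-b:a", bitrate,                 # target audio bitrate
        str(mp3_path),                   # .mp3 extension selects the MP3 format
    ]


def speed_up_to_mp3(wav_path, mp3_path, speed=1.4, bitrate="64k"):
    """Run ffmpeg (must be on PATH) and raise if it exits with an error."""
    subprocess.run(build_ffmpeg_command(wav_path, mp3_path, speed, bitrate), check=True)


# Example call (file names are placeholders):
# speed_up_to_mp3("chapter_4_combined.wav", "chapter_4.mp3")
```

Keeping the command builder separate from the function that runs it makes it easy to print and inspect the command before launching a long conversion.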

For reference, in the terminal where I was running ebook2audiobookXTTS, I got a few different errors. In some cases the readout just stopped, so I have no idea what happened, but in a few other cases I got a runtime error and a bad memory allocation error. So I'm assuming it asked for more RAM than it got, or something like that.

I've attached the .wav file combiner I modified to ask for a folder to process. I've saved it with a .txt extension, but it would obviously need a .py extension (GitHub won't allow that as an attachment). wav_file_combiner.txt

One other weird thing I found was that even though I had uploaded one kind of voice (an American woman), every now and then a fragment was made with that voice quality, but in a British accent (which the default is, I guess). It's like TTS forgot to speak with an American accent. It didn't fully use the default voice though, which I'd say is an older British woman. It just took the voice quality of the voice I used (younger American woman), and spoke in a British accent. This was only every now and then, and just for a sentence or so.

*Edited for small typos.

ROBERT-MCDOWELL commented 1 week ago

@jdclark73 wait for the next git update; I'm working on a very simple way to install the non-Docker version.