aedocw / epub2tts

Turn an epub or text file into an audiobook
Apache License 2.0

Crash if sentence is too long #31

Closed: aedocw closed this issue 1 year ago

aedocw commented 1 year ago

Some books seem to cause the sentence segmenter that TTS uses (https://pypi.org/project/pysbd/) to miss the end of a sentence. I can't tell what is causing this, but the resulting "sentence" is very long because it is made up of several sentences strung together. That in turn causes epub2tts to get killed for exceeding the available memory.

Because this is external to epub2tts (and even external to Coqui TTS), I don't think there's much I can do about it. I'm logging the bug here, though, in case anyone else runs into it. The best way to tell is to look at the output where it says "> Text splitted to sentences." followed by the list of sentences to be read. If you look through that list, you'll probably find one sentence that is enormously long. For now, that just means this is a book that can't be turned into an m4b.
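Scanning that printed list by eye gets tedious on a long book, so here is a small sketch that flags suspiciously long entries. It assumes you have copied the sentence list into Python; `find_long_sentences` and its 500-character threshold are hypothetical and not part of epub2tts or Coqui TTS.

```python
def find_long_sentences(sentences, max_chars=500):
    """Return (index, length) pairs for sentences longer than max_chars,
    longest first. The default threshold is an arbitrary illustrative value."""
    flagged = [(i, len(s)) for i, s in enumerate(sentences) if len(s) > max_chars]
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Stand-in for the list printed after "> Text splitted to sentences."
    sentences = [
        "It was a dark and stormy night.",
        "The rain fell in torrents. " * 40,  # simulates many sentences glued together
        "Nobody noticed.",
    ]
    for index, length in find_long_sentences(sentences):
        print(f"sentence {index} is {length} characters long")
```

A single entry that is many times longer than its neighbours is the symptom described above.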

aedocw commented 1 year ago

FIX - convert the book to TEXT. Maybe not the most elegant solution, but you can use Calibre to easily convert an epub to txt, and at least for the one book that was reliably causing this problem, that fixed it.
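Calibre ships a command-line converter, `ebook-convert`, which chooses the output format from the output file's extension. A minimal Python wrapper around it might look like the following. The helper names are hypothetical, and the sketch assumes Calibre's command-line tools are on your `PATH`.

```python
import subprocess
from pathlib import Path


def epub_to_txt_command(epub_path):
    """Build the Calibre command; the .txt extension selects plain-text output."""
    epub = Path(epub_path)
    return ["ebook-convert", str(epub), str(epub.with_suffix(".txt"))]


def convert_epub_to_txt(epub_path):
    """Run Calibre's converter and return the path of the resulting text file."""
    cmd = epub_to_txt_command(epub_path)
    subprocess.run(cmd, check=True)  # raises CalledProcessError if conversion fails
    return Path(cmd[2])


if __name__ == "__main__":
    # Only show the command; running it requires Calibre to be installed.
    print(" ".join(epub_to_txt_command("book.epub")))
```

Calling `convert_epub_to_txt("book.epub")` writes `book.txt` next to the original, which you can then give to epub2tts in place of the epub.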

danielw97 commented 11 months ago

Hi there, I hope this doesn't reopen the issue, but I'm consistently having this problem with epubs for some reason. This is an exceptional project, something I've been looking for for ages, and full marks to you for the work you've done. Is there anything I can do to track down this bug and perhaps solve it? Converting to text is no problem, but some of the books I'm trying to run through are quite long, and having chapter markers is a big plus. Thanks in advance, and no problem if this isn't easily fixable.

aedocw commented 11 months ago

I haven't had this issue in quite a while, not since I updated the code to strip more punctuation from the text before attempting to read it. Since then, the segmenter seems to have been able to break up sentences properly.
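A rough sketch of that kind of preprocessing, applied before the text reaches the segmenter, is below. The character lists are hypothetical: this thread doesn't say which punctuation epub2tts actually strips or normalizes, so treat `DROP`, `REPLACE`, and `clean_text` as illustrations only.

```python
import re

# ASSUMPTION: illustrative choices only; not the actual epub2tts character lists.
DROP = str.maketrans("", "", "*_~|<>[]{}#")  # characters deleted outright
REPLACE = {
    "\u2014": ", ",  # em dash -> comma pause
    "\u2013": ", ",  # en dash -> comma pause
    "\u2026": "...",  # ellipsis character -> three periods
    "\u201c": '"', "\u201d": '"',  # curly double quotes -> straight
    "\u2018": "'", "\u2019": "'",  # curly single quotes -> straight
}


def clean_text(text):
    """Normalize or drop punctuation, then collapse runs of whitespace."""
    for old, new in REPLACE.items():
        text = text.replace(old, new)
    text = text.translate(DROP)
    return re.sub(r"\s+", " ", text).strip()


if __name__ == "__main__":
    print(clean_text("He paused\u2014then *left*\u2026  quickly."))
```

Whether a given substitution actually helps pysbd find boundaries depends on the book, so it is worth re-checking the "> Text splitted to sentences." output after any change.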

If you can let me know the name of a book that has reliably been causing the problem for you (and especially which chapter), I'll be happy to play with it and see if I can get to the bottom of it.

Really happy you're enjoying using this!