aedocw / epub2tts

Turn an epub or text file into an audiobook
Apache License 2.0

failing on long texts, sample tested was 430000 words #87

Closed danielw97 closed 7 months ago

danielw97 commented 7 months ago

Hi again, I'm reporting what I believe is an edge case: epub2tts seems to fail when writing longer texts. I tested a sample of about 437,000 words. The wav files are written fine, and silences also seem to be removed. However, when ffmpeg is called it fails with the error below, and I'm left with an m4a file of 0 bytes. This is low priority, of course, but I wanted to flag it so you know. If any more detail would help, let me know.

Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "C:\Users\daniel\Documents\epub2tts.venvgpu\Scripts\epub2tts.exe__main.py", line 7, in <module>
File "C:\Users\daniel\Documents\epub2tts.venvgpu\Lib\site-packages\epub2tts.py", line 382, in main
mybook.read_book(voice_samples=args.xtts, engine=args.engine, openai=args.openai, model_name=args.model, speaker=args.speaker, bitrate=args.bitrate)
File "C:\Users\daniel\Documents\epub2tts.venvgpu\Lib\site-packages\epub2tts.py", line 334, in read_book
concatenated.export(outputm4a, format="ipod", bitrate=bitrate)
File "C:\Users\daniel\Documents\epub2tts.venvgpu\Lib\site-packages\pydub\audio_segment.py", line 895, in export
wave_data.writeframesraw(pcm_for_wav)
File "C:\Users\daniel\AppData\Local\Programs\Python\Python311\Lib\wave.py", line 547, in writeframesraw
self._ensure_header_written(len(data))
File "C:\Users\daniel\AppData\Local\Programs\Python\Python311\Lib\wave.py", line 588, in _ensure_header_written
self._write_header(datasize)
File "C:\Users\daniel\AppData\Local\Programs\Python\Python311\Lib\wave.py", line 600, in _write_header
self._file.write(struct.pack('<L4s4sLHHLLHH4s',
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
struct.error: argument out of range
Exception ignored in: <function Wave_write.__del__ at 0x00000137269E8180>
Traceback (most recent call last):
File "C:\Users\daniel\AppData\Local\Programs\Python\Python311\Lib\wave.py", line 447, in __del__
self.close()
File "C:\Users\daniel\AppData\Local\Programs\Python\Python311\Lib\wave.py", line 565, in close
self._ensure_header_written(0)
File "C:\Users\daniel\AppData\Local\Programs\Python\Python311\Lib\wave.py", line 588, in _ensure_header_written
self._write_header(datasize)
File "C:\Users\daniel\AppData\Local\Programs\Python\Python311\Lib\wave.py", line 600, in _write_header
self._file.write(struct.pack('<L4s4sLHHLLHH4s',
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
struct.error: argument out of range

danielw97 commented 7 months ago

Looking into this a bit further, I believe the cause is that wav files have a maximum size of 4 GB, so I'm not sure how easily fixable this is.
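To illustrate the limit described above, here is a minimal sketch that packs a WAV header with the same format string that appears in the traceback. The field order follows my reading of the standard library's `wave.py`; the helper names and the default channel count, sample rate, and sample width are assumptions made for the example, not values taken from epub2tts.

```python
import struct

# Format string copied from the traceback; "L" is an unsigned 32-bit field.
WAV_HEADER_FORMAT = "<L4s4sLHHLLHH4s"


def pack_wav_header(data_bytes, channels=1, rate=24000, sample_width=2):
    """Pack the header fields that follow the leading b'RIFF' tag.

    channels, rate and sample_width are illustrative defaults (assumptions).
    """
    return struct.pack(
        WAV_HEADER_FORMAT,
        36 + data_bytes,                 # RIFF chunk size: must fit in 32 bits
        b"WAVE",
        b"fmt ",
        16,                              # size of the fmt chunk
        1,                               # 1 = uncompressed PCM
        channels,
        rate,
        channels * rate * sample_width,  # bytes per second
        channels * sample_width,         # bytes per frame
        sample_width * 8,                # bits per sample
        b"data",
    )


def fits_in_wav(data_bytes):
    """Return True if a header can be written for this much audio data."""
    try:
        pack_wav_header(data_bytes)
        return True
    except struct.error:
        return False


if __name__ == "__main__":
    print(fits_in_wav(4_000_000_000))  # under the 32-bit limit
    print(fits_in_wav(4_800_000_000))  # roughly the 4.8 GB case from this thread
```

Because the size field is a 32-bit unsigned integer, any combined audio payload above 4294967295 bytes (minus the 36 header bytes) raises `struct.error`, which matches the failure reported here.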

aedocw commented 7 months ago

Interesting, this is probably because I dropped one bit of functionality when doing a bunch of refactoring recently. The function that pulls the text out of the file used to break it up every 50000 characters. That would mean the long text would end up as 9 chapters/parts.

I brought back the chunking in the "better-txt-handling" branch. If you can give it a try and let me know whether it solves the problem, I would appreciate it.
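As a rough illustration of the chunking described above, the sketch below splits a long text into parts of at most 50000 characters, preferring to break on whitespace. The function name `chunk_text` and the whitespace rule are assumptions for the example; the actual code in the branch may differ.

```python
def chunk_text(text, max_chars=50000):
    """Split text into parts of at most max_chars characters.

    Hypothetical helper: breaks at the last space or newline inside each
    window when one exists, otherwise cuts at max_chars.
    """
    parts = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        if end < len(text):
            # Look for the last whitespace character inside the window.
            split_at = max(text.rfind(" ", start, end), text.rfind("\n", start, end))
            if split_at > start:
                end = split_at + 1  # keep the whitespace with the earlier part
        parts.append(text[start:end])
        start = end
    return parts


if __name__ == "__main__":
    sample = "word " * 30000  # 150000 characters
    parts = chunk_text(sample)
    print(len(parts), [len(p) for p in parts])
```

Joining the parts back together reproduces the original text, so each part can be treated as its own chapter without losing content.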

danielw97 commented 7 months ago

Thanks for the fast reply, as always. Unfortunately that didn't work. The files are already split up, since I'm converting from epub, so splitting .txt files might help in other cases but not this one. I believe it fails at the last step, when the audio is passed to ffmpeg after silence etc. has been removed. Based on my research, this is probably because wav has a maximum file size of 4 GB, and all of these files added together come to 4.8 GB. Not sure how much this helps; let me know if there's any debugging info I can provide. Once again, I appreciate your fast response on a community project.

aedocw commented 7 months ago

You're welcome, I'm happy anyone else is using this, haha!

OK that's helpful information regarding the size. I'll see if I can poke at it some over the next few days. I'm sure there's a way to handle this better, probably by moving it to a compressed format before that step. Looks like I get to learn some more about pydub!

danielw97 commented 7 months ago

Okay, great. An alternative to wav is flac (Free Lossless Audio Codec), although I'm not sure how good the Python tooling is. The main reason I mention it is that it is lossless like wav but compressed, and I don't believe it has the 4 GB file size limitation that wav does. Hth a bit.

danielw97 commented 7 months ago

Hi again, I've also done some experimentation with long text files, and I think that merging the better-txt-handling branch might be useful for folks. An example I'm looking at currently is going to take up 2944 tmp wav files, whereas if the text were segmented as before there would be parts along the way. I don't think this will fix the issue with epubs noted above; it's just a thought.

aedocw commented 7 months ago

Good idea, appreciate it, I've merged that branch. Will learn more about using pydub better, and probably use compressed files when possible rather than wav files, which should sort this out.

danielw97 commented 7 months ago

Great, thanks again for your continued work on this.

aedocw commented 7 months ago

In the branch mp3-temps I'm using all mp3s for intermediary files and it is working great. I'll probably merge it later today, but if you want to try it out please do. It should solve the gigantic wave file problem, since wave files are only used now for the smallest chunks before being converted to mp3s.

aedocw commented 7 months ago

Disregard, this is still WIP. Converting all the wav files to mp3 is introducing a weird peaking on S sounds (not sure how better to describe it). I'll experiment with other options for reducing the file size...

aedocw commented 7 months ago

I should have just followed your suggestion in the first place, haha! FLAC files come out about half the size, sometimes less, and sound perfect; everything else introduced artifacts that I noticed. I'm going to merge, but when you get a chance please test. If the issue persists, please re-open this bug.

aedocw commented 7 months ago

BTW I took a long book I had, converted it to text, and then repeated some of it to get to a book with 439126 words. I'm running it right now (not with XTTS since that would take me ages). Will update this with the end result.

danielw97 commented 7 months ago

Okay, great. I should be able to look at this tomorrow, although I will queue something to run this evening and will report back. As always, thanks for your work.

danielw97 commented 7 months ago

I've tested the sample that was giving me issues before, and it looks as though this has fixed it.

danielw97 commented 7 months ago

Hi, just to say that the error appears to be happening again, and I'm not sure why. I falsely assumed that the use of flac would fix it, and it seemed to in my testing, but I'm now getting the error below again. Judging by the error, pydub is converting the flac to wav. I wonder if there's any way to combine the flacs with ffmpeg or a similar utility, assuming all other processing is done prior to that? It should be able to, by specifying the -i flag multiple times. There's no big rush with this one, as I can break particularly large books in half now that the initial conversion to flac is already done; I just wanted to let you know.

Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "C:\Users\daniel\Documents\epub2tts.venv\Scripts\epub2tts.exe__main.py", line 7, in <module>
File "C:\Users\daniel\Documents\epub2tts.venv\Lib\site-packages\epub2tts.py", line 418, in main
mybook.read_book(voice_samples=args.xtts, engine=args.engine, openai=args.openai, model_name=args.model, speaker=args.speaker, bitrate=args.bitrate)
File "C:\Users\daniel\Documents\epub2tts.venv\Lib\site-packages\epub2tts.py", line 370, in read_book
concatenated.export(outputm4a, format="ipod", bitrate=bitrate)
File "C:\Users\daniel\Documents\epub2tts.venv\Lib\site-packages\pydub\audio_segment.py", line 895, in export
wave_data.writeframesraw(pcm_for_wav)
File "C:\Users\daniel\AppData\Local\Programs\Python\Python311\Lib\wave.py", line 547, in writeframesraw
self._ensure_header_written(len(data))
File "C:\Users\daniel\AppData\Local\Programs\Python\Python311\Lib\wave.py", line 588, in _ensure_header_written
self._write_header(datasize)
File "C:\Users\daniel\AppData\Local\Programs\Python\Python311\Lib\wave.py", line 600, in _write_header
self._file.write(struct.pack('<L4s4sLHHLLHH4s',
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
struct.error: argument out of range
Exception ignored in: <function Wave_write.__del__ at 0x000001D60B1A4900>
Traceback (most recent call last):
File "C:\Users\daniel\AppData\Local\Programs\Python\Python311\Lib\wave.py", line 447, in __del__
self.close()
File "C:\Users\daniel\AppData\Local\Programs\Python\Python311\Lib\wave.py", line 565, in close
self._ensure_header_written(0)
File "C:\Users\daniel\AppData\Local\Programs\Python\Python311\Lib\wave.py", line 588, in _ensure_header_written
self._write_header(datasize)
File "C:\Users\daniel\AppData\Local\Programs\Python\Python311\Lib\wave.py", line 600, in _write_header
self._file.write(struct.pack('<L4s4sLHHLLHH4s',
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
struct.error: argument out of range
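
One way to combine the flac parts with ffmpeg directly, as suggested above, is sketched below. Note that it swaps in ffmpeg's concat demuxer (a list file passed with `-f concat`) instead of repeated `-i` flags, since as far as I know repeated inputs alone do not join files end to end. The helper name, the file names, and the `64k` bitrate are assumptions for the example; the sketch only builds the list file contents and the command, and does not run ffmpeg.

```python
def build_concat_job(flac_paths, list_path, output_path, bitrate="64k"):
    """Return (list_file_text, ffmpeg_command) for joining audio parts.

    Hypothetical helper: the caller writes list_file_text to list_path and
    then runs the command, e.g. with subprocess.run(command, check=True).
    """
    lines = []
    for path in flac_paths:
        # Inside single quotes, a literal quote is written as '\''
        escaped = str(path).replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    list_text = "\n".join(lines) + "\n"

    command = [
        "ffmpeg",
        "-f", "concat",        # read inputs from a list file
        "-safe", "0",          # allow arbitrary paths in the list
        "-i", str(list_path),
        "-c:a", "aac",         # encode the joined audio as AAC
        "-b:a", bitrate,
        str(output_path),
    ]
    return list_text, command


if __name__ == "__main__":
    text, cmd = build_concat_job(["part1.flac", "part2.flac"], "parts.txt", "book.m4a")
    print(text)
    print(" ".join(cmd))
```

This approach assumes all parts share the same sample rate and channel layout. Because ffmpeg streams the inputs itself, nothing has to be assembled into a single in-memory wav first.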

danielw97 commented 7 months ago

Update: I've done a bit more testing on this, if only for my own curiosity. I tested a larger epub file on a dedicated Linux machine and got a slightly more verbose error. It definitely looks as though the final output is being converted to wav before being passed to ffmpeg; I'm not sure why, or whether this is a feature of one of the utilities being used. It runs into the 4 GB maximum file size for wav and fails at that point, by the looks of it. Not sure if this is at all helpful, but please see the error below.

Elapsed: 0 minutes, ETA: 0 minutes
Traceback (most recent call last):
File "/home/daniel/epub2tts/.venv/bin/epub2tts", line 33, in <module>
sys.exit(load_entry_point('epub2tts==2.1.9', 'console_scripts', 'epub2tts')())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/daniel/epub2tts/.venv/lib/python3.11/site-packages/epub2tts.py", line 598, in main
mybook.read_book(
File "/home/daniel/epub2tts/.venv/lib/python3.11/site-packages/epub2tts.py", line 457, in read_book
concatenated.export(outputm4a, format="ipod", bitrate=bitrate)
File "/home/daniel/epub2tts/.venv/lib/python3.11/site-packages/pydub/audio_segment.py", line 895, in export
wave_data.writeframesraw(pcm_for_wav)
File "/usr/lib/python3.11/wave.py", line 547, in writeframesraw
self._ensure_header_written(len(data))
File "/usr/lib/python3.11/wave.py", line 588, in _ensure_header_written
self._write_header(datasize)
File "/usr/lib/python3.11/wave.py", line 600, in _write_header
self._file.write(struct.pack('<L4s4sLHHLLHH4s',
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
struct.error: 'L' format requires 0 <= number <= 4294967295
Exception ignored in: <function Wave_write.__del__ at 0x7f42148eeac0>
Traceback (most recent call last):
File "/usr/lib/python3.11/wave.py", line 447, in __del__
self.close()
File "/usr/lib/python3.11/wave.py", line 565, in close
self._ensure_header_written(0)
File "/usr/lib/python3.11/wave.py", line 588, in _ensure_header_written
self._write_header(datasize)
File "/usr/lib/python3.11/wave.py", line 600, in _write_header
self._file.write(struct.pack('<L4s4sLHHLLHH4s',
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
struct.error: 'L' format requires 0 <= number <= 4294967295
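
To put the `0 <= number <= 4294967295` bound from this error into audiobook terms, the following sketch estimates how many hours of uncompressed PCM audio fit under it. The sample rate, channel count, and sample width are assumptions chosen for illustration; the thread does not state what epub2tts or the TTS engine actually produce.

```python
MAX_UINT32 = 2**32 - 1   # upper bound quoted in the struct.error message
RIFF_OVERHEAD = 36       # header bytes counted in the same size field


def max_wav_hours(sample_rate, channels=1, sample_width=2):
    """Hours of PCM audio that fit in a single WAV file (hypothetical helper)."""
    bytes_per_second = sample_rate * channels * sample_width
    return (MAX_UINT32 - RIFF_OVERHEAD) / bytes_per_second / 3600


if __name__ == "__main__":
    # Assumed formats, for comparison only.
    print(f"24 kHz mono 16-bit:     {max_wav_hours(24000):.1f} hours")
    print(f"44.1 kHz stereo 16-bit: {max_wav_hours(44100, channels=2):.1f} hours")
```

Under the assumed 24 kHz mono 16-bit format the ceiling is a little under 25 hours, so a book of several hundred thousand words can plausibly exceed it once every part is joined into one uncompressed stream.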