aedocw / epub2tts

Turn an epub or text file into an audiobook
Apache License 2.0

Dimension out of range error with most recent commit #133

Closed danielw97 closed 6 months ago

danielw97 commented 6 months ago

Hi again. I'm not sure when this regression was introduced, but I'm now getting an error when using VITS after processing several chapters. I'm currently seeing this on Windows, and I can also test on Linux if that would be useful. Let me know if there's more info I can provide. The output is below:

Error: Dimension out of range (expected to be in range of [-2, 1], but got 2) ... Retrying (0 retries left)
Something is wrong with the audio (86): temp0.wav
0%| | 0/16 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "C:\Users\daniel\Documents\epub2tts\epub2tts.py", line 659, in <module>
    main()
  File "C:\Users\daniel\Documents\epub2tts\epub2tts.py", line 648, in main
    mybook.read_book(
  File "C:\Users\daniel\Documents\epub2tts\epub2tts.py", line 444, in read_book
    temp = AudioSegment.from_wav(tempwav)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\daniel\Documents\epub2tts.venv\Lib\site-packages\pydub\audio_segment.py", line 808, in from_wav
    return cls.from_file(file, 'wav', parameters=parameters)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\daniel\Documents\epub2tts.venv\Lib\site-packages\pydub\audio_segment.py", line 651, in from_file
    file, close_file = _fd_or_path_or_tempfile(file, 'rb', tempfile=False)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\daniel\Documents\epub2tts.venv\Lib\site-packages\pydub\utils.py", line 60, in _fd_or_path_or_tempfile
    fd = open(fd, mode=mode)
         ^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: 'temp0.wav'

aedocw commented 6 months ago

Ah bummer :( ... I'll see if I can recreate it now, and will also look carefully through that section of the code, maybe I'll see what went wrong.

aedocw commented 6 months ago

So far I am unable to reproduce this. I'm running this on a book with lots of short chapters, and am 10 in so far with no errors.

From the error you showed, it looks like whisper was asked to transcribe temp0.wav and the file was not there. Can you share the exact command you called epub2tts with, so I can closely match what you're doing (e.g. a language other than en)?

aedocw commented 6 months ago

The more I think about it, the more I'm inclined to skip the transcription comparison when using VITS. That model is deterministic, so multiple runs produce exactly the same output, and there's no point in wasting time making a transcript with whisper...
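As a rough sketch of that idea (not the project's actual code), the check could be gated on the model name. The function name `should_verify_with_transcript` and the marker tuple are hypothetical, and the sketch assumes, as stated above, that VITS output is identical across runs:

```python
# Hypothetical helper: decide whether a whisper transcript comparison is
# worth running for a given TTS model. Names here are illustrative only.

# Assumption taken from the comment above: VITS models are deterministic.
DETERMINISTIC_MARKERS = ("vits",)


def should_verify_with_transcript(model_name: str) -> bool:
    """Return False for models assumed to be deterministic, True otherwise."""
    name = model_name.lower()
    return not any(marker in name for marker in DETERMINISTIC_MARKERS)
```

With this, a caller would only transcribe and compare when the function returns True, for example for an XTTS-style model name.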

danielw97 commented 6 months ago

Hi, I'm not sure what was going on with this yesterday evening, but I'm struggling to reproduce it now, which is quite annoying. Apologies for that. You're right that whisper won't make a difference with models such as VITS; it is only really needed for XTTS or similar models, whose output can vary between runs.

aedocw commented 6 months ago

Glad you have not been able to reproduce this! I'll leave this open for a while just in case.

aedocw commented 6 months ago

Closing now, do let me know if you are able to reproduce.

aedocw commented 6 months ago

Linked this to a branch that has a fix. The issue is that an empty item somehow ends up in the list of sentences to speak (maybe/probably because the sentence was just punctuation, and a previous step strips out anything that is only punctuation with no letters or numbers).
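For illustration only, one way to guard against that kind of empty item is to filter the sentence list just before synthesis. This is a minimal sketch, not the fix in the linked branch; the function name `drop_unspeakable` is hypothetical:

```python
def drop_unspeakable(sentences):
    """Keep only sentences that contain at least one letter or digit.

    Empty strings, whitespace, and punctuation-only items are removed,
    so the TTS step is never asked to speak nothing.
    """
    return [s for s in sentences if any(ch.isalnum() for ch in s)]
```

Running the filter immediately before the TTS call means that anything an earlier cleanup step reduced to an empty string never reaches the model.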