danielw97 closed this 6 months ago
Ah bummer :( ... I'll see if I can recreate it now, and I'll also look carefully through that section of the code; maybe I'll spot what went wrong.
So far I've been unable to reproduce this. I'm running it on a book with lots of short chapters and am 10 chapters in with no errors.
From the error you showed, it looks like whisper was asked to transcribe temp0.wav and the file was not there. Can you share the exact command you ran epub2tts with, so I can match what you're doing as closely as possible (e.g. a language other than en, etc.)?
The more I think about it, the more I'm inclined to skip the transcription comparison when using VITS. That model is deterministic, so multiple runs produce exactly the same output and there's no point spending time making a transcript with whisper...
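A rough sketch of that idea, assuming the synthesis step and the whisper step can each be wrapped as a callable. `speak_with_check`, `DETERMINISTIC_ENGINES`, `synthesize`, `transcribe` and the 0.9 similarity threshold are all hypothetical names and values for illustration, not epub2tts's actual code. A deterministic engine returns after a single synthesis; any other engine has its transcript compared against the input text with `difflib.SequenceMatcher` and is re-synthesized if the two drift too far apart.

```python
from difflib import SequenceMatcher
from typing import Callable

# Hypothetical set of engines assumed to produce identical audio on every run.
DETERMINISTIC_ENGINES = {"vits"}


def speak_with_check(
    text: str,
    engine: str,
    synthesize: Callable[[str], str],  # writes a WAV for `text`, returns its path
    transcribe: Callable[[str], str],  # returns a transcript of the WAV at a path
    threshold: float = 0.9,
    retries: int = 2,
) -> str:
    """Return the path of a WAV for `text`, re-synthesizing when the
    transcript from a non-deterministic engine differs too much from `text`."""
    wav = synthesize(text)
    if engine in DETERMINISTIC_ENGINES:
        # Re-running a deterministic model gives the same audio, so a
        # transcript comparison could never change the result: skip it.
        return wav
    for _ in range(retries):
        heard = transcribe(wav)
        similarity = SequenceMatcher(None, text.lower(), heard.lower()).ratio()
        if similarity >= threshold:
            return wav  # transcript is close enough to the input text
        wav = synthesize(text)  # output may differ on the next run
    return wav  # out of retries: keep the last attempt unchecked
```

With `engine="vits"` the `transcribe` callable is never invoked, which is exactly the time saving described above.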
Hi, not sure what was going on with this yesterday evening, but I'm struggling to reproduce it now, which is quite annoying. Apologies for this. You're right that whisper won't make a difference with models such as VITS; it's only really needed for XTTS or similar models, which can sometimes produce different output from run to run.
Glad you have not been able to reproduce this! I'll leave this open for a while just in case.
Closing now; do let me know if you are able to reproduce it.
I've linked this to a branch that has a fix. The problem is that an empty item somehow ends up in the list of sentences to speak, probably because that item was only punctuation and an earlier step strips out anything that is just punctuation with no letters or numbers.
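A minimal sketch of that kind of filter, assuming the sentences arrive as a plain list of strings. `clean_sentences` is a hypothetical helper, not the code on the linked branch. It keeps a sentence only if, after trimming whitespace, it still contains at least one letter or digit, so neither punctuation-only items nor empty strings are handed to the TTS step.

```python
import re

# [^\W_] matches any Unicode letter or digit (word characters minus "_").
HAS_LETTER_OR_DIGIT = re.compile(r"[^\W_]")


def clean_sentences(sentences: list[str]) -> list[str]:
    """Drop sentences that are empty or contain only punctuation/whitespace."""
    cleaned = []
    for sentence in sentences:
        sentence = sentence.strip()
        # An empty string has no letters or digits, so it is dropped here too.
        if HAS_LETTER_OR_DIGIT.search(sentence):
            cleaned.append(sentence)
    return cleaned
```

Because the pattern is Unicode-aware, non-English text such as "¿Qué?" is kept while a lone "…" or "“—”" is removed.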
Hi again, I'm not sure when this regression was introduced, but I'm now getting an error when using VITS after processing several chapters. I'm seeing this on Windows, but I can also test on Linux if that would be useful. Let me know if there's more info I can provide; the output is below:
```
Error: Dimension out of range (expected to be in range of [-2, 1], but got 2) ... Retrying (0 retries left)
Something is wrong with the audio (86): temp0.wav
0%| | 0/16 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "C:\Users\daniel\Documents\epub2tts\epub2tts.py", line 659, in <module>
    main()
  File "C:\Users\daniel\Documents\epub2tts\epub2tts.py", line 648, in main
    mybook.read_book(
  File "C:\Users\daniel\Documents\epub2tts\epub2tts.py", line 444, in read_book
    temp = AudioSegment.from_wav(tempwav)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\daniel\Documents\epub2tts.venv\Lib\site-packages\pydub\audio_segment.py", line 808, in from_wav
    return cls.from_file(file, 'wav', parameters=parameters)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\daniel\Documents\epub2tts.venv\Lib\site-packages\pydub\audio_segment.py", line 651, in from_file
    file, close_file = _fd_or_path_or_tempfile(file, 'rb', tempfile=False)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\daniel\Documents\epub2tts.venv\Lib\site-packages\pydub\utils.py", line 60, in _fd_or_path_or_tempfile
    fd = open(fd, mode=mode)
         ^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: 'temp0.wav'
```
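The traceback shows `read_book` calling `AudioSegment.from_wav(tempwav)` after the retries had run out and `temp0.wav` was never written. One defensive option, sketched here with only the standard library, is to check the file before loading it. `wav_is_usable` is a hypothetical helper, not part of epub2tts or pydub, and this is not necessarily how the linked fix handles it.

```python
import os
import wave


def wav_is_usable(path: str) -> bool:
    """Return True if `path` exists and parses as a WAV with at least one frame."""
    if not os.path.isfile(path):
        return False  # e.g. synthesis failed and temp0.wav was never written
    try:
        with wave.open(path, "rb") as wf:
            return wf.getnframes() > 0
    except (wave.Error, EOFError):
        # Truncated or non-WAV content: treat it as unusable.
        return False
```

Calling a check like this just before the `from_wav` line would let the caller skip the sentence or raise a clearer error instead of the bare `FileNotFoundError`.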