Closed · iiiii2000 closed this 6 months ago
It's possible your GPU does not have enough RAM for the model. I'll add this output to the script soon, but in the meantime can you get into Python and run:
```python
import torch
torch.cuda.get_device_properties(0).total_memory
```
Let me know how much memory it shows. I'll also add an override to the memory limit so folks can experiment with it, since if you have a GPU it should be safe to use regardless of how much video RAM you have (i.e. worst case it falls back to virtual RAM and gets slow).
If you can check out the "ignore-vram" branch, it will print out available VRAM and which device it's using (CPU vs GPU) during a normal run.
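For anyone following along, the kind of check being described (prefer the GPU only when CUDA is available and the card has enough VRAM) can be sketched as a standalone function. This is a hypothetical illustration, not the actual epub2tts code; the function name and 4 GiB threshold are made up, and in the real script the two inputs would come from `torch.cuda.is_available()` and `torch.cuda.get_device_properties(0).total_memory`:

```python
def pick_device(cuda_available: bool, total_vram_bytes: int,
                min_vram_bytes: int = 4 * 1024**3) -> str:
    """Return "cuda" only if CUDA works and the card reports enough VRAM.

    Hypothetical sketch; the threshold and name are illustrative only.
    """
    if cuda_available and total_vram_bytes >= min_vram_bytes:
        return "cuda"
    return "cpu"

# A 2 GiB card falls back to CPU under a 4 GiB threshold:
print(pick_device(True, 2 * 1024**3))   # cpu
print(pick_device(True, 8 * 1024**3))   # cuda
print(pick_device(False, 0))            # cpu
```

Keeping the decision in a pure function like this also makes it easy to add the command-line override mentioned above: the caller just passes a different `min_vram_bytes` (or zero to force GPU use).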
I'm really not sure what happened with your second attempt there; that should have worked. I don't think I've ever seen "failed to load audio" at that point (I believe that comes from when whisper is trying to make a transcript of the audio chunk) :(
I have cloned the ignore-vram branch, and the same thing happened even when I tried with a really short txt file (71 characters):
```
Namespace(sourcefile='C:\Users\PC\Downloads\Chapter Three.txt', engine='tts', xtts='C:\Users\PC\Downloads\best.wav', openai=None, model='mymodel', speaker='p335', scan=False, start=1, end=999, language='en', minratio=88, skiplinks=False, skipfootnotes=False, bitrate='69k', debug=True)
Language selected: en
Saving to Chapter Three-best.m4b
Total characters: 71
Not enough VRAM on GPU. Using CPU
Loading model: C:\Users\PC\AppData\Local\tts/mymodel
tts_models/multilingual/multi-dataset/xtts_v2 is already downloaded.
Using model: xtts
Computing speaker latents...
Reading from 1 to 1
  0%|          | 0/1 [00:00<?, ?it/s]I grit my teeth and force my steps down the hall toward the front door.
Time to first chunck: 7.959530830383301
Received chunk 0 of audio length 65792
Time to first chunck: 13.617579698562622
Received chunk 0 of audio length 42240
Error: Failed to load audio: ... Retrying (0 retries left)
  0%|          | 0/1 [00:13<?, ?it/s]
Traceback (most recent call last):
  File "C:\Users\PC\Documents\epub2tts\epub2tts.py", line 659, in <module>
    main()
  File "C:\Users\PC\Documents\epub2tts\epub2tts.py", line 648, in main
    mybook.read_book(
  File "C:\Users\PC\Documents\epub2tts\epub2tts.py", line 439, in read_book
    - str(ratio)
      ^^^^^
UnboundLocalError: cannot access local variable 'ratio' where it is not associated with a value
```
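Side note on that final error: `UnboundLocalError` is the classic symptom of a variable that is only assigned inside a branch that never ran, so when the whisper step fails, `ratio` is never bound before it is used. A minimal reproduction (hypothetical code, not the actual `read_book` implementation) looks like:

```python
def compare(transcript_ok: bool) -> str:
    if transcript_ok:
        ratio = 95
    # If the transcript step failed, `ratio` was never bound,
    # so the next line raises UnboundLocalError.
    return "ratio: " + str(ratio)

try:
    compare(False)
except UnboundLocalError as e:
    print("caught:", e)

# One common fix is to give the variable a default before the branch:
def compare_fixed(transcript_ok: bool) -> str:
    ratio = 0
    if transcript_ok:
        ratio = 95
    return "ratio: " + str(ratio)

print(compare_fixed(False))  # ratio: 0
```

So the traceback here is a secondary error: the real failure is the "Failed to load audio" step above it, and the `ratio` crash just masks it.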
Also happened when using the default xtts model:
```
Namespace(sourcefile='C:\Users\PC\Downloads\Chapter Three.txt', engine='xtts', xtts=None, openai=None, model='tts_models/en/vctk/vits', speaker='Damien Black', scan=False, start=1, end=999, language='en', minratio=88, skiplinks=False, skipfootnotes=False, bitrate='69k', debug=True)
Language selected: en
Saving to Chapter Three-Damien Black.m4b
Total characters: 71
Device: cpu
Engine is TTS, model is tts_models/multilingual/multi-dataset/xtts_v2
tts_models/multilingual/multi-dataset/xtts_v2 is already downloaded.
Using model: xtts
Reading from 1 to 1
  0%|          | 0/1 [00:00<?, ?it/s]text to read: I grit my teeth and force my steps down the hall toward the front door.
Text splitted to sentences.
['I grit my teeth and force my steps down the hall toward the front door.']
Processing time: 15.116531610488892
Real-time factor: 2.7235547294685585
Error: Failed to load audio: ... Retrying (0 retries left)
  0%|          | 0/1 [00:15<?, ?it/s]
Traceback (most recent call last):
  File "C:\Users\PC\Documents\epub2tts\epub2tts.py", line 659, in <module>
    main()
  File "C:\Users\PC\Documents\epub2tts\epub2tts.py", line 648, in main
    mybook.read_book(
  File "C:\Users\PC\Documents\epub2tts\epub2tts.py", line 439, in read_book
    - str(ratio)
      ^^^^^
UnboundLocalError: cannot access local variable 'ratio' where it is not associated with a value
```
When I tried to run TTS manually (using the command-line `tts` command) with my own model, it worked without issues for the same text. The temp0.wav file also plays fine and sounds nice, so I think the error has something to do with the transcription process.
Looks like maybe it's not finding CUDA? It's weird; it's not skipping the GPU because of lack of VRAM (that check is commented out).
Can you get into python and run the following?
```python
import torch
torch.cuda.is_available()
torch.cuda.get_device_properties(0).total_memory
```
Please share the output, thanks!
I also pushed an update to the branch that does not load or use whisper, let's see if that helps get to the bottom of this.
Ah, I forgot to say: my driver is not CUDA compatible. After some more testing, I think it's a problem with Whisper on a normal Windows installation. I am currently running on CPU under WSL instead, using my own fine-tuned model, and it seems to run okay for the moment.
Ah, OK. I have only tested with an NVIDIA GPU and CPU. I have not tried CPU with XTTS, but if it's working OK for you that's good news. I have not tried whisper on Windows, only on Linux, WSL, and macOS.
Tomorrow I will add logic to skip the whisper/transcript comparison when "--minratio 0" is passed on the command line; that should hopefully sort everything out for you for now.
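The proposed skip can be sketched roughly as a guard in front of the comparison. This is a hypothetical illustration of the behavior described above, not the actual patch; the function names and the stubbed score are made up:

```python
def run_whisper_and_compare(audio_chunk: str) -> int:
    # Stand-in for the real step: transcribe the chunk with whisper
    # and fuzzy-match the transcript against the source text.
    # Here it just pretends the match scored 50 out of 100.
    return 50

def check_transcript(minratio: int, audio_chunk: str) -> bool:
    # With --minratio 0, skip the whisper comparison entirely
    # and accept the generated audio as-is.
    if minratio == 0:
        return True
    return run_whisper_and_compare(audio_chunk) >= minratio

print(check_transcript(0, "chunk0.wav"))   # True: whisper never runs
print(check_transcript(88, "chunk0.wav"))  # False: 50 < 88
```

The nice property of guarding before the comparison is that whisper never even loads on the `--minratio 0` path, which sidesteps the Windows whisper problem entirely.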
Thanks a lot! I've successfully converted a short ebook, and the quality is even better than some audiobooks I've bought.
Merry Xmas! This was super easy to do; committing to main now. Just set --minratio 0 and it will not try to use whisper.
Really glad to hear the quality has been good for you, please open issues for any other problems you have, or anything you think could be improved on.
I tried to convert one chapter of an epub with my fine-tuned model but received this error.
Also, when I tried with the default xtts model using `python epub2tts.py --engine xtts --speaker "Damien Black" "mybook.epub" --start 5 --end 5 --skiplinks`, I received the same error: