aedocw / epub2tts

Turn an epub or text file into an audiobook
Apache License 2.0

UnboundLocalError #129

Closed iiiii2000 closed 6 months ago

iiiii2000 commented 7 months ago

I tried to convert one chapter of an epub with my fine-tuned model, but received this error:

Not enough VRAM on GPU. Using CPU
Loading model: C:\Users\PC\AppData\Local\tts/mymodel
tts_models/multilingual/multi-dataset/xtts_v2 is already downloaded.
Using model: xtts
Computing speaker latents...
Reading from 5 to 5
0%| | 0/12 [00:00<?, ?it/s]Error: Failed to load audio: ... Retrying (0 retries left)
0%| | 0/12 [02:44<?, ?it/s]
Traceback (most recent call last):
  File "C:\Users\PC\Documents\epub2tts-main\epub2tts.py", line 654, in <module>
    main()
  File "C:\Users\PC\Documents\epub2tts-main\epub2tts.py", line 643, in main
    mybook.read_book(
  File "C:\Users\PC\Documents\epub2tts-main\epub2tts.py", line 434, in read_book
    + str(ratio)
          ^^^^^
UnboundLocalError: cannot access local variable 'ratio' where it is not associated with a value

I received the same error when I tried the default xtts model with this command:

python epub2tts.py --engine xtts --speaker "Damien Black" "mybook.epub" --start 5 --end 5 --skiplinks

Engine is TTS, model is tts_models/multilingual/multi-dataset/xtts_v2
tts_models/multilingual/multi-dataset/xtts_v2 is already downloaded.
Using model: xtts
Reading from 5 to 5
0%| | 0/24 [00:00<?, ?it/s] > Text splitted to sentences.
['Chapter', 'Three.', "Thursday evening at exactly seven o'clock, the doorbell sends melancholy notes through the house.", 'Jumping up from my leather armchair, I watch another player kill me a second before I can log out and conceal the screen behind oak-panel doors.', 'I grit my teeth and force my steps down the hall toward the front door.', "I'm not really looking forward to this.", "All I really know about Adam is that he has bad taste in style and that he's kind of abrasive."]
Processing time: 85.79656481742859
Real-time factor: 2.7818834173827445
Error: Failed to load audio: ... Retrying (0 retries left)
0%| | 0/24 [01:25<?, ?it/s]
Traceback (most recent call last):
  File "C:\Users\PC\Documents\epub2tts-main\epub2tts.py", line 654, in <module>
    main()
  File "C:\Users\PC\Documents\epub2tts-main\epub2tts.py", line 643, in main
    mybook.read_book(
  File "C:\Users\PC\Documents\epub2tts-main\epub2tts.py", line 434, in read_book
    + str(ratio)
          ^^^^^
UnboundLocalError: cannot access local variable 'ratio' where it is not associated with a value
aedocw commented 7 months ago

It's possible your GPU does not have enough RAM for the model. I'll add this output to the script soon, but in the meantime can you get into Python and run:

import torch
torch.cuda.get_device_properties(0).total_memory

Let me know how much memory it shows. I'll also add an override for the memory limit so folks can experiment with it. If you have a GPU, it should be safe to use it regardless of how much video RAM you have (i.e. worst case is it uses virtual RAM and gets slow).
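
For illustration only, the memory check and override described above could look roughly like the sketch below. This is not the code from epub2tts: the function name `pick_device`, the `min_vram_gib` threshold and its default of 4 GiB, and the `ignore_vram` flag are all hypothetical. The byte count passed in would be the value reported by `torch.cuda.get_device_properties(0).total_memory`.

```python
def pick_device(cuda_available, total_vram_bytes, min_vram_gib=4.0, ignore_vram=False):
    """Return "cuda" or "cpu" for the TTS model.

    Hypothetical helper: min_vram_gib and ignore_vram are illustrative
    names, not options taken from epub2tts.
    """
    if not cuda_available:
        # No usable GPU at all, so the override cannot help.
        return "cpu"
    if ignore_vram:
        # Override: use the GPU even if it is below the threshold.
        return "cuda"
    vram_gib = total_vram_bytes / (1024 ** 3)
    return "cuda" if vram_gib >= min_vram_gib else "cpu"
```

With these assumed defaults, a 2 GiB card falls back to CPU unless `ignore_vram=True` is passed, and a machine without CUDA always gets `"cpu"`.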

aedocw commented 7 months ago

If you can check out the "ignore-vram" branch, it will print the available VRAM and which device it's using (CPU vs GPU) during a normal run.

I'm really not sure what happened with your second attempt; that should have worked. I don't think I've ever seen "Failed to load audio" there (I think that comes from Whisper trying to make a transcript of the audio chunk) :(

iiiii2000 commented 7 months ago

I cloned the ignore-vram branch, and the same thing happened, even with a really short txt file (71 characters):

Namespace(sourcefile='C:\Users\PC\Downloads\Chapter Three.txt', engine='tts', xtts='C:\Users\PC\Downloads\best.wav', openai=None, model='mymodel', speaker='p335', scan=False, start=1, end=999, language='en', minratio=88, skiplinks=False, skipfootnotes=False, bitrate='69k', debug=True)
Language selected: en
Saving to Chapter Three-best.m4b
Total characters: 71
Not enough VRAM on GPU. Using CPU
Loading model: C:\Users\PC\AppData\Local\tts/mymodel
tts_models/multilingual/multi-dataset/xtts_v2 is already downloaded.
Using model: xtts
Computing speaker latents...
Reading from 1 to 1
0%| | 0/1 [00:00<?, ?it/s]I grit my teeth and force my steps down the hall toward the front door.
Time to first chunck: 7.959530830383301
Received chunk 0 of audio length 65792
Time to first chunck: 13.617579698562622
Received chunk 0 of audio length 42240
Error: Failed to load audio: ... Retrying (0 retries left)
0%| | 0/1 [00:13<?, ?it/s]
Traceback (most recent call last):
  File "C:\Users\PC\Documents\epub2tts\epub2tts.py", line 659, in <module>
    main()
  File "C:\Users\PC\Documents\epub2tts\epub2tts.py", line 648, in main
    mybook.read_book(
  File "C:\Users\PC\Documents\epub2tts\epub2tts.py", line 439, in read_book
    + str(ratio)
          ^^^^^
UnboundLocalError: cannot access local variable 'ratio' where it is not associated with a value

It also happened when using the default xtts model:

Namespace(sourcefile='C:\Users\PC\Downloads\Chapter Three.txt', engine='xtts', xtts=None, openai=None, model='tts_models/en/vctk/vits', speaker='Damien Black', scan=False, start=1, end=999, language='en', minratio=88, skiplinks=False, skipfootnotes=False, bitrate='69k', debug=True)
Language selected: en
Saving to Chapter Three-Damien Black.m4b
Total characters: 71
Device: cpu
Engine is TTS, model is tts_models/multilingual/multi-dataset/xtts_v2
tts_models/multilingual/multi-dataset/xtts_v2 is already downloaded.
Using model: xtts
Reading from 1 to 1
0%| | 0/1 [00:00<?, ?it/s]text to read: I grit my teeth and force my steps down the hall toward the front door.
Text splitted to sentences.
['I grit my teeth and force my steps down the hall toward the front door.']
Processing time: 15.116531610488892
Real-time factor: 2.7235547294685585
Error: Failed to load audio: ... Retrying (0 retries left)
0%| | 0/1 [00:15<?, ?it/s]
Traceback (most recent call last):
  File "C:\Users\PC\Documents\epub2tts\epub2tts.py", line 659, in <module>
    main()
  File "C:\Users\PC\Documents\epub2tts\epub2tts.py", line 648, in main
    mybook.read_book(
  File "C:\Users\PC\Documents\epub2tts\epub2tts.py", line 439, in read_book
    + str(ratio)
          ^^^^^
UnboundLocalError: cannot access local variable 'ratio' where it is not associated with a value

When I ran TTS manually (using the command-line tts command) with my own model, it worked without issues for the same text. The temp0.wav file also plays fine and sounds good, so I think the error has something to do with the transcription step.
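
For readers unfamiliar with this exception, the traceback is consistent with a common Python pattern: a variable is assigned only inside a `try` block, the call fails, and the variable is then read anyway. The sketch below reproduces that pattern and shows one possible guard. It is an assumption about the shape of the bug, not the actual code of `read_book`; the names `report_buggy`, `report_fixed` and `transcribe` are hypothetical.

```python
def report_buggy(transcribe, text):
    """Reads `ratio` even when the call that assigns it has failed."""
    try:
        ratio = transcribe(text)  # never assigned if transcribe() raises
    except Exception as exc:
        print(f"Error: {exc}")
    return "ratio: " + str(ratio)  # UnboundLocalError when the call failed


def report_fixed(transcribe, text, retries=1):
    """Binds `ratio` up front so later reads are always defined."""
    ratio = None
    while retries > 0:
        try:
            ratio = transcribe(text)
            break
        except Exception as exc:
            retries -= 1
            print(f"Error: {exc} ... Retrying ({retries} retries left)")
    if ratio is None:
        return "transcript check failed; ratio unavailable"
    return "ratio: " + str(ratio)
```

Calling `report_buggy` with a function that raises triggers the same `UnboundLocalError` seen above, while `report_fixed` reports the failure instead of crashing.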

aedocw commented 7 months ago

Looks like maybe it's not finding CUDA? It's weird, it's not skipping GPU because of lack of VRAM (that check is commented out).

Can you get into python and run the following?

import torch
torch.cuda.is_available()
torch.cuda.get_device_properties(0).total_memory

Please share the output, thanks!
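
On a machine without a working CUDA setup, the `get_device_properties(0)` call above may raise instead of returning a value. A guarded version of the same checks, as a minimal sketch, could look like this. The function name `cuda_report` and the dictionary keys are made up for illustration; only the `torch.cuda` calls come from the snippet above.

```python
def cuda_report():
    """Collect the CUDA facts requested above without raising on CPU-only machines."""
    report = {"torch_installed": False, "cuda_available": False, "total_vram_bytes": None}
    try:
        import torch
    except ImportError:
        # torch is not installed, so there is nothing more to query.
        return report
    report["torch_installed"] = True
    report["cuda_available"] = bool(torch.cuda.is_available())
    if report["cuda_available"]:
        # Only query device 0 once CUDA is known to be usable.
        report["total_vram_bytes"] = torch.cuda.get_device_properties(0).total_memory
    return report
```

Running `print(cuda_report())` gives one line that can be pasted into the issue.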

I also pushed an update to the branch that does not load or use whisper, let's see if that helps get to the bottom of this.

iiiii2000 commented 7 months ago

Ah, I forgot to say, my driver is not CUDA compatible. After some more testing, I think it's a problem with Whisper on a native Windows installation. I am currently running on CPU under WSL instead, using my own fine-tuned model, and it seems to run okay for the moment.

aedocw commented 7 months ago

Ah OK. I have only tested with an NVIDIA GPU and with CPU. I have not tried CPU with XTTS, but if it's working OK for you that's good news. I have not tried Whisper on Windows, only on Linux, WSL and macOS.

Tomorrow I will add logic to skip the Whisper transcript comparison if "--minratio 0" is passed on the command line. That should hopefully sort everything out for you for now.
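
A rough sketch of that skip logic, under stated assumptions: `chunk_passes`, `similarity` and the `transcribe_audio` callable are hypothetical names, and the standard library's `difflib.SequenceMatcher` stands in for whatever comparison epub2tts actually uses. The default of 88 matches the `minratio=88` shown in the debug output earlier in this thread.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Similarity of two strings as a 0-100 score (stand-in metric)."""
    return round(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio())


def chunk_passes(text, wav_path, transcribe_audio, minratio=88):
    """Return True if the audio chunk is accepted.

    With minratio == 0 the transcriber is never called, so a broken
    speech-to-text setup cannot fail the run.
    """
    if minratio == 0:
        return True
    transcript = transcribe_audio(wav_path)
    return similarity(text, transcript) >= minratio
```

Because the early return happens before `transcribe_audio` is called, a Whisper failure such as the "Failed to load audio" error above would not be reached at all when the ratio is set to 0.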

iiiii2000 commented 6 months ago

Thanks a lot! I've successfully converted a short ebook, and the quality is even better than some audiobooks I've bought.

aedocw commented 6 months ago

Merry xmas, this was super easy to do. Committing to main now. Just set --minratio 0 and it will not try to use whisper.

Really glad to hear the quality has been good for you. Please open issues for any other problems you have, or for anything you think could be improved.