bojtalepenye commented 6 months ago

When does this happen?

Every single time. Why? I have no idea. But what I do know is that no matter which voice I try, I always get this error: local variable 'ratio' referenced before assignment

Possible solution request?

Maybe this only happens to me. Maybe I am doing something wrong. It would be a big help if somebody could explain why this happens and how I can solve it.

Here is the error:

Number of chapters to read: 18
Saving to The Final Irony-307.m4b
Total characters: 253122
Engine is TTS, model is tts_models/en/vctk/vits
 > tts_models/en/vctk/vits is already downloaded.
 > Using model: vits
 > Setting up Audio Processor...
 | > sample_rate:22050
 | > resample:False
 | > num_mels:80
 | > log_func:np.log10
 | > min_level_db:0
 | > frame_shift_ms:None
 | > frame_length_ms:None
 | > ref_level_db:None
 | > fft_size:1024
 | > power:None
 | > preemphasis:0.0
 | > griffin_lim_iters:None
 | > signal_norm:None
 | > symmetric_norm:None
 | > mel_fmin:0
 | > mel_fmax:None
 | > pitch_fmin:None
 | > pitch_fmax:None
 | > spec_gain:20.0
 | > stft_pad_mode:reflect
 | > max_norm:1.0
 | > clip_norm:True
 | > do_trim_silence:False
 | > trim_db:60
 | > do_sound_norm:False
 | > do_amp_to_db_linear:True
 | > do_amp_to_db_mel:True
 | > do_rms_norm:False
 | > db_level:None
 | > stats_path:None
 | > base:10
 | > hop_length:256
 | > win_length:1024
 > initialization of speaker-embedding layers.

[...]

Traceback (most recent call last):
  File "/home/ubuntu/epub2tts/The Final Irony/../epub2tts.py", line 834, in <module>
    main()
  File "/home/ubuntu/epub2tts/The Final Irony/../epub2tts.py", line 821, in main
    mybook.read_book(
  File "/home/ubuntu/epub2tts/The Final Irony/../epub2tts.py", line 572, in read_book
    f"Something is wrong with the audio ({ratio}): {tempwav}"
UnboundLocalError: local variable 'ratio' referenced before assignment
ubuntu@LENOVO-LEGION:~/epub2tts/The Final Irony$

aedocw commented 6 months ago

This does look like a bug, sorry about that. Can you paste the full exact command you are running?

Also, can you try that same command but with --minratio 0 on the command line as well?

I think I know what's happening but I don't think I'll be able to look into it until later today or tomorrow.

bojtalepenye commented 6 months ago

I looked at your post here: https://github.com/aedocw/epub2tts/issues/31 and thought that maybe the file was causing this. So I converted the epub to txt like you said, but the issue persisted. Then I cut the book down to literally 4 or 5 sentences, but the error remained. Here is the full command with --minratio 0:

ubuntu@LENOVO-LEGION:~/epub2tts/The Final Irony$ python3 ../epub2tts.py "/mnt/c/Users/lepen_ztxqwq1/Downloads/The Final Irony.txt" --minratio 0 --speaker 307
Namespace(sourcefile='/mnt/c/Users/lepen_ztxqwq1/Downloads/The Final Irony.txt', engine='tts', xtts=None, openai=None, model='tts_models/en/vctk/vits', speaker='307', scan=False, start=1, end=999, language='en', minratio=0, skiplinks=False, skipfootnotes=False, sayparts=False, bitrate='69k', debug=False, export=None, no_deepspeed=False, skip_cleanup=False, cover=None)
Language selected: en
Saving to The Final Irony-307.m4b
Total characters: 255
Engine is TTS, model is tts_models/en/vctk/vits
 > tts_models/en/vctk/vits is already downloaded.
 > Using model: vits
 > Setting up Audio Processor...
 | > sample_rate:22050
 | > resample:False
 | > num_mels:80
 | > log_func:np.log10
 | > min_level_db:0
 | > frame_shift_ms:None
 | > frame_length_ms:None
 | > ref_level_db:None
 | > fft_size:1024
 | > power:None
 | > preemphasis:0.0
 | > griffin_lim_iters:None
 | > signal_norm:None
 | > symmetric_norm:None
 | > mel_fmin:0
 | > mel_fmax:None
 | > pitch_fmin:None
 | > pitch_fmax:None
 | > spec_gain:20.0
 | > stft_pad_mode:reflect
 | > max_norm:1.0
 | > clip_norm:True
 | > do_trim_silence:False
 | > trim_db:60
 | > do_sound_norm:False
 | > do_amp_to_db_linear:True
 | > do_amp_to_db_mel:True
 | > do_rms_norm:False
 | > db_level:None
 | > stats_path:None
 | > base:10
 | > hop_length:256
 | > win_length:1024
 > initialization of speaker-embedding layers.
Reading from 1 to 1
  0%|                                                                                                                                               | 0/3 [00:00<?, ?it/s] > Text splitted to sentences.
['Chapter 1', 'It was a simple mission that caused all of this.']
Error: '307' ... Retrying (1 retries left)
 > Text splitted to sentences.
['Chapter 1', 'It was a simple mission that caused all of this.']
Error: '307' ... Retrying (0 retries left)
  0%|                                                                                                                                               | 0/3 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/ubuntu/epub2tts/The Final Irony/../epub2tts.py", line 834, in <module>
    main()
  File "/home/ubuntu/epub2tts/The Final Irony/../epub2tts.py", line 821, in main
    mybook.read_book(
  File "/home/ubuntu/epub2tts/The Final Irony/../epub2tts.py", line 572, in read_book
    f"Something is wrong with the audio ({ratio}): {tempwav}"
UnboundLocalError: local variable 'ratio' referenced before assignment
ubuntu@LENOVO-LEGION:~/epub2tts/The Final Irony$

By the way, I forgot to mention that one voice does work: the default one. If I don't specify a voice with --speaker, it works. But I really want to use a different voice.

aedocw commented 6 months ago

OK, knowing the full command is super helpful. This is a problem with choosing a speaker with TTS; it should be pretty quick to fix.
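For anyone curious about the mechanics: the traceback fits a common Python pattern where a variable is only assigned inside the `try` block of a retry loop and is then read after the loop gives up. The sketch below is hypothetical, not the actual epub2tts code; `synthesize`, `compare_ratio`, `read_chunk` and the `fixed` flag are illustrative names. It shows how an unknown speaker ID makes synthesis fail before `ratio` is ever computed (so lowering `--minratio` cannot help), and how giving `ratio` a default value turns the crash into a readable error.

```python
"""Hypothetical sketch: how a retry loop can end in UnboundLocalError."""

KNOWN_SPEAKERS = {"p225", "p307"}  # illustrative subset of VCTK speaker IDs


def synthesize(text: str, speaker: str) -> str:
    # Stand-in for the TTS call. An unknown speaker fails with KeyError,
    # whose str() is "'307'", like the "Error: '307'" lines in the log.
    if speaker not in KNOWN_SPEAKERS:
        raise KeyError(speaker)
    return "chunk.wav"


def compare_ratio(text: str, wav: str) -> float:
    # Stand-in for transcribing the audio and scoring it against the text.
    return 0.95


def read_chunk(text: str, speaker: str, minratio: float, fixed: bool) -> float:
    if fixed:
        ratio = None  # the fix: ratio has a value even if synthesis never succeeds
    retries = 2
    while retries > 0:
        try:
            wav = synthesize(text, speaker)
            ratio = compare_ratio(text, wav)  # only reached if synthesis worked
            if ratio < minratio:
                raise ValueError(f"ratio {ratio} is below {minratio}")
            return ratio
        except (KeyError, ValueError) as e:
            retries -= 1
            print(f"Error: {e} ... Retrying ({retries} retries left)")
    # Without the fix, evaluating ratio here raises UnboundLocalError when
    # every attempt failed inside synthesize(), because it was never assigned.
    raise RuntimeError(f"Something is wrong with the audio ({ratio})")


if __name__ == "__main__":
    for fixed in (False, True):
        try:
            read_chunk("Chapter 1", "307", minratio=0, fixed=fixed)
        except (UnboundLocalError, RuntimeError) as e:
            print(f"fixed={fixed}: {type(e).__name__}: {e}")
    print(read_chunk("Chapter 1", "p307", minratio=0, fixed=False))
```

The exact UnboundLocalError message differs between Python versions (3.11 and later say "cannot access local variable 'ratio' where it is not associated with a value"), so it is better to check the exception type than its text.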

aedocw commented 6 months ago

I just realized what's going on. You need to specify speaker as "p307". Without the "p", it ends up trying to use a speaker model that does not actually exist.

Your command should look something like epub2tts sample.txt --speaker p307 (and it should work just as well with an epub).
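If you drive epub2tts from your own scripts, a small check can catch a bare number before any audio is generated. This is a hypothetical helper, not part of epub2tts: `normalize_vctk_speaker` is an illustrative name, and it assumes you already have the list of valid speaker IDs for the model. With Coqui TTS, a multi-speaker `TTS` object should expose that list through its `speakers` attribute, but check this against your installed version.

```python
from typing import Sequence


def normalize_vctk_speaker(speaker: str, available: Sequence[str]) -> str:
    """Return a valid VCTK speaker ID, adding the "p" prefix to a bare number."""
    if speaker in available:
        return speaker
    # "307" has to be passed as "p307" for this model.
    prefixed = f"p{speaker}"
    if speaker.isdigit() and prefixed in available:
        return prefixed
    sample = ", ".join(list(available)[:3])
    raise ValueError(f"Unknown speaker {speaker!r}; valid IDs look like: {sample}")


if __name__ == "__main__":
    speakers = ["p225", "p226", "p307"]  # illustrative subset
    print(normalize_vctk_speaker("307", speakers))  # p307
```

The returned ID is what you would then pass to --speaker.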

bojtalepenye commented 6 months ago

Thank you very much! How did I not notice that? It works now, but I have another question if you don't mind. :D In the README under usage, there is a link to another repository: https://github.com/rejuce/CoquiTTS_XTTS_Examples There are a lot of voices there. How could I use, for example, the "Abrahan Mack.wav" voice?

aedocw commented 6 months ago

Those are Coqui Studio voices. You would use that one, for instance, like this:

epub2tts mybook.txt --engine xtts --speaker "Abrahan Mack"
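If you want to try several voices from that repository, the speaker name appears to be the sample's file name without the .wav extension (as with "Abrahan Mack.wav" here). The sketch below assumes that naming holds; `xtts_command` is a hypothetical helper. It builds an argument list for Python's `subprocess`, which passes names containing spaces through without any shell quoting.

```python
from __future__ import annotations

from pathlib import Path


def xtts_command(book: str, sample_file: str) -> list[str]:
    """Build an epub2tts command that uses the voice named after a sample file."""
    # Assumption: "Abrahan Mack.wav" corresponds to the speaker "Abrahan Mack".
    speaker = Path(sample_file).stem
    # As a list argument, a name containing spaces needs no extra quotes.
    return ["epub2tts", book, "--engine", "xtts", "--speaker", speaker]


if __name__ == "__main__":
    cmd = xtts_command("mybook.txt", "Abrahan Mack.wav")
    print(cmd)
    # To start the conversion: import subprocess; subprocess.run(cmd, check=True)
```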

Be aware though that without a GPU, using XTTS is likely going to be too slow to be usable.

Glad we sorted this out!

aedocw / epub2tts

UnboundLocalError every time #219