Closed bojtalepenye closed 6 months ago
This does look like a bug, sorry about that. Can you paste the full exact command you are running?
Also, can you try that same command but with --minratio 0 on the command line as well?
I think I know what's happening but I don't think I'll be able to look into it until later today or tomorrow.
I looked into your post here: https://github.com/aedocw/epub2tts/issues/31
I thought that maybe the file was causing this, so I converted the epub to txt like you said, but the issue persisted. Then I cut the book down to literally 4 or 5 sentences, and the error still remained.
Here is the full command with --minratio 0:
ubuntu@LENOVO-LEGION:~/epub2tts/The Final Irony$ python3 ../epub2tts.py "/mnt/c/Users/lepen_ztxqwq1/Downloads/The Final Irony.txt" --minratio 0 --speaker 307
Namespace(sourcefile='/mnt/c/Users/lepen_ztxqwq1/Downloads/The Final Irony.txt', engine='tts', xtts=None, openai=None, model='tts_models/en/vctk/vits', speaker='307', scan=False, start=1, end=999, language='en', minratio=0, skiplinks=False, skipfootnotes=False, sayparts=False, bitrate='69k', debug=False, export=None, no_deepspeed=False, skip_cleanup=False, cover=None)
Language selected: en
Saving to The Final Irony-307.m4b
Total characters: 255
Engine is TTS, model is tts_models/en/vctk/vits
> tts_models/en/vctk/vits is already downloaded.
> Using model: vits
> Setting up Audio Processor...
| > sample_rate:22050
| > resample:False
| > num_mels:80
| > log_func:np.log10
| > min_level_db:0
| > frame_shift_ms:None
| > frame_length_ms:None
| > ref_level_db:None
| > fft_size:1024
| > power:None
| > preemphasis:0.0
| > griffin_lim_iters:None
| > signal_norm:None
| > symmetric_norm:None
| > mel_fmin:0
| > mel_fmax:None
| > pitch_fmin:None
| > pitch_fmax:None
| > spec_gain:20.0
| > stft_pad_mode:reflect
| > max_norm:1.0
| > clip_norm:True
| > do_trim_silence:False
| > trim_db:60
| > do_sound_norm:False
| > do_amp_to_db_linear:True
| > do_amp_to_db_mel:True
| > do_rms_norm:False
| > db_level:None
| > stats_path:None
| > base:10
| > hop_length:256
| > win_length:1024
> initialization of speaker-embedding layers.
Reading from 1 to 1
0%| | 0/3 [00:00<?, ?it/s] > Text splitted to sentences.
['Chapter 1', 'It was a simple mission that caused all of this.']
Error: '307' ... Retrying (1 retries left)
> Text splitted to sentences.
['Chapter 1', 'It was a simple mission that caused all of this.']
Error: '307' ... Retrying (0 retries left)
0%| | 0/3 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/home/ubuntu/epub2tts/The Final Irony/../epub2tts.py", line 834, in <module>
main()
File "/home/ubuntu/epub2tts/The Final Irony/../epub2tts.py", line 821, in main
mybook.read_book(
File "/home/ubuntu/epub2tts/The Final Irony/../epub2tts.py", line 572, in read_book
f"Something is wrong with the audio ({ratio}): {tempwav}"
UnboundLocalError: local variable 'ratio' referenced before assignment
ubuntu@LENOVO-LEGION:~/epub2tts/The Final Irony$
By the way, I forgot to mention that one voice works: the default one. If I don't specify a voice with --speaker, then it works. But I really want to use some other voice. So there is that.
OK, knowing the full command is super helpful. This is a problem with choosing a speaker with TTS, and it should be pretty quick to fix.
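For anyone reading the traceback above: the sketch below is one plausible way an UnboundLocalError like this can arise. It is not the actual epub2tts source. `read_with_retries`, `read_with_retries_fixed`, and `synthesize` are hypothetical names used only for illustration. The idea is that `ratio` is only assigned when synthesis succeeds, so if every retry fails (here with a `KeyError('307')`, which would print as `'307'` like in the log), the final error message refers to a variable that was never bound.

```python
def read_with_retries(synthesize, retries=2):
    """Hypothetical retry loop showing the failure mode (not epub2tts code)."""
    for attempt in range(retries):
        try:
            ratio = synthesize()  # 'ratio' is bound only if this call succeeds
            return ratio
        except Exception as e:
            print(f"Error: {e} ... Retrying ({retries - attempt - 1} retries left)")
    # Every attempt failed, so 'ratio' was never assigned. Building this
    # message raises UnboundLocalError and hides the real cause.
    raise RuntimeError(f"Something is wrong with the audio ({ratio})")


def read_with_retries_fixed(synthesize, retries=2):
    """Same loop, but it remembers the last error and reports that instead."""
    last_error = None
    for attempt in range(retries):
        try:
            return synthesize()
        except Exception as e:
            last_error = e
            print(f"Error: {e} ... Retrying ({retries - attempt - 1} retries left)")
    raise RuntimeError(
        f"Synthesis failed after {retries} attempts: {last_error!r}"
    ) from last_error


def failing_synthesize():
    # Stand-in for a TTS call that rejects an unknown speaker name.
    raise KeyError("307")
```

With the second version, the user would see the underlying `KeyError('307')` rather than a message about `ratio`, which points more directly at the speaker name.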
I just realized what's going on. You need to specify speaker as "p307". Without the "p", it ends up trying to use a speaker model that does not actually exist.
Your command should look something like epub2tts sample.txt --speaker p307
(and it should work just as well with an epub).
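A check like the following could catch this mistake before synthesis starts. This is only a sketch: `resolve_speaker` is a hypothetical helper, not part of epub2tts, and the `available` list here is a small illustrative subset rather than the model's real speaker list, which would have to come from the loaded model.

```python
def resolve_speaker(requested, available):
    """Return the speaker name if the model knows it, else raise a helpful error.

    Hypothetical helper for illustration; 'available' is assumed to be the
    list of speaker names reported by the loaded TTS model.
    """
    if requested in available:
        return requested
    # Common slip with this model: passing "307" instead of "p307".
    candidate = f"p{requested}"
    if requested.isdigit() and candidate in available:
        raise ValueError(
            f"Unknown speaker {requested!r}; did you mean {candidate!r}?"
        )
    raise ValueError(f"Unknown speaker {requested!r}")


# Illustrative subset only, not the full speaker list of the model.
available = ["p225", "p307"]

print(resolve_speaker("p307", available))
try:
    resolve_speaker("307", available)
except ValueError as err:
    print(err)
```

Failing early with a suggestion avoids running the retry loop at all when the speaker name cannot possibly work.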
Thank you very much, how did I not notice that? It works now, but I have another question if you don't mind. :D In the readme file under usage, there is a link to another repository: https://github.com/rejuce/CoquiTTS_XTTS_Examples There are a lot of voices there. How could I use, for example, the "Abrahan Mack.wav" voice?
Those are Coqui Studio voices, you would use that one for instance this way:
epub2tts mybook.txt --engine xtts --speaker "Abrahan Mack"
Be aware though that without a GPU, using XTTS is likely going to be too slow to be usable.
Glad we sorted this out!
When does this happen?
Every single time. Why? I have no idea. But what I know is that no matter what voice I try, I always get this error: local variable 'ratio' referenced before assignment
Possible solution request?
Maybe this only happens for me. Maybe I am doing something wrong. It would be a big help if somebody explained to me why this happens and how I can solve it.
Here is the error: