Open Andiami-Yusaka opened 8 months ago
Seems pretty standard. Across a few dozen model trainings, noise like that has always been present. Sometimes it lasts a full second.
It could be that whatever dataset is being used is being split mid-sentence, leaving a pop.
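One rough way to test the mid-sentence-split theory is to check whether a clip is still loud in its final few milliseconds, since a clean ending normally decays to near silence. The sketch below is only an illustration: `ends_abruptly`, the 10 ms window, and the 0.05 threshold are all made-up names and values, not anything from tortoise-tts-fast.

```python
import numpy as np


def ends_abruptly(samples: np.ndarray, rate: int,
                  tail_ms: float = 10.0, threshold: float = 0.05) -> bool:
    """Return True if the last `tail_ms` of a clip is loud relative to its peak.

    Hypothetical heuristic: `tail_ms` and `threshold` are arbitrary starting
    values and would need tuning on real data.
    """
    samples = samples.astype(np.float64)
    peak = np.max(np.abs(samples)) if samples.size else 0.0
    if peak == 0.0:
        # An empty or fully silent clip cannot end with a pop.
        return False
    n_tail = max(1, int(rate * tail_ms / 1000))
    tail = samples[-n_tail:]
    # RMS level of the tail, compared against the clip's peak amplitude.
    tail_rms = np.sqrt(np.mean(tail ** 2))
    return bool(tail_rms / peak > threshold)
```

Running this over the training clips and listening to the flagged ones would show whether the cuts really land mid-word.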
I ran into the same issue, and I get these weird noises in roughly 1 in 10 clips.
From some quick testing, I believe it is voicefixer that creates those artifacts. Disabling voicefixer seems to eliminate the weird sound, but of course the overall quality becomes worse.
Instead, I just trimmed 0.1 seconds off the end of each wav file. That eliminates the strange noise without cutting any actual speech, since there are usually a few milliseconds of leftover audio at the end of each clip.
I used scipy for this: https://gist.github.com/JeffPlsFix/f4c54f68e8a9b3d4c8093dccd7ad0664
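For anyone who does not want to open the gist, the trimming step can be sketched roughly as follows. This is not the gist's code, just a minimal version of the same idea using `scipy.io.wavfile`; the function names `trim_tail` and `trim_wav_file` are made up for illustration, and the 0.1 second default comes from the comment above.

```python
import numpy as np
from scipy.io import wavfile


def trim_tail(samples: np.ndarray, rate: int, seconds: float = 0.1) -> np.ndarray:
    """Return `samples` with the last `seconds` of audio removed."""
    n_drop = int(round(rate * seconds))
    # Leave the clip untouched if there is nothing to trim or if trimming
    # would remove the whole clip.
    if n_drop <= 0 or n_drop >= samples.shape[0]:
        return samples
    # Slicing along axis 0 works for both mono (n,) and multi-channel (n, ch).
    return samples[:-n_drop]


def trim_wav_file(src: str, dst: str, seconds: float = 0.1) -> None:
    """Read `src`, drop the last `seconds`, and write the result to `dst`."""
    rate, samples = wavfile.read(src)
    wavfile.write(dst, rate, trim_tail(samples, rate, seconds))
```

Calling `trim_wav_file("results/clip.wav", "results/clip_trimmed.wav")` (hypothetical paths) writes a copy that is 0.1 seconds shorter. Writing to a separate file keeps the original around in case the trim cuts into speech.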
Hi there, I converted the text below using one sample voice and the settings below. The quality of the final voice is great; however, there is a strange noise at the end of some sentences. I would appreciate any help setting the proper configuration to resolve this issue.
Text: Additionally, research findings on spatial puzzles were updated and further research was conducted. Documentation for the final end goal of the interactive shop interface was also started.
Settings:
_text_split
candidates=1
output_dir=results
seed=50
quiet=no
vocoder=BigVGAN_Base
models_dir=
disable_redaction=no
batch_size=
diff_checkpoint=
ar_checkpoint=
speed=original_tortoise
multi_output_regenerate
ooutput=result
device=
low_vram=no
no_cache=no
clvp_checkpoint=
preset=standard
tuning=condfree
gvoicefixer=yes
Voice: https://github.com/152334H/tortoise-tts-fast/assets/129772750/782853af-664d-47ef-8b1c-64f2f0b6a684