Strange sound at the end of voice

Andiami-Yusaka commented 8 months ago

Hi there, I converted the text below using one sample voice and the settings below. The quality of the final voice is great; however, there is a strange noise at the end of the sentence. It only appears in some sentences. I would appreciate help finding the proper configuration to resolve this issue.

Text: Additionally, research findings on spatial puzzles were updated and further research was conducted. Documentation for the final end goal of the interactive shop interface was also started.

Settings: _text_split candidates=1 output_dir=results seed=50 quiet=no vocoder=BigVGAN_Base models_dir= disable_redaction=no batch_size= diff_checkpoint= ar_checkpoint= speed=original_tortoise multi_output_regenerate ooutput=result device= low_vram=no no_cache=no clvp_checkpoint= preset=standard tuning=condfree gvoicefixer=yes

Voice: https://github.com/152334H/tortoise-tts-fast/assets/129772750/782853af-664d-47ef-8b1c-64f2f0b6a684

78Alpha commented 8 months ago

Seems pretty standard. Across a few dozen model trainings, stuff like that has always been present. Sometimes it lasts a full second.

It could be that whatever dataset is being used is being split mid-sentence, leaving a pop.

JeffPlsFix commented 7 months ago

I ran into the same issue, and I get these weird noises in roughly 1 in 10 clips.

From some quick testing, I believe it is voicefixer that creates those artifacts. Disabling voicefixer seems to eliminate the weird sound, but of course the overall quality becomes worse.

Instead, I just trimmed 0.1 seconds off the end of each wav file. That eliminates the strange noise without cutting any actual speech, since there are usually a few milliseconds of leftover audio at the end of each clip.

I used scipy for this: https://gist.github.com/JeffPlsFix/f4c54f68e8a9b3d4c8093dccd7ad0664
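
For anyone who wants the idea without opening the link, here is a minimal sketch of that kind of trim. It is not the contents of the gist; the helper names `trim_tail` and `trim_wav_file` are hypothetical, and it assumes plain PCM wav files that `scipy.io.wavfile` can read. The 0.1 second default is the value mentioned above.

```python
from scipy.io import wavfile


def trim_tail(samples, sample_rate, trim_seconds=0.1):
    """Return `samples` with the last `trim_seconds` of audio removed.

    Hypothetical helper. Works for mono (n,) and multi-channel
    (n, channels) arrays because only the first axis is sliced.
    """
    n_trim = int(round(sample_rate * trim_seconds))
    if n_trim <= 0:
        return samples
    # Clamp so a very short clip yields an empty array instead of
    # wrapping around through a negative index.
    keep = max(len(samples) - n_trim, 0)
    return samples[:keep]


def trim_wav_file(src_path, dst_path, trim_seconds=0.1):
    """Read a wav file, drop its tail, and write the result to dst_path."""
    sample_rate, samples = wavfile.read(src_path)
    trimmed = trim_tail(samples, sample_rate, trim_seconds)
    wavfile.write(dst_path, sample_rate, trimmed)
```

Usage would look something like `trim_wav_file("results/clip.wav", "results/clip_trimmed.wav")`, where both paths are placeholders. If 0.1 seconds clips real speech on some outputs, a smaller `trim_seconds` may be enough.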

152334H / tortoise-tts-fast

Strange sound at the end of voice #129