V3 Style, Inflection, Pacing Not Consistent

JarodMica / audiobook_maker

GNU General Public License v3.0

V3 Style, Inflection, Pacing Not Consistent #70

Open FJCCOMMISH opened 5 days ago

FJCCOMMISH commented 5 days ago

Even with the same text (sentences) and settings, multiple generations produce radically different pacing, style, and inflection.

Is there a way to expose/control more settings, see these settings, ensure consistent output (reads)?

This file contains several generations of the same text, all produced with the same models and settings: https://we.tl/t-0M8VeAMAt0

Note the differences in style, inflection, pronunciation, and pacing.

JarodMica commented 5 days ago

> Is there a way to expose/control more settings, see these settings, ensure consistent output (reads)?

Sure! I'll expose the seed as a controllable setting. The variability is a natural outcome of Tortoise (or any neural-net-based TTS), and fixing the seed will keep the output consistent across generations for the same inputs.
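
To illustrate why a fixed seed gives repeatable reads, here is a minimal sketch in plain Python. It is not the audiobook_maker or Tortoise API: `toy_generate` is a hypothetical stand-in that draws one random "pause length" per word instead of producing audio. The assumption is only that the real model samples from a seeded pseudo-random generator in a similar way (in PyTorch the analogous step would be calling `torch.manual_seed` before generation).

```python
import random

def toy_generate(text, seed=None):
    """Hypothetical stand-in for a sampling-based TTS call.

    Returns one pseudo-random pause length (in seconds) per word,
    in place of real audio. Not part of any real library.
    """
    # With seed=None the generator draws fresh entropy on every call,
    # which is the toy equivalent of the varying reads described above.
    rng = random.Random(seed)
    return [round(rng.uniform(0.05, 0.40), 3) for _ in text.split()]

sentence = "The same sentence, generated twice."

# Same text, same settings, same seed: the sampled values repeat exactly.
fixed_a = toy_generate(sentence, seed=1234)
fixed_b = toy_generate(sentence, seed=1234)
print(fixed_a == fixed_b)  # True

# No seed: each call may sample a different "performance" of the sentence.
unseeded = toy_generate(sentence)
print(unseeded)
```

A seed only pins down the result for identical inputs and settings. Changing the text, the voice, or any sampling parameter can still change the read even when the seed stays the same.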