KdaiP / StableTTS

Next-generation TTS model using flow-matching and DiT, inspired by Stable Diffusion 3
MIT License
290 stars, 31 forks

Question about speaker voice style #12

Open: MavisHoot opened this issue 3 months ago

MavisHoot commented 3 months ago

I am trying to train a TTS model, but I am wondering about the speakers' styles. My dataset contains multiple speakers with different speaking styles. Does the model retain the style of each voice, does it use only one style, or does it depend on the reference audio? For example, my dataset contains an Indian speaker who pauses nervously in conversation. If I train on the whole dataset and then run inference using an audio clip from that speaker as the reference, will the output inherit the nervous speaking style? I look forward to your response, and thanks for this great repo.

MavisHoot commented 3 months ago

Perhaps a better way to ask the question: can a single model be trained on both a narrator voice and a conversational voice and reproduce both speaking styles, or will I need to train separate models to achieve that?