I am trying to train a TTS model, and I have a question about speaker style. My dataset contains multiple speakers with different speaking styles. Does the model retain the style of each voice, does it learn only one style, or does the style depend on the reference audio? For example, my dataset contains an Indian speaker who pauses nervously in conversation. If I train on the whole dataset and then use one audio clip from that speaker for inference, will the output inherit the nervous speaking style? I eagerly await your response, and thanks for this great repo.
Perhaps a better way to ask the question: can I train it on both a narrator voice and a conversational voice and get both speaking styles, or will I need to train separate models to achieve that?