Plachtaa / VITS-fast-fine-tuning

This repo is a pipeline of VITS finetuning for fast speaker adaptation TTS, and many-to-many voice conversion
Apache License 2.0
4.69k stars 705 forks

Can we calculate the phoneme durations of a text from StochasticDurationPredictor or DurationPredictor? #591

Open CasonTsai opened 1 month ago

CasonTsai commented 1 month ago

Amazing work! Excuse me, how can I extract the phoneme durations of a text from StochasticDurationPredictor or DurationPredictor? I want to extract the duration (delay time) of the phonemes corresponding to each piece of text.

Plachtaa commented 1 month ago

You may use the attn output from the forward method as the phoneme-audio alignment result
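A minimal sketch of how such an alignment could be turned into per-phoneme durations, under several assumptions that are not confirmed in this thread: that `attn` is a hard 0/1 monotonic alignment of shape `[batch, 1, n_frames, n_tokens]` (as in the original VITS code, where durations are obtained by summing over the frame axis), and that `hop_length=256` and `sampling_rate=22050` match your config. The helper name `phoneme_durations` is hypothetical. With a PyTorch tensor, one way to get the 2D input would be `attn[0, 0].tolist()`.

```python
def phoneme_durations(alignment, hop_length=256, sampling_rate=22050):
    """Convert a hard alignment matrix into per-token timings.

    alignment: 2D list shaped [n_frames][n_tokens] with 0/1 entries
               (assumed layout; check the shape of your own attn output).
    hop_length, sampling_rate: assumed example values; use your config's.
    """
    if not alignment:
        return []

    n_tokens = len(alignment[0])

    # Count how many spectrogram frames are assigned to each text token
    # (equivalent to summing the alignment over the frame axis).
    frames_per_token = [0] * n_tokens
    for row in alignment:
        for j, value in enumerate(row):
            frames_per_token[j] += int(round(value))

    seconds_per_frame = hop_length / sampling_rate

    # The alignment is assumed monotonic, so each token starts where the
    # previous one ended.
    timings = []
    start_frame = 0
    for j, n_frames in enumerate(frames_per_token):
        timings.append({
            "token_index": j,
            "frames": n_frames,
            "start_s": start_frame * seconds_per_frame,
            "duration_s": n_frames * seconds_per_frame,
        })
        start_frame += n_frames
    return timings


# Toy alignment: 5 frames, 3 tokens (token 0 gets 2 frames, token 1 gets 1,
# token 2 gets 2).
toy_alignment = [
    [1, 0, 0],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 1],
]

for entry in phoneme_durations(toy_alignment):
    print(entry)
```

Note that the token axis corresponds to the model's input token sequence, not necessarily to the raw phonemes: if blank tokens are interleaved with the symbols (the `add_blank` option in VITS-style configs), the blank columns would need to be merged into or skipped alongside their neighboring phonemes.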

CasonTsai commented 1 month ago

> You may use the attn output from the forward method as the phoneme-audio alignment result

Thanks for replying, I will try it at inference time.

CasonTsai commented 1 month ago

> You may use the attn output from the forward method as the phoneme-audio alignment result

Hello, I printed the attn output while running inference with the model, but I don't understand how the attn output corresponds to the duration of each phoneme in the text. Thank you for your reply. (two images attached)