Tomiinek / Multilingual_Text_to_Speech

An implementation of Tacotron 2 that supports multilingual experiments with parameter-sharing, code-switching, and voice cloning.
MIT License
826 stars, 157 forks

about µ and variances σ #83

Closed xynulgm6020 closed 1 year ago

xynulgm6020 commented 1 year ago

"We computed means µ and variances σ of audio durations of groups corresponding to examples with the same transcript lengths. Then we removed those with durations outside the interval (µ – 3σ, µ + 3σ)." Here, does "groups" mean data of a single language or data of the whole dataset? In other words, are µ and σ computed per language or over the whole dataset?

Tomiinek commented 1 year ago

Hello, thank you for the question!

"of groups corresponding to examples with the same transcript lengths"

It means that the groups consist of samples whose transcripts have the same length (one group for 4 characters, another for 5 characters, another for 6 characters, ..., up to 190 characters). I guess this was done for all languages together, but I do not remember very well :pensive:.

The goal was to remove outliers, for example a sample with an audio duration of 10 s and a transcript length of 3 characters, while keeping samples whose duration and length are consistent with each other, e.g. 10 s and 100 characters.
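
A minimal sketch of this kind of filtering, for illustration only; it is not the preprocessing code from the repository. It assumes each sample is a dict with hypothetical keys `text`, `duration` (seconds) and `language`, and it treats σ as the standard deviation of the durations within a group. The `per_language` switch is also an assumption, added only to show how grouping per language would differ from grouping over all languages together.

```python
from collections import defaultdict
from statistics import mean, pstdev


def filter_duration_outliers(samples, k=3.0, per_language=False):
    """Drop samples whose duration lies outside (mu - k*sigma, mu + k*sigma),
    where mu and sigma are computed over samples with the same transcript length.

    `samples` is a list of dicts with the hypothetical keys
    'text', 'duration' (seconds) and 'language'.
    """
    # Group by transcript length, optionally also by language.
    groups = defaultdict(list)
    for sample in samples:
        length = len(sample["text"])
        key = (sample["language"], length) if per_language else length
        groups[key].append(sample)

    kept = []
    for group in groups.values():
        durations = [s["duration"] for s in group]
        mu = mean(durations)
        sigma = pstdev(durations)  # population standard deviation
        if sigma == 0:
            # All durations are identical (or the group has one sample):
            # there is nothing to compare against, so keep the group.
            kept.extend(group)
            continue
        low, high = mu - k * sigma, mu + k * sigma
        kept.extend(s for s in group if low < s["duration"] < high)
    return kept


if __name__ == "__main__":
    data = [{"text": "abc", "duration": 1.0, "language": "en"} for _ in range(20)]
    data.append({"text": "abc", "duration": 10.0, "language": "en"})      # implausible
    data.append({"text": "a" * 100, "duration": 10.0, "language": "en"})  # plausible
    print(len(data), "->", len(filter_duration_outliers(data)))
```

Note that a 3σ rule like this only catches an outlier when its group is large enough; a sample that is alone in its group is always kept, which is one reason very small per-language groups would filter less.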

xynulgm6020 commented 1 year ago

Different languages have different text lengths for the same audio, so I'm very confused.

xynulgm6020 commented 1 year ago

Different languages have different text lengths for audio clips of the same duration, so I'm confused.

xynulgm6020 commented 1 year ago

Did you trim the silence from the audio?

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.