OlaWod / FreeVC

FreeVC: Towards High-Quality Text-Free One-Shot Voice Conversion
MIT License
602 stars, 111 forks

SR Process #49

Closed by Shmuel-Gruel 1 year ago

Shmuel-Gruel commented 1 year ago

Hi again,

I am finding that the wavs created by the SR preprocessing differ slightly in length from the originals. The difference seems to vary randomly, up to about 0.01 seconds. Is this a problem?

Also, did you find that applying horizontal SR is not useful?

Thanks a lot

OlaWod commented 1 year ago

I don't think it's a problem. Actually, applying an STFT to a wav and then an iSTFT also gives a wav of slightly different length, because the original wav length might not be divisible by the STFT hop length.

In SR augmentation we only use vertical SR, which keeps the content information but changes the speaker information. Horizontal SR changes the speaking rate, which is related to content. It's a "by the way" side product.
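The length point can be checked with plain arithmetic. The following is a minimal sketch, not code from this repo: it assumes a centered STFT with an even `n_fft` (the default behaviour in `librosa` and `torch`) and an iSTFT called without a target length. The helper names, the 16 kHz sample rate, and the hop length of 160 are illustrative assumptions, not values taken from the FreeVC config.

```python
def stft_frame_count(num_samples: int, hop_length: int) -> int:
    """Frames produced by a centered STFT (even n_fft, padded by n_fft // 2 per side)."""
    return 1 + num_samples // hop_length


def istft_length(num_frames: int, hop_length: int) -> int:
    """Samples returned by the matching iSTFT when no target length is passed."""
    return hop_length * (num_frames - 1)


def round_trip_length(num_samples: int, hop_length: int) -> int:
    """Length of a signal after STFT followed by iSTFT."""
    return istft_length(stft_frame_count(num_samples, hop_length), hop_length)


if __name__ == "__main__":
    sample_rate = 16000  # assumed for illustration
    hop_length = 160     # assumed for illustration
    for num_samples in (16000, 16001, 16159):
        out = round_trip_length(num_samples, hop_length)
        lost = num_samples - out
        print(f"{num_samples} -> {out} samples, lost {lost} ({lost / sample_rate:.4f} s)")
```

Under these assumptions the round trip drops `num_samples % hop_length` samples, so the loss is always smaller than one hop. If the exact length matters, both `librosa.istft` and `torch.istft` accept a `length` argument that pads or trims the output to the original sample count.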

Shmuel-Gruel commented 1 year ago

Okay, thank you. I see it comes from the STFT. I saw there was a similar question in #41, so I wanted to check.