Shmuel-Gruel closed 1 year ago
I don't think it's a problem. In fact, applying STFT to a wav and then iSTFT also produces a wav of slightly different length, because the original wav length might not be divisible by the STFT hop length. In SR augmentation we only use vertical SR, which keeps the content information but changes the speaker information. Horizontal SR changes the speaking rate, which is related to content. It's a "by the way" side product.
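To illustrate the length point, here is a minimal sketch of the arithmetic. It assumes a centered-framing STFT convention (one frame every `hop_length` samples, with the inverse returning `hop_length * (num_frames - 1)` samples). The function name, the 16 kHz sample rate, and the hop length of 160 are hypothetical values chosen for illustration, not the settings of this repo.

```python
def roundtrip_length(num_samples: int, hop_length: int) -> int:
    """Output length after STFT -> iSTFT under an assumed centered-framing convention."""
    # With centered framing there is one frame per hop, plus the first frame.
    num_frames = 1 + num_samples // hop_length
    # The inverse transform reconstructs (num_frames - 1) hops worth of samples.
    return hop_length * (num_frames - 1)


sample_rate = 16000  # hypothetical sample rate
hop_length = 160     # hypothetical hop length (10 ms at 16 kHz)

for n in (16000, 16001, 16159):
    out = roundtrip_length(n, hop_length)
    # The lost tail is always shorter than one hop.
    print(n, out, (n - out) / sample_rate)
```

Under these assumptions the difference is always less than one hop, so with a 10 ms hop it stays below about 0.01 seconds. If an exact length is needed, the output can be zero-padded or trimmed back to the original number of samples.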
Okay, thank you. I see it comes from the STFT. I saw a similar question in #41, so I wanted to check.
Hi again,
I am finding that the wavs created by the SR preprocessing differ slightly in length from the originals. The difference seems random, up to about 0.01 seconds. Is this a problem?
Also, did you find that applying horizontal SR is not useful?
Thanks a lot