Jungjee / RawNet

Official repository for RawNet, RawNet2, and RawNet3
MIT License

Test-Time Augmentation (TTA) not clear #11

Closed saurabh-kataria closed 3 years ago

saurabh-kataria commented 3 years ago

It is not clear from the paper (https://arxiv.org/pdf/2004.00526.pdf) what it means when TTA is not used. Does it mean 1) extracting just one embedding from a random crop of the audio, 2) TTA with 0% overlap, or 3) part 2 of the TTA in this work: https://www.robots.ox.ac.uk/~vgg/publications/2018/Chung18a/chung18a.pdf

Any response will be appreciated. P.S. I think that in object recognition research, TTA means literal augmentation.

Jungjee commented 3 years ago

When TTA is not applied, we input the whole utterance without any duration modification. Thus, before the GRU, the number of frame-level representations depends on the duration of the utterance. After the GRU, a single utterance-level representation is obtained.
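
For readers comparing the two settings, here is a minimal sketch of the difference, not the repository's actual code. `extract_embedding` is a hypothetical stand-in for the network (any-length input, fixed-size output), and the TTA variant shown (fixed-length overlapping segments whose embeddings are averaged) is an assumption about one common form of TTA; `seg_len` and `overlap` are illustrative parameters.

```python
from statistics import fmean, pstdev


def extract_embedding(samples):
    """Hypothetical stand-in for the model: any-length input -> fixed-size vector.

    A real system would compute frame-level features and aggregate them
    (e.g. with a GRU); simple statistics are used here only for illustration.
    """
    return [fmean(samples), pstdev(samples)]


def embed_without_tta(samples):
    # No TTA: the whole utterance is fed in, with no cropping or padding.
    return extract_embedding(samples)


def embed_with_tta(samples, seg_len, overlap=0.5):
    # Assumed TTA variant: cut fixed-length overlapping segments,
    # embed each one, then average the segment embeddings.
    hop = max(1, int(seg_len * (1.0 - overlap)))
    if len(samples) <= seg_len:
        # Utterance too short to segment: fall back to the whole input.
        return extract_embedding(samples)
    starts = range(0, len(samples) - seg_len + 1, hop)
    embeddings = [extract_embedding(samples[s:s + seg_len]) for s in starts]
    return [fmean(dim) for dim in zip(*embeddings)]


# Toy "utterance" of 100 samples.
utterance = [float(i % 7) for i in range(100)]
full = embed_without_tta(utterance)
tta = embed_with_tta(utterance, seg_len=40, overlap=0.5)
```

In both cases the output has the same fixed size; only the amount of audio seen per forward pass differs.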

saurabh-kataria commented 3 years ago

It is clear now, thanks.