flashlight / wav2letter

Facebook AI Research's Automatic Speech Recognition Toolkit
https://github.com/facebookresearch/wav2letter/wiki

What is the supervised algorithm in the slimIPL paper? #994

Closed 18445864529 closed 2 years ago

18445864529 commented 2 years ago

Question

In Table 1 of the paper, the authors claim the supervised slimIPL achieves SoTA performance. What is the difference between it and other supervised methods? It seems to me that slimIPL is a training strategy applied to unlabeled data, and that for labeled data it does not apply any further tricks. So why are its supervised results much better than its baselines'?

Thank you in advance for your reply!

jacobkahn commented 2 years ago

@18445864529 — from @tlikhomanenko:

Supervised slimIPL is just a supervised model on top of which we later apply slimIPL with unlabeled data involved. So in that respect it is just a standard supervised model, nothing else. Our supervised model is a conv layer with stride 3 and kernel 7, followed by a vanilla transformer (with relative positional embedding) and a linear layer to map to output tokens. This model is trained to output letter tokens with CTC loss. This supervised baseline is SOTA compared to other papers which use the same data setting (supervised 10h or 100h) with another architecture and/or loss and/or token set. Hope this clarifies what we meant in the paper.

Also, in Table 1 of https://arxiv.org/pdf/2010.11524.pdf we specify "supervised baselines" and "Ours", while in the semi-supervised setting (Table 2) we write "slimIPL" and "semi/un-supervised setting".
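
For readers unfamiliar with the setup described above (conv frontend, transformer, linear head, CTC loss), here is a minimal PyTorch sketch. All dimensions, layer counts, and the token count are illustrative placeholders, not the paper's hyperparameters, and `nn.TransformerEncoder` uses standard attention rather than the relative positional embeddings used in slimIPL:

```python
import torch
import torch.nn as nn

class CTCBaseline(nn.Module):
    """Sketch of a conv -> transformer -> linear CTC model.

    n_features, d_model, n_layers, and n_tokens are placeholder values.
    """
    def __init__(self, n_features=80, d_model=512, n_layers=6, n_tokens=29):
        super().__init__()
        # Conv frontend with kernel 7 and stride 3, as described above.
        self.conv = nn.Conv1d(n_features, d_model, kernel_size=7, stride=3, padding=3)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        # Linear layer mapping to letter tokens (index 0 reserved for CTC blank).
        self.head = nn.Linear(d_model, n_tokens)

    def forward(self, x):
        # x: (batch, time, n_features)
        x = self.conv(x.transpose(1, 2)).transpose(1, 2)  # (batch, ~time/3, d_model)
        x = self.encoder(x)
        return self.head(x).log_softmax(dim=-1)  # CTC expects log-probabilities

# One supervised training step with CTC loss on letter targets:
model = CTCBaseline()
ctc = nn.CTCLoss(blank=0)
feats = torch.randn(2, 300, 80)                           # (batch, time, mel features)
logp = model(feats)                                       # (batch, T', n_tokens)
targets = torch.randint(1, 29, (2, 20))                   # letter-token ids, no blanks
in_lens = torch.full((2,), logp.size(1), dtype=torch.long)
tgt_lens = torch.full((2,), 20, dtype=torch.long)
loss = ctc(logp.transpose(0, 1), targets, in_lens, tgt_lens)  # CTCLoss wants (T, B, C)
loss.backward()
```

This is only meant to make the comment's architecture description concrete; the actual implementation lives in this repo's C++ code and uses the exact configuration from the paper.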