Closed: @18445864529 closed this issue 2 years ago
@18445864529 — from @tlikhomanenko:
Supervised slimIPL is just a supervised model on top of which we later apply slimIPL with unlabeled data involved. So in that respect it is just a standard supervised model, nothing else. Our supervised model is a conv layer with stride 3 and kernel 7, followed by a vanilla transformer (with relative positional embedding) and a linear layer to map to output tokens. This model is trained to output letter tokens with CTC loss. This supervised baseline is SOTA compared to other papers which use the same data setting (supervised 10h or 100h) with another architecture and/or loss and/or token set. Hope this clarifies what we meant in the paper.
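As a rough illustration of the baseline described above (conv front-end with kernel 7 and stride 3, a vanilla transformer, and a linear head over letter tokens for CTC), here is a minimal PyTorch sketch. All layer sizes are illustrative assumptions, not the paper's hyperparameters, and for brevity it omits the relative positional embedding entirely:

```python
import torch
import torch.nn as nn

class SupervisedBaseline(nn.Module):
    """Hedged sketch of the supervised model: conv -> transformer -> linear.

    Sizes (d_model, heads, layers, token count) are illustrative guesses,
    and the relative positional embedding from the paper is omitted.
    """
    def __init__(self, n_mels=80, d_model=256, n_heads=4, n_layers=6, n_tokens=29):
        super().__init__()
        # Conv front-end: kernel 7, stride 3, as described in the reply above.
        self.conv = nn.Conv1d(n_mels, d_model, kernel_size=7, stride=3, padding=3)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        # Linear layer mapping to letter tokens (vocabulary includes CTC blank).
        self.head = nn.Linear(d_model, n_tokens)

    def forward(self, feats):
        # feats: (batch, time, n_mels) -> conv expects channels-first
        x = self.conv(feats.transpose(1, 2)).transpose(1, 2)  # (batch, ~time/3, d_model)
        x = self.encoder(x)
        return self.head(x).log_softmax(-1)  # per-frame log-probs for CTC loss

model = SupervisedBaseline()
feats = torch.randn(2, 90, 80)   # 2 utterances, 90 frames, 80 mel bins
log_probs = model(feats)
print(log_probs.shape)           # (2, 30, 29): time downsampled 3x by the conv
```

The point of the sketch is only that there is nothing slimIPL-specific here: it is an ordinary CTC-trained acoustic model, and slimIPL enters later as the pseudo-labeling strategy on unlabeled data.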
Also, in Table 1 of https://arxiv.org/pdf/2010.11524.pdf we specify "supervised baselines" and "Ours", while in the semi-supervised setting (Table 2) we write "slimIPL" and "semi/un-supervised setting".
Question
In Table 1 of the paper, the authors claim that supervised slimIPL achieves SoTA performance. What is the difference between it and other supervised methods? It seems to me that slimIPL is a training strategy applied to unlabeled data, and that for labeled data it does not apply any further tricks. So why are its supervised results much better than its baselines'?
Thank you in advance for your reply!