After reading your paper, I am wondering how effective the adversarial loss really is.
In Table 1, rows 4 and 6 illustrate the effectiveness of the adversarial loss on CIFAR-10. Have you performed a similar ablation study on a larger dataset such as ImageNet?
From my understanding, the similarity loss and the adversarial loss do "almost" the same thing: both push the student network to behave like the teacher networks. So is the adversarial loss really necessary? In practical applications, an adversarial loss can be troublesome to implement and unstable to train.
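To make my question concrete, here is a minimal NumPy sketch of how I understand the two losses (this is not your implementation; the linear discriminator here is a toy assumption). The similarity loss pulls student features toward teacher features directly, while the adversarial loss only matches them indirectly, through a discriminator that must itself be trained:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def similarity_loss(student_feat, teacher_feat):
    # Direct feature matching: mean squared error between
    # student and teacher feature maps.
    return np.mean((student_feat - teacher_feat) ** 2)

def adversarial_losses(student_feat, teacher_feat, w, b):
    # Toy linear discriminator D(f) = sigmoid(f @ w + b), trained to
    # output 1 on teacher features and 0 on student features.
    d_teacher = sigmoid(teacher_feat @ w + b)
    d_student = sigmoid(student_feat @ w + b)
    eps = 1e-8
    # Discriminator loss: binary cross-entropy on real/fake labels.
    d_loss = -np.mean(np.log(d_teacher + eps) + np.log(1.0 - d_student + eps))
    # Student ("generator") loss: fool the discriminator, i.e. make
    # student features indistinguishable from teacher features.
    g_loss = -np.mean(np.log(d_student + eps))
    return d_loss, g_loss
```

Both objectives drive the student toward the teacher's feature distribution, which is why I wonder whether the extra discriminator (and its min-max training) buys enough accuracy to justify the added instability.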
One more question: for different teacher networks, the same layer may carry different semantic information, so is it reasonable to make the student network learn from all of them at that layer?
Thanks again for your wonderful work!