CuriousAI / mean-teacher

A state-of-the-art semi-supervised method for image recognition
https://arxiv.org/abs/1703.01780

Finally, which one should I take, teacher or student? #57

Closed: Lilac-wgk closed this issue 2 years ago

Lilac-wgk commented 2 years ago

Hi, I am very impressed with your research.

This may be a stupid question, but I'm wondering which model I should use for validation. I'm confused because some knowledge distillation (KD) methods also use the terms teacher and student model. At first I thought the terms had different meanings (i.e., the word "teacher" in Mean Teacher versus in KD). However, I'm wondering why the EMA model (the teacher) performs better than the student model, which is trained with supervision on ground-truth labels. (The slides at https://github.com/CuriousAI/mean-teacher/blob/546348ff863c998c26be4339021425df973b4a36/nips_2017_slides.pdf also say that the teacher model leads the student model.) Indeed, the experimental results show that the teacher model performs better than the student.

1. How should I understand the fact that the EMA-weighted model performs better than the student model?
2. So, is it correct that the teacher model is the one used in the final system?

Thanks for reading.

tarvaina commented 2 years ago

The teacher is usually as good as or better than the student, so you should use the teacher.
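
For intuition on why an averaged model can beat the model it averages, here is a toy sketch. It is not the repository's training code. It assumes the student's weights jitter around a fixed target from step to step (a stand-in for noisy SGD updates) and that the teacher follows the EMA rule teacher = alpha * teacher + (1 - alpha) * student. The names `ema_update` and `simulate`, the target weights, and the noise level are all hypothetical and chosen only for illustration.

```python
import random


def ema_update(teacher, student, alpha=0.99):
    """Return teacher weights moved toward the student by an EMA step."""
    return [alpha * t + (1.0 - alpha) * s for t, s in zip(teacher, student)]


def simulate(steps=2000, alpha=0.99, noise=0.5, seed=0):
    """Toy run: the student is noisy around a target, the teacher averages it.

    Returns the mean squared distance to the target for (student, teacher).
    """
    rng = random.Random(seed)
    target = [1.0, -2.0, 0.5]  # hypothetical "ideal" weights
    teacher = list(target)
    student_err = 0.0
    teacher_err = 0.0
    for _ in range(steps):
        # Assumption: each training step leaves the student near the target,
        # perturbed by independent Gaussian noise.
        student = [w + rng.gauss(0.0, noise) for w in target]
        teacher = ema_update(teacher, student, alpha)
        student_err += sum((s - w) ** 2 for s, w in zip(student, target))
        teacher_err += sum((t - w) ** 2 for t, w in zip(teacher, target))
    return student_err / steps, teacher_err / steps


if __name__ == "__main__":
    s_err, t_err = simulate()
    print(f"student error: {s_err:.4f}, teacher error: {t_err:.4f}")
```

In this toy setting the teacher's error is much smaller because averaging cancels the step-to-step noise. Real training is not this simple, so treat it as an illustration of the averaging effect rather than a proof; in practice, evaluate both models on validation data and expect the teacher to be at least as good.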


Lilac-wgk commented 2 years ago

Thanks a lot :)