Hi, great work! The pre-trained ESRGAN_x4 model is trained in a supervised way (the dataset has paired HR and LR images), so why use the pre-trained ESRGAN as the teacher model instead of directly using the ground truth as supervision during the network search process?
Hi, it's just like knowledge distillation in classification: even though you have the one-hot label, you still encourage the model to learn from the teacher model. We find that learning from a well-pretrained teacher to mimic its output distribution achieves better results than directly using the ground truth.
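For illustration, here is a minimal PyTorch-style sketch of this kind of output distillation. The names (`student`, `teacher`, `lr`, `hr`) and the weights `alpha`/`beta` are illustrative assumptions, not this repo's actual training code:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student, teacher, lr, hr, alpha=1.0, beta=0.0):
    """Sketch of output distillation for SR: the student mimics the frozen
    teacher's output, optionally mixed with a ground-truth term (`beta`)."""
    with torch.no_grad():          # teacher (e.g. pre-trained ESRGAN_x4) is fixed
        teacher_sr = teacher(lr)
    student_sr = student(lr)
    loss = alpha * F.l1_loss(student_sr, teacher_sr)    # mimic the teacher
    if beta > 0:
        loss = loss + beta * F.l1_loss(student_sr, hr)  # optional GT supervision
    return loss
```

Setting `beta=0` corresponds to supervising purely with the teacher's outputs, as described above.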
Thank you for your reply.
Closed: cszy98 closed this issue 3 years ago.