MingSun-Tse / Collaborative-Distillation

[CVPR'20] Collaborative Distillation for Ultra-Resolution Universal Style Transfer (PyTorch)
MIT License

Question about testing on AdaIN #16

Closed. tinkez closed this issue 3 years ago.

tinkez commented 3 years ago

Hello! Thank you so much for this repo, really great work! I have two questions:

1. In your experiments, when you train a small encoder (SE) to replace VGG-19 for the AdaIN task as mentioned in the paper, is the 5-stage training used for WCT in this repo still necessary, given that the original AdaIN stylization operates only on relu4_1 features? In other words, can one write a single nn.Module as the SE instead of the 5 nn.Module classes in this repo, replace the BE with the SE, keep all the original losses (Eq. 2), add one more loss (Eq. 5, the linear embedding loss that guides the intermediate layers of the SE), and then train it?
2. If one just uses SE+BD as mentioned in the paper's discussions, is it correct that the file model_kd2sd.py is unnecessary?

MingSun-Tse commented 3 years ago

Hi @tinkez, thanks for your interest!

  1. Your understanding is correct. We only use relu4_1 for AdaIN, so there is no need to train the 5 stages as done for WCT (a rough sketch of this single-stage setup is given below).
  2. Yes. model_kd2sd.py is not used at all in our method. We evaluated kd2sd (i.e., applying the proposed KD to the small-decoder training) because many people asked why we did not apply the proposed method to the decoder training. Based on our results, it does not help the small-decoder training (results actually get worse; plausible reasons are explained in Sec. 5 of the paper).
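For readers following this thread, here is a rough, unofficial PyTorch sketch of the single-stage setup confirmed above: one small encoder covering only conv1_1 to relu4_1, whose intermediate features are guided through 1x1 linear embeddings toward the big (VGG-19) encoder's features (an Eq. 5 style loss), while the usual AdaIN content/style losses (Eq. 2) stay unchanged. All class names, channel widths, and loss weights below are illustrative assumptions, not the repo's actual code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SmallEncoder(nn.Module):
    """Hypothetical single small encoder (SE) covering only conv1_1 .. relu4_1.
    Channel widths are illustrative, not the paper's exact numbers."""
    def __init__(self):
        super().__init__()
        cfg = [(3, 16), (16, 32), (32, 64), (64, 128)]  # 4 blocks up to relu4_1
        stages = []
        for i, (cin, cout) in enumerate(cfg):
            layers = [] if i == 0 else [nn.MaxPool2d(2)]  # downsample before blocks 2-4
            layers += [
                nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(cout, cout, 3, padding=1), nn.ReLU(inplace=True),
            ]
            stages.append(nn.Sequential(*layers))
        self.stages = nn.ModuleList(stages)
        # 1x1 convs that linearly embed SE features into the BE's channel dims
        # (64/128/256/512 for VGG-19 relu1_1..relu4_1); used only by the Eq.5-style loss.
        self.embed = nn.ModuleList([
            nn.Conv2d(16, 64, 1), nn.Conv2d(32, 128, 1),
            nn.Conv2d(64, 256, 1), nn.Conv2d(128, 512, 1),
        ])

    def forward(self, x):
        feats = []
        for stage in self.stages:
            x = stage(x)
            feats.append(x)
        return feats  # feats[-1] plays the role of relu4_1


def adain(c, s, eps=1e-5):
    """Standard AdaIN: align channel-wise mean/std of the content feature to the style feature."""
    c_mean, c_std = c.mean((2, 3), keepdim=True), c.std((2, 3), keepdim=True) + eps
    s_mean, s_std = s.mean((2, 3), keepdim=True), s.std((2, 3), keepdim=True) + eps
    return s_std * (c - c_mean) / c_std + s_mean


def embedding_loss(se_feats, be_feats, embed):
    """Eq.5-style guidance: each SE intermediate feature, after a 1x1 linear
    embedding, should match the BE (VGG-19) feature at the same depth."""
    return sum(F.mse_loss(proj(f_se), f_be.detach())
               for proj, f_se, f_be in zip(embed, se_feats, be_feats))


# Schematic training objective (lam_style / lam_embed are assumed hyperparameters):
#   total = loss_content + lam_style * loss_style          # Eq. 2, the unchanged AdaIN
#                                                           # losses, computed with SE features
#         + lam_embed * embedding_loss(se_feats, be_feats, se.embed)   # Eq. 5
```

The 1x1 embedding layers only exist so that the narrow SE features can be compared against the much wider VGG-19 features; they are needed during training only and can be dropped at inference.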
tinkez commented 3 years ago

Thanks for your quick reply. It cleared up my confusion. Happy new year!