NVlabs / DeepInversion

Official PyTorch implementation of Dreaming to Distill: Data-free Knowledge Transfer via DeepInversion (CVPR 2020)

Why is the student model *pretrained* on ImageNet for ADI? #10

Closed · MingSun-Tse closed this 4 years ago

MingSun-Tse commented 4 years ago

Hi, thanks for your great work! I noticed that the student network is pretrained in the ADI experiment on ImageNet. This seems strange, since the goal of data-free knowledge distillation is to train a student from the synthetic samples alone; if you already have a pretrained student, the problem does not exist in the first place.

Meanwhile, in the CIFAR-10 experiment the student is not pretrained, which I think is the normal setting. So there is an inconsistency here. Could you explain a little about what made you choose different schemes for CIFAR-10 and ImageNet? Thanks!
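To make the distinction concrete, a one-line sketch of the two settings, using hypothetical torchvision models (the repo's scripts build their own models):

```python
from torchvision.models import resnet18

# Data-free KD setting the question expects: student starts from random init.
student_scratch = resnet18()

# What the ImageNet ADI example script does instead: a pretrained student.
# (Older torchvision API; newer versions use weights="IMAGENET1K_V1".)
student_pretrained = resnet18(pretrained=True)
```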

pamolchanov commented 4 years ago

Hi, thanks for looking into the details. The provided scripts are meant as toy examples; to show the method working, we used a different model as the student, one that was already trained, so the demo produces a meaningful result. In the original paper, ADI is always used with a student that is undergoing the training procedure and is not pretrained.
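For readers unfamiliar with the setup, here is a minimal sketch of the paper's setting as described above: the student starts from random initialization and learns only from synthetic batches. The models, temperature, and the `synthesize_batch` stub are all hypothetical placeholders, not the repo's actual code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical stand-ins for the real CNNs used in the paper.
teacher = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10)).eval()
student = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))  # random init, not pretrained

opt = torch.optim.SGD(student.parameters(), lr=0.1)

def synthesize_batch(batch_size=16):
    # Placeholder: real DeepInversion/ADI optimizes random inputs for many
    # steps against the teacher's BatchNorm statistics (and, for ADI, the
    # teacher-student disagreement). Here we just return noise.
    return torch.randn(batch_size, 3, 32, 32)

T = 3.0  # softmax temperature, illustrative value
for step in range(5):  # distillation loop: no real data anywhere
    x = synthesize_batch()
    with torch.no_grad():
        t_logits = teacher(x)
    s_logits = student(x)
    # Standard KD loss: match the student's softened predictions to the teacher's.
    loss = F.kl_div(F.log_softmax(s_logits / T, dim=1),
                    F.softmax(t_logits / T, dim=1),
                    reduction="batchmean")
    opt.zero_grad()
    loss.backward()
    opt.step()
```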

MingSun-Tse commented 4 years ago

Hi, many thanks for the update! What about the complete data-free KD code? Some people have questions about the details. When will you release it?

Sharpiless commented 3 years ago

> In the original paper, ADI is always used with a student that is undergoing the training procedure and is not pretrained.

"ADI is always used with student that is undergoing the training procedure and is not pertained." This approach leads to a huge computational cost. Because every time a batch (256 in total) of data is generated, it goes through 2000 iterations of training. And I now have doubts about whether the paper can be reproduced.