This approach leads to a huge computational cost, because:

The data above come from the paper, and I now have doubts about whether the paper's results can be reproduced.
As mentioned in this issue: https://github.com/NVlabs/DeepInversion/issues/10
Complete code may be the best way to answer our questions. I wonder whether anyone still maintains this repo, or whether there is a plan to open-source a follow-up.
Hi @Sharpiless, we were a bit occupied and just looped back to the issues. The generation of ADI samples is an iterative process: new samples are generated during distillation, on top of the existing DI images used as a starting point, as the student gradually picks up knowledge. The student network is constantly evolving, so at certain steps we freeze the student, generate ADI images, mix them into the DI pool, then unfreeze the student and continue the distillation. This gradually expands the dataset coverage and facilitates further distillation.
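Roughly, the schedule looks like the sketch below. This is simplified illustration code, not the repo's implementation; `generate_adi`, the refresh interval, the pool sampling, and the KD loss are all assumptions for illustration:

```python
import torch
import torch.nn.functional as F

def distill_with_adi(teacher, student, di_pool, generate_adi,
                     num_steps=10000, refresh_every=2000,
                     batch_size=64, temperature=4.0):
    """Alternate distillation with periodic ADI refreshes of the image pool."""
    opt = torch.optim.SGD(student.parameters(), lr=0.1, momentum=0.9)
    teacher.eval()
    student.train()

    for step in range(1, num_steps + 1):
        # At certain steps: freeze the student, synthesize ADI images that
        # target teacher/student disagreement, mix them into the DI pool,
        # then unfreeze the student and continue distillation.
        if step % refresh_every == 0:
            student.eval()
            di_pool = torch.cat([di_pool, generate_adi(teacher, student)])
            student.train()

        # Ordinary KD step on a batch sampled from the (growing) pool.
        idx = torch.randint(0, di_pool.size(0), (batch_size,))
        x = di_pool[idx]
        with torch.no_grad():
            t_logits = teacher(x)
        s_logits = student(x)
        loss = F.kl_div(F.log_softmax(s_logits / temperature, dim=1),
                        F.softmax(t_logits / temperature, dim=1),
                        reduction="batchmean") * temperature ** 2
        opt.zero_grad()
        loss.backward()
        opt.step()
    return student
```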
Thank you for your work. I notice that in the code the student model does not use pre-trained weights when the ADI method optimizes the inputs, and the optimizer covers only the input tensors. This means the student's weights are never updated and stay at their random initialization.
Does this prevent the student model from outputting appropriate logits for measuring the JS divergence throughout the training process?
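To make the concern concrete, here is a minimal sketch of the input-optimization step as I understand it, assuming the paper's competition term R_compete = 1 - JS(p_T, p_S); DeepInversion's BN-statistics and image-prior regularizers are omitted, and all names and hyperparameters are placeholders. Note that only the image batch `x` is handed to the optimizer, so the student's weights stay at their random initialization:

```python
import torch
import torch.nn.functional as F

def js_divergence(p, q, eps=1e-8):
    # Jensen-Shannon divergence between two batches of class probabilities.
    m = 0.5 * (p + q)
    kl_pm = F.kl_div(m.clamp_min(eps).log(), p, reduction="batchmean")
    kl_qm = F.kl_div(m.clamp_min(eps).log(), q, reduction="batchmean")
    return 0.5 * (kl_pm + kl_qm)

def generate_adi(teacher, student, x_init, targets,
                 steps=2000, lr=0.05, alpha_compete=1.0):
    """Optimize the input images only; teacher and student stay frozen."""
    teacher.eval()
    student.eval()
    for p in list(teacher.parameters()) + list(student.parameters()):
        p.requires_grad_(False)  # neither network is updated here

    x = x_init.clone().requires_grad_(True)
    opt = torch.optim.Adam([x], lr=lr)  # the optimizer sees only the inputs

    for _ in range(steps):
        t_logits = teacher(x)
        p_t = F.softmax(t_logits, dim=1)
        p_s = F.softmax(student(x), dim=1)  # logits from the un-trained student
        # Competition term from the paper: 1 - JS(p_T, p_S). Minimizing it
        # pushes the images toward teacher/student disagreement.
        loss = (F.cross_entropy(t_logits, targets)
                + alpha_compete * (1.0 - js_divergence(p_t, p_s)))
        opt.zero_grad()
        loss.backward()
        opt.step()
    return x.detach()
```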