This approach leads to a huge computational cost, because:

The data above come from the paper, and I now have doubts about whether the paper's results can be reproduced.
As mentioned in this issue: https://github.com/NVlabs/DeepInversion/issues/10
Complete code may be the best way to answer our questions. I wonder whether anyone still maintains this repo, or whether there is a plan to open-source a follow-up.
Hi @Sharpiless, we were a bit occupied and just looped back to the issues. The generation of ADI samples is an iterative process: new samples are generated during distillation, on top of the existing DI images used as a starting point, as the student gradually picks up knowledge. The student network is constantly evolving, so at certain steps we freeze the student, generate ADI images, mix them into the DI pool, then unfreeze the student and continue the distillation. This gradually expands the dataset coverage and facilitates further distillation.
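Roughly, the schedule looks like the sketch below. This is simplified illustration code, not the repo's implementation; `generate_adi`, the refresh interval, the pool sampling, and the KD loss are all assumptions for illustration:

```python
import torch
import torch.nn.functional as F

def distill_with_adi(teacher, student, di_pool, generate_adi,
                     num_steps=10000, refresh_every=2000,
                     batch_size=64, temperature=4.0):
    """Alternate distillation with periodic ADI refreshes of the image pool."""
    opt = torch.optim.SGD(student.parameters(), lr=0.1, momentum=0.9)
    teacher.eval()
    student.train()

    for step in range(1, num_steps + 1):
        # At certain steps: freeze the student, synthesize ADI images that
        # target teacher/student disagreement, mix them into the DI pool,
        # then unfreeze the student and continue distillation.
        if step % refresh_every == 0:
            student.eval()
            di_pool = torch.cat([di_pool, generate_adi(teacher, student)])
            student.train()

        # Ordinary KD step on a batch sampled from the (growing) pool.
        idx = torch.randint(0, di_pool.size(0), (batch_size,))
        x = di_pool[idx]
        with torch.no_grad():
            t_logits = teacher(x)
        s_logits = student(x)
        loss = F.kl_div(F.log_softmax(s_logits / temperature, dim=1),
                        F.softmax(t_logits / temperature, dim=1),
                        reduction="batchmean") * temperature ** 2
        opt.zero_grad()
        loss.backward()
        opt.step()
    return student
```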
Thank you for your work. I notice that in the code the student model does not use pre-trained weights when the ADI method optimizes the inputs, and the optimizer covers only the input tensors. This means the student's weights are never updated and stay at their random initialization.
Does this prevent the student model from outputting appropriate logits for measuring the JS divergence throughout the training process?
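To make the concern concrete, here is a minimal sketch of the input-optimization step as I understand it, assuming the paper's competition term R_compete = 1 - JS(p_T, p_S); DeepInversion's BN-statistics and image-prior regularizers are omitted, and all names and hyperparameters are placeholders. Note that only the image batch `x` is handed to the optimizer, so the student's weights stay at their random initialization:

```python
import torch
import torch.nn.functional as F

def js_divergence(p, q, eps=1e-8):
    # Jensen-Shannon divergence between two batches of class probabilities.
    m = 0.5 * (p + q)
    kl_pm = F.kl_div(m.clamp_min(eps).log(), p, reduction="batchmean")
    kl_qm = F.kl_div(m.clamp_min(eps).log(), q, reduction="batchmean")
    return 0.5 * (kl_pm + kl_qm)

def generate_adi(teacher, student, x_init, targets,
                 steps=2000, lr=0.05, alpha_compete=1.0):
    """Optimize the input images only; teacher and student stay frozen."""
    teacher.eval()
    student.eval()
    for p in list(teacher.parameters()) + list(student.parameters()):
        p.requires_grad_(False)  # neither network is updated here

    x = x_init.clone().requires_grad_(True)
    opt = torch.optim.Adam([x], lr=lr)  # the optimizer sees only the inputs

    for _ in range(steps):
        t_logits = teacher(x)
        p_t = F.softmax(t_logits, dim=1)
        p_s = F.softmax(student(x), dim=1)  # logits from the un-trained student
        # Competition term from the paper: 1 - JS(p_T, p_S). Minimizing it
        # pushes the images toward teacher/student disagreement.
        loss = (F.cross_entropy(t_logits, targets)
                + alpha_compete * (1.0 - js_divergence(p_t, p_s)))
        opt.zero_grad()
        loss.backward()
        opt.step()
    return x.detach()
```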