Daner-Wang opened 2 years ago
We eliminated some errors in the code and tried to reproduce the authors' experiment with the parameters they provided; however, the experimental results were far from those in the paper. In the paper, the proportion of remaining FLOPs for DeiT-Tiny is 49.23%, while in our reproduction the proportion is more than 70%, which is confusing. In addition, we also found that the FLOPs calculation in the provided code might be wrong.
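For reference, this is roughly what "proportion of remaining FLOPs" means in pruning papers: the FLOPs of the pruned model divided by the FLOPs of the dense model. A minimal sketch (the function name, per-layer breakdown, and `kept_fraction` representation are our own illustration, not the repo's actual API):

```python
def remaining_flops_ratio(layer_flops, kept_fraction):
    """Fraction of dense-model FLOPs that survive pruning.

    layer_flops   -- FLOPs of each layer in the *dense* model
    kept_fraction -- per-layer fraction of FLOPs kept after pruning (0..1)
    """
    assert len(layer_flops) == len(kept_fraction)
    dense = sum(layer_flops)
    pruned = sum(f * k for f, k in zip(layer_flops, kept_fraction))
    return pruned / dense


# Toy example: two equal-cost layers, one pruned to 40%, one untouched.
ratio = remaining_flops_ratio([100.0, 100.0], [0.4, 1.0])
print(f"{ratio:.2%}")  # → 70.00%
```

A subtle point worth checking when auditing the repo's FLOPs counter: for attention layers, pruning tokens reduces FLOPs roughly quadratically in the attention matrix but only linearly in the projections, so a simple per-layer linear scaling like the sketch above can over- or under-count.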
Sorry for the late reply. We promise we will finish cleaning the code by the end of next week and upload the fully fixed code.
Thank you for pointing out the issues in our codebase. Our responses to your problems are as follows:
Update Jul. 7th: new logs for DeiT-Tiny and DeiT-Base. For DeiT-Tiny, the paper reports 49.23%; the reproduced result in the screenshot is 48.45% at epoch 9, which is not exactly the same but very close.
For DeiT-Base, the paper reports 45.5%; the reproduction reaches a 42% model at epoch 17 and a 50% model at epoch 19. Although these two results are not exactly the same numbers as in the paper (due to randomness), they show that there is no fundamental problem preventing compression to 50% FLOPs. For DeiT-Base, using a different z-learning-rate schedule, "5,10,15,20,25", can reproduce the result.
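Since the thread does not spell out how "5,10,15,20,25" is interpreted, here is one plausible reading to check against the repo: epoch milestones at which the z (mask/gate) learning rate is decayed by a constant factor, analogous to a multi-step LR schedule. Everything in this sketch (the function name, the decay factor `gamma`, the milestone semantics) is an assumption for illustration:

```python
def z_lr_at_epoch(epoch, milestones, base_lr, gamma=0.1):
    """Multi-step schedule: decay base_lr by gamma at each passed milestone.

    ASSUMPTION: "5,10,15,20,25" is a list of epoch milestones; the actual
    semantics in the authors' code may differ.
    """
    passed = sum(1 for m in milestones if epoch >= m)
    return base_lr * (gamma ** passed)


milestones = [5, 10, 15, 20, 25]
for epoch in (0, 5, 17):
    print(epoch, z_lr_at_epoch(epoch, milestones, base_lr=0.1))
```

If the repo uses PyTorch, the equivalent built-in would be `torch.optim.lr_scheduler.MultiStepLR` with `milestones=[5, 10, 15, 20, 25]` applied to the parameter group holding the z variables.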
The logs mentioned above are under log/.
All dependencies are installed, so there is no dependency problem, but the code still throws all sorts of errors at runtime; some of them are plainly visible in the editor. Please upload code that actually runs correctly, along with the log files for the best results shown by the authors.