Closed by puffdrum 1 year ago
Hey, thanks for taking an interest in our work. We used exactly the same training strategy for re-training as for the pre-training stage, but in full precision, because we were running into NaN loss issues. The models behind the results reported in the paper were trained on eight 32GB V100s.
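For anyone hitting the same NaN losses: the reply above does not say what caused them, but one common cause in mixed-precision training is half-precision (fp16) overflow, since fp16 cannot represent finite values above 65504. The snippet below is only a minimal, standard-library sketch of that effect under this assumption. The helper names `to_fp16` and `loss_is_finite` are hypothetical and are not part of the ViT-Slim code.

```python
import math
import struct

# Largest finite value in IEEE 754 half precision (fp16).
FP16_MAX = 65504.0


def to_fp16(x: float) -> float:
    """Round x to half precision; out-of-range values become +/-inf.

    Hypothetical helper for illustration only.
    """
    try:
        # The "e" struct format is IEEE 754 binary16.
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except (OverflowError, struct.error):
        # struct refuses to pack finite values that overflow fp16,
        # so emulate the hardware behaviour of saturating to infinity.
        return math.copysign(math.inf, x)


def loss_is_finite(loss: float) -> bool:
    """Return True if the loss is neither NaN nor +/-inf."""
    return math.isfinite(loss)


# A value that is fine in full precision overflows once cast to fp16.
activation = 1.0e5
overflowed = to_fp16(activation)

# Subtracting two infinities yields NaN, which then propagates to the loss.
bad_loss = overflowed - overflowed
```

Here `loss_is_finite(bad_loss)` returns `False`, while the same arithmetic on the original full-precision value gives a finite result. A check like this inside the training loop can help locate the first step at which the loss stops being finite.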
Thanks a lot for your reply.
Hi, first of all, thanks for your great work. I am trying to reproduce your ViT-Slim results and followed the procedure described in your paper. I can run the whole ViT-Slim pipeline, but I cannot get results as good as those presented in the paper: mine are all about 1% lower. Did you use any tricks when retraining the pruned ViT-Slim-S?