changlin31 / DNA

(CVPR 2020) Block-wisely Supervised Neural Architecture Search with Knowledge Distillation

The models are different in paper, code and trained models #4

Closed by 5663015 4 years ago

5663015 commented 4 years ago

I am mainly interested in the DNA-c model, but I found that the model structure in the paper, the architecture defined in the code, and the released trained model are all different. Which model is the best? Could you provide the definitive network structure? Thank you!

wanggrun commented 4 years ago

Thank you for the careful reading. The published code (i.e., the architecture) matches the released weights. However, because of our carelessness, that architecture is not the one described in the paper: we uploaded a different model by accident.

In fact, our method found a series of architectures with similar model size and performance. The model we previously uploaded and the model described in the paper have the same parameter count and accuracy.

The mistake has now been corrected. You can download the new DNA_c.pth.tar and update model.py for validation.
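For anyone who wants to confirm that a downloaded checkpoint really matches the architecture in the code, one possible check is to compare parameter names and shapes. The sketch below is only an illustration and is not part of the repository: `diff_shapes` and the layer names in the toy data are hypothetical, and it works on plain name-to-shape dictionaries so that it runs without PyTorch.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def diff_shapes(model_shapes: Dict[str, Shape],
                ckpt_shapes: Dict[str, Shape]) -> Dict[str, List[str]]:
    """Compare parameter names and shapes of a model and a checkpoint.

    Hypothetical helper: both arguments map a parameter name to its shape.
    """
    model_names = set(model_shapes)
    ckpt_names = set(ckpt_shapes)
    # Parameters the model defines but the checkpoint lacks.
    missing = sorted(model_names - ckpt_names)
    # Parameters the checkpoint holds but the model does not define.
    unexpected = sorted(ckpt_names - model_names)
    # Parameters present in both whose shapes disagree.
    mismatched = sorted(
        name for name in model_names & ckpt_names
        if model_shapes[name] != ckpt_shapes[name]
    )
    return {"missing": missing, "unexpected": unexpected, "mismatched": mismatched}


# Toy shape maps standing in for real ones (layer names are made up).
model_shapes = {"conv_stem.weight": (32, 3, 3, 3), "classifier.weight": (1000, 1280)}
ckpt_shapes = {"conv_stem.weight": (32, 3, 3, 3), "classifier.weight": (1000, 1536)}

report = diff_shapes(model_shapes, ckpt_shapes)
print(report)
```

With PyTorch installed, the shape maps could be built as `{k: tuple(v.shape) for k, v in state_dict.items()}`, where `state_dict` comes from `model.state_dict()` or from `torch.load("DNA_c.pth.tar", map_location="cpu")`. Whether the checkpoint nests its weights under a key such as `state_dict` is not stated in this thread, so inspect the loaded object first. An empty report in all three lists suggests the code and weights agree.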

5663015 commented 4 years ago

Thank you very much!