Eric-mingjie / rethinking-network-pruning

Rethinking the Value of Network Pruning (PyTorch) (ICLR 2019)
MIT License

Regarding accuracy of the Scratch B models in the paper #9

Closed aashish-kumar closed 5 years ago

aashish-kumar commented 5 years ago

Hello,

Thanks for an interesting paper. I was looking at the accuracy of the Scratch B models compared to the big unpruned networks, and it seems Scratch B outperforms the unpruned networks most of the time. This seems counterintuitive, since a bigger network, if it could be trained effectively, should outperform smaller networks. Do you think the difference is statistically significant?

Thanks

Eric-mingjie commented 5 years ago

Hi,

Thanks for your interest in our paper and code. Indeed, Scratch B sometimes performs better than the unpruned networks. This could be due to the following reasons:

  1. The prune ratio is relatively small, so the Scratch B architecture is close in capacity to the unpruned network.
  2. The standard training schedule may not be long enough for the large unpruned network to fully converge, while Scratch B is trained with a proportionally extended schedule (see the sketch after this list).
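
For context, Scratch B in the paper trains the pruned architecture for the same total compute budget (FLOPs) as the large model, which translates into proportionally more epochs. A minimal sketch of that epoch adjustment; the FLOPs values in the example are hypothetical placeholders, not numbers from the paper:

```python
# Sketch of the Scratch-B schedule adjustment: train the pruned
# architecture for the same total training FLOPs as the unpruned model.

def scratch_b_epochs(standard_epochs: int,
                     unpruned_flops: float,
                     pruned_flops: float) -> int:
    """Scale the epoch count so total training compute matches the unpruned run."""
    return round(standard_epochs * unpruned_flops / pruned_flops)

# Hypothetical example: a 40% FLOPs reduction turns 160 epochs into ~267.
print(scratch_b_epochs(160, unpruned_flops=1.0, pruned_flops=0.6))  # -> 267
```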

For the CIFAR-10 experiments, we report the average of five runs. For ImageNet, each result is from one run. It is safe to say that the difference on CIFAR-10 is statistically significant.
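
To make "statistically significant" concrete, one could compare the five-run CIFAR-10 accuracies of the two models with, for example, Welch's t-test. A minimal sketch; the accuracy values below are hypothetical placeholders, not results from the paper:

```python
from scipy.stats import ttest_ind

# Hypothetical five-run CIFAR-10 test accuracies (%); placeholders only.
scratch_b = [93.8, 93.9, 93.7, 94.0, 93.8]
unpruned  = [93.5, 93.6, 93.4, 93.7, 93.5]

# Welch's t-test (does not assume equal variances between the two models).
t_stat, p_value = ttest_ind(scratch_b, unpruned, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")  # p < 0.05 suggests a real gap
```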