chrundle / biprop

Identify a binary weight or binary weight and activation subnetwork within a randomly initialized network by only pruning and binarizing the network.
Apache License 2.0

Prune rate #1

Closed (sfalkena, 3 years ago)

sfalkena commented 3 years ago

Hi,

So I have been exploring your implementation for a bit now, and first of all, let me say that I enjoyed working with this repo! It is very well structured and easy to follow!

So currently I am trying to build upon binary neural networks and am experimenting a bit with the small-scale network conv-4. I am now varying the prune rate to see its effect. However, the results are not what I would expect. As far as I understand, the prune rate is the "top k%" of the subnetwork's weights that is kept, so if I set my prune rate to 0.4, only 40% of the weights remain. That said, I would expect to get my original network back if I set the prune rate to 1 and, with that, achieve my highest accuracy. However, I have done a couple of small tests (just a few epochs each) and saw that prune rates of 0.1-0.5 gave better results than higher values.
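To make the "top k%" reading concrete, here is a minimal Python sketch of score-based top-k masking. It assumes, as the question describes, that the prune rate is the fraction of weights kept, and that weights are ranked by the absolute value of a per-weight score. The function name `topk_mask` and its rounding convention are illustrative assumptions, not biprop's actual code.

```python
from typing import List, Sequence


def topk_mask(scores: Sequence[float], prune_rate: float) -> List[int]:
    """Return a 0/1 mask that keeps the top `prune_rate` fraction of weights.

    Assumption: prune_rate is the fraction KEPT (prune_rate=1.0 keeps everything),
    and weights are ranked by |score|. Hypothetical helper, not biprop's API.
    """
    if not 0.0 <= prune_rate <= 1.0:
        raise ValueError("prune_rate must be in [0, 1]")
    n = len(scores)
    num_pruned = int((1.0 - prune_rate) * n)  # number of lowest-|score| weights to drop
    order = sorted(range(n), key=lambda i: abs(scores[i]))  # indices, ascending by |score|
    mask = [1] * n
    for i in order[:num_pruned]:
        mask[i] = 0
    return mask


if __name__ == "__main__":
    scores = [0.9, -0.1, 0.5, 0.05, -0.7, 0.3, 0.2, -0.8, 0.4, 0.6]
    print(topk_mask(scores, 0.4))  # [1, 0, 0, 0, 1, 0, 0, 1, 0, 1]
    print(topk_mask(scores, 1.0))  # all ones: nothing is pruned
```

With `prune_rate=1.0` the mask keeps every weight, so pruning alone leaves the original random weights in place; it is the subsequent binarization step that keeps this from being the original network.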

My question: Is this behaviour expected? Or is the problem just that a bigger network takes longer to train to reach the same accuracy level as the smaller subnetwork?

chrundle commented 3 years ago

Thanks for reaching out @sfalkena !

I understand that this behavior may seem strange at first, but it is actually expected. Since the pruning and binarization are taking place on a randomly initialized network, the closer the prune rate is to 1, the more of the random weights survive pruning, and so the more random the network is before binarization is applied to the remaining weights. The binarization scheme in biprop is designed to minimize the error of binarizing the weights of the pruned network, and since the subnetwork is still essentially random when the prune rate is close to 1, the binarized subnetwork still performs poorly. Here is a figure from our paper (linked in the README) that shows the average performance of biprop on the conv-2/4/6/8 models, so you have an idea of what to expect in your experiments:
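The trade-off described above can be illustrated with a small sketch. It assumes a common XNOR-Net-style scheme: each kept weight is replaced by α·sign(w), with a single gain α chosen to minimize Σ m_i (w_i − α·sign(w_i))² over the mask m. Setting the derivative with respect to α to zero gives α = Σ m_i|w_i| / Σ m_i, the mean absolute value of the kept weights. Whether biprop uses exactly this gain, and at what granularity, is an assumption here; `binarize_subnet` and `binarization_error` are hypothetical helpers.

```python
from typing import List, Tuple


def sign(w: float) -> float:
    """Sign with the convention sign(0) = +1 (an assumption for this sketch)."""
    return 1.0 if w >= 0 else -1.0


def binarize_subnet(weights: List[float], mask: List[int]) -> Tuple[List[float], float]:
    """Binarize kept weights to +/-alpha and zero out pruned ones.

    alpha is the mean |w| over kept weights, which minimizes the squared
    binarization error on those weights (hypothetical helper, not biprop's API).
    """
    kept = [abs(w) for w, m in zip(weights, mask) if m]
    alpha = sum(kept) / len(kept) if kept else 0.0
    binarized = [alpha * sign(w) if m else 0.0 for w, m in zip(weights, mask)]
    return binarized, alpha


def binarization_error(weights: List[float], mask: List[int], alpha: float) -> float:
    """Squared error between each kept weight and alpha * sign(w)."""
    return sum((w - alpha * sign(w)) ** 2 for w, m in zip(weights, mask) if m)


if __name__ == "__main__":
    weights = [0.5, -1.5, 2.0, -0.25]
    mask = [1, 1, 0, 1]  # third weight is pruned
    binarized, alpha = binarize_subnet(weights, mask)
    print(alpha, binarized)  # 0.75 [0.75, -0.75, 0.0, -0.75]
    # The closed-form alpha does at least as well as nearby gains.
    for a in (alpha - 0.05, alpha + 0.05):
        assert binarization_error(weights, mask, alpha) <= binarization_error(weights, mask, a)
```

With an all-ones mask (prune rate 1), every weight is kept and simply replaced by ±α, so the result is still the sign pattern of the random initialization; minimizing the binarization error does not make those signs any less random.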

[Figure: biprop-convs, average performance of biprop on the conv-2/4/6/8 models]

I hope this answers your question but feel free to let me know if you would like any additional information. Since this behavior is expected I will close the issue with this comment.