Reproduce the results of the paper

andreasveit / convnet-aig

PyTorch implementation for Convolutional Networks with Adaptive Inference Graphs

BSD 3-Clause "New" or "Revised" License


Reproduce the results of the paper #4

Open Goingqs opened 6 years ago

Goingqs commented 6 years ago

Can this code reproduce the results of the paper? I got 24.61% top-1 error.

Goingqs commented 6 years ago

I set the target rate to 0.7 and followed the standard ResNet training procedure.

andreasveit commented 6 years ago

Yes, the code should be able to reproduce the results from the paper. I assume you trained a model based on ResNet-50? Can you please provide more details? For example, what is the average execution rate of your trained model? Also, what batch size did you use? I suggest training with a batch size of 256 (that's the standard for ResNets and is also used in the paper) or even larger, since the effective batch size per layer is lower with low execution rates.
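To make the batch-size point concrete, here is a rough back-of-the-envelope sketch (not code from the repository). It assumes that a gated layer with execution rate `r` processes about `batch_size * r` samples per step on average; `effective_batch_size` is a hypothetical helper name used only for illustration.

```python
def effective_batch_size(batch_size, execution_rate):
    """Expected number of samples a gated layer processes per step.

    Assumption: each sample passes through the layer independently
    with probability `execution_rate`.
    """
    if not 0.0 <= execution_rate <= 1.0:
        raise ValueError("execution_rate must be in [0, 1]")
    return batch_size * execution_rate


# With a batch size of 256 and an execution rate of 0.7, a gated layer
# sees noticeably fewer than 256 samples per step on average.
for rate in (1.0, 0.7, 0.4):
    print(rate, effective_batch_size(256, rate))
```

Under this assumption, lower target rates shrink the number of samples each gated layer sees per step, which is why a larger nominal batch size may help.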

Goingqs commented 6 years ago

The average execution rate is 0.8585, and my batch size is 2048. I removed all the fc1bn layers.

I find that fc1bn degrades the result: top-1 error is 25.324% with fc1bn.

Goingqs commented 6 years ago

I got 25.32% top-1 error with an average execution rate of 0.8452. The batch size is 512, without fc1bn. I think a larger batch size is better. So what's the problem? Please help me.

PerdonLiu commented 5 years ago

I cannot reproduce the result either. On CIFAR-10, I used exactly the same settings as the paper (batch size 256, 350 epochs, target rate 0.7) but got 6.68% top-1 error.

adrianloy commented 5 years ago

@Goingqs @PerdonLiu The README says: "Specifically, for the results in the paper the following target rate schedules are used for ResNet 50: [1, 1, 0.8, 1, t, t, t, 1, t, t, t, t, t, 1, 0.7, 1] for t in [0.4, 0.5, 0.6, 0.7]". Did you do that, or did you use a target rate of 0.7 for all gates? I do not understand how this code supports different target rates per layer: the argument parser expects a single float, and I also can't see any adjustment for layer-specific target rates in the other parts of the code where I would expect it.
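For reference, a minimal sketch of what a per-layer schedule could look like, written in plain Python. This is not the repository's code. The schedule list is the one quoted from the README; the function names and the squared-error form of the penalty are assumptions made for illustration.

```python
def resnet50_schedule(t):
    """Per-block target rates for ResNet-50 as quoted from the README."""
    return [1, 1, 0.8, 1, t, t, t, 1, t, t, t, t, t, 1, 0.7, 1]


def target_rate_loss(execution_rates, targets):
    """Sum of squared gaps between each gate's mean execution rate and its target.

    Assumption: `execution_rates[i]` is the batch-averaged execution rate
    of gate i, and the penalty is a plain squared error per gate.
    """
    if len(execution_rates) != len(targets):
        raise ValueError("need exactly one target per gate")
    return sum((r - z) ** 2 for r, z in zip(execution_rates, targets))


# Hypothetical usage: compare a per-layer schedule with a uniform 0.7 target.
schedule = resnet50_schedule(0.7)
uniform = [0.7] * len(schedule)
print(sum(schedule) / len(schedule))  # mean target rate of the schedule
print(sum(uniform) / len(uniform))    # mean target rate with 0.7 everywhere
```

In a training script, the single float from the argument parser would have to be replaced by (or expanded into) such a list, with one entry per gated block.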