Open Goingqs opened 6 years ago
I set the target rate to 0.7 and followed the standard ResNet training procedure.
Yes, the code should be able to reproduce the results from the paper. I assume you trained a model based on ResNet-50? Can you please provide more details? For example, what is the average execution rate of your trained model, and what batch size did you use? I suggest training with a batch size of 256 (that's the standard for ResNets and is also used in the paper) or even larger, since the effective batch size per layer is lower with low execution rates.
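To make the batch-size point concrete, here is a rough sketch of the arithmetic. It assumes each sample executes a gated layer independently with probability equal to that layer's execution rate, so the numbers are expectations, not exact counts. The function name `effective_batch_size` is made up for illustration and is not part of the repository.

```python
def effective_batch_size(batch_size, execution_rate):
    """Expected number of samples per batch that actually pass through a gated layer.

    Assumption: every sample executes the layer independently with
    probability `execution_rate`.
    """
    if not 0.0 <= execution_rate <= 1.0:
        raise ValueError("execution_rate must be between 0 and 1")
    return batch_size * execution_rate


if __name__ == "__main__":
    # Compare a few execution rates at the suggested batch size of 256.
    for rate in (1.0, 0.7, 0.4):
        print(rate, effective_batch_size(256, rate))
```

Under this assumption, a layer running at rate 0.4 sees well under half of the nominal batch on average, which is why a larger nominal batch size may help.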
The average execution rate is 0.8585, and my batch size is 2048. I removed all the fc1bn.
I find that fc1bn degrades the result: the top-1 error is 25.324 with fc1bn.
I get 25.32 top-1 error with an average execution rate of 0.8452. The batch size is 512, without fc1bn. I thought a larger batch size would be better, so what's the problem? Please help.
I cannot reproduce the result either. On CIFAR-10, I used exactly the same settings as the paper (batch size 256, 350 epochs, target rate 0.7) but got 6.68% top-1 error.
@Goingqs @PerdonLiu the README says: "Specifically, for the results in the paper the following target rate schedules are used for ResNet 50: [1, 1, 0.8, 1, t, t, t, 1, t, t, t, t, t, 1, 0.7, 1] for t in [0.4, 0.5, 0.6, 0.7]". Did you do that, or did you use a target rate of 0.7 for all gates? I do not understand how this code allows different target rates per layer: the arg parser expects a single float, and I also can't see any adjustment for layer-specific target rates in the other parts of the code where I would expect it.
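For anyone comparing the two setups, here is a minimal sketch of what the README schedule expands to, plus one way a parser could accept a per-layer list. Only the schedule itself comes from the README quote; the function names and the `--target-rates` flag are hypothetical and do not exist in the repository as far as this thread shows.

```python
import argparse


def resnet50_target_rates(t):
    """Expand the README schedule into one target rate per block for a given t."""
    return [1, 1, 0.8, 1, t, t, t, 1, t, t, t, t, t, 1, 0.7, 1]


def mean_target_rate(rates):
    """Unweighted mean of the per-block target rates."""
    return sum(rates) / len(rates)


def parse_target_rates(argv):
    """Hypothetical CLI: accept one or more floats instead of a single float."""
    parser = argparse.ArgumentParser()
    # Assumed flag name, for illustration only.
    parser.add_argument("--target-rates", type=float, nargs="+", default=[0.7])
    return parser.parse_args(argv).target_rates


if __name__ == "__main__":
    for t in (0.4, 0.5, 0.6, 0.7):
        rates = resnet50_target_rates(t)
        print(t, len(rates), round(mean_target_rate(rates), 5))
```

Note that with this schedule half of the 16 entries are fixed, so even for t = 0.7 the unweighted mean target is above 0.7. A uniform target of 0.7 for every gate is therefore a different configuration from the one the README describes.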
Can this code reproduce the results of the paper? I got 24.61% top-1 error.