[Open] TomLan42 opened this issue 4 years ago
Thanks for your attention to our paper. The 1024 is a typo; it should be the number of classes, and I have corrected it. Thanks for pointing it out. This network structure has only one FC layer. I also checked against the HWGQ prototxt, and it is mostly the same.
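For reference, a minimal sketch of what the corrected classifier head could look like after the fix. This is an illustration only, not the repo's actual code: the class name VGGSmallHead and the flatten step are assumptions; the point is simply that the single FC layer maps the feature width to the number of classes (10 for CIFAR-10) rather than to 1024.

```python
import torch.nn as nn

# Hypothetical sketch of a VGG-Small classifier head with a single FC layer
# whose output size is the number of classes (10 for CIFAR-10).
class VGGSmallHead(nn.Module):
    def __init__(self, in_planes, num_classes=10):
        super().__init__()
        self.classifier = nn.Linear(in_planes, num_classes, bias=False)

    def forward(self, x):
        x = x.flatten(1)           # flatten pooled features to (N, in_planes)
        return self.classifier(x)  # raw logits; no softmax applied here
```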
I tried to train VGG-Auto-A but could only reach an accuracy of 90.530%. The training script I used is here. I am curious whether anything went wrong with my reproduction?
I am also curious how a BNN with uniform expansion can outperform ABC-Net, even at a smaller size. If simple expansion improves accuracy, why bother using multiple binary filters for linear approximation, which is what ABC-Net does?
This is work I did at Huawei Noah's Ark Lab. Recently I have also been reproducing it, and the training script I use is this one.
The main difference probably lies in dataset normalization and weight decay. I trained for 200 epochs with the learning rate decaying at epochs 60, 120, 180 and weight decay 0.0001, and got 91.12%. I also trained for 400 epochs with the learning rate decaying at epochs 120, 240, 360 and weight decay 0.0001, and got 91.71%, which is slightly below the accuracy in the paper. I am still working on reproducing the results and will update the training script when that work is done.
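A minimal sketch of that schedule in PyTorch, assuming plain SGD with momentum and the second (400-epoch) setting; the model and dataloader are placeholders, not the repo's actual trainer:

```python
import torch
import torch.nn as nn

# Hypothetical training-loop skeleton matching the schedule described above:
# SGD, weight decay 1e-4, learning rate decayed by 10x at epochs 120/240/360.
def train(model, train_loader, epochs=400, device="cuda"):
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                                momentum=0.9, weight_decay=1e-4)
    scheduler = torch.optim.lr_scheduler.MultiStepLR(
        optimizer, milestones=[120, 240, 360], gamma=0.1)

    for epoch in range(epochs):
        model.train()
        for images, targets in train_loader:
            images, targets = images.to(device), targets.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), targets)
            loss.backward()
            optimizer.step()
        scheduler.step()  # decay once per epoch, after the training pass
```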
For the ImageNet experiments, I have already reproduced the results in the paper, but I probably won't open-source that part. In practice, the simple training script provided by PyTorch torchvision will do the trick.
As for your questions about ABC-Net: we did not reproduce their work, so we simply quote the numbers from their paper, which means that with better training tricks they might actually achieve higher accuracy.
And I believe the design purpose of expanding the channels in a binary network and that of increasing the number of binary bases is much the same: both are useful ways to increase the expressive ability of BNNs.
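To make the comparison concrete, here is a rough, illustrative sketch of the two ways of adding capacity: widening a binary conv by a factor k versus summing several binary bases in the style of ABC-Net. This is not code from either paper; the scaling scheme is simplified to a per-tensor factor and the straight-through estimator needed to actually train the sign function is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def binarize(w):
    # Sign binarization with a single scaling factor (illustrative only;
    # a straight-through estimator would be needed for training).
    return w.sign() * w.abs().mean()

# Channel expansion: a single binary conv, but k times wider.
class ExpandedBinaryConv(nn.Module):
    def __init__(self, in_ch, out_ch, k=2):
        super().__init__()
        self.conv = nn.Conv2d(in_ch * k, out_ch * k, 3, padding=1, bias=False)

    def forward(self, x):
        return F.conv2d(x, binarize(self.conv.weight), padding=1)

# ABC-Net-style: approximate one full-precision filter bank with a weighted
# sum of several binary bases at the original width.
class MultiBaseBinaryConv(nn.Module):
    def __init__(self, in_ch, out_ch, num_bases=3):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, 3, padding=1, bias=False)
            for _ in range(num_bases))
        self.alphas = nn.Parameter(torch.ones(num_bases))

    def forward(self, x):
        return sum(a * F.conv2d(x, binarize(c.weight), padding=1)
                   for a, c in zip(self.alphas, self.convs))
```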
I noticed that in your training script you used nn.CrossEntropyLoss as the loss function, which already combines nn.LogSoftmax() and nn.NLLLoss() in one single class. You should remove the nn.LogSoftmax() in this line of your VGGSmall.
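A minimal sketch of the issue, assuming a VGGSmall-style classifier; the feature width of 512 and the module layout are placeholders, not the exact code in the linked script:

```python
import torch.nn as nn

# Buggy pattern: the model ends with LogSoftmax *and* the training loop uses
# CrossEntropyLoss, so log-softmax is effectively applied twice.
buggy_head = nn.Sequential(
    nn.Linear(512, 10),
    nn.LogSoftmax(dim=1),   # remove this when training with CrossEntropyLoss
)

# Correct pattern: output raw logits and let the loss handle log-softmax + NLL.
fixed_head = nn.Linear(512, 10)
criterion = nn.CrossEntropyLoss()

# If you prefer to keep LogSoftmax inside the model, pair it with NLLLoss instead:
# criterion = nn.NLLLoss()
```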
Thanks for the reminder. I corrected the code and trained it again, following the learning rate schedule in the paper and changing the weight decay to 0.0001. I now get an accuracy of 90.970%, which is closer to your 91.12%, but still short of what is presented in the paper.
It makes sense that expanding the width improves a network's performance. But that simple uniform expansion, or non-uniform expansion on a layer-by-layer basis, could improve a binary neural network by such a large margin seems counter-intuitive at first glance. I am not sure whether this repo is an exact reimplementation of the original paper; perhaps there are some other tricks not included here?
Thanks again for all your patient answers, I really appreciate it. I am just very curious about ways to improve the performance of BNNs. I have been researching BNNs for quite a while, and when I first saw this paper I was very surprised that such a simple uniform expansion can bring a BNN close to the accuracy of an FP32 network. It would be great if I could reproduce the good results in the paper using this simple yet elegant method. So I need your enlightenment XD. Thanks again!
I would like to know whether, when calculating the FLOPs, you counted the first convolutional layer as full precision. In the paper you report that the FLOPs of the original Uniform-1 is 13M, but the first conv layer alone already takes about 40M FLOPs according to my calculation.
The first layer is full precision, so its FLOPs are calculated as 3 × 128 × 3 × 3 × 32 × 32 = 3,538,944 ≈ 3.5M. We followed BirealNet; the calculation for ResNet18-Uniform1 and BirealNet-18 is quite close, except that we quantize the pointwise conv in the residual connection while BirealNet did not.
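A small sketch of how such a count could be reproduced, assuming the Bi-Real-Net-style convention of counting binary conv operations at 1/64 of a full-precision FLOP; the layer shapes below (3 → 128 stem, 128 → 128 binary layer at 32 × 32) are taken from the arithmetic above and used only for illustration:

```python
# Hypothetical FLOPs counter: full-precision layers counted in full,
# binary conv operations divided by 64 (Bi-Real Net style accounting).
def conv_ops(c_in, c_out, k, h_out, w_out):
    """Multiply-accumulate count of a k x k convolution producing an h_out x w_out map."""
    return c_in * c_out * k * k * h_out * w_out

# First layer kept full precision (3 -> 128 channels, 3x3 kernel, 32x32 output):
first_layer = conv_ops(3, 128, 3, 32, 32)         # 3,538,944 ~= 3.5M FLOPs

# A binary 128 -> 128 layer at the same resolution, counted at 1/64 cost:
binary_layer = conv_ops(128, 128, 3, 32, 32) / 64

print(f"first layer: {first_layer / 1e6:.2f}M, binary layer: {binary_layer / 1e6:.2f}M")
```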
Is the memory size calculated based on a formula or measured?
Could you please release your train.py script? I used the training script trainer.py you provided and only changed the number of training epochs to 400. The initial learning rate is 0.1, the learning rate decays (×0.1) at epochs 120, 240, 360, and the weight decay is 1e-4. But the accuracy I get is only 91.13%, and the accuracy of Uniform-1× is 90.58%, which does not quite match the results in the paper. So could you please release your training script? I would really appreciate it.
Thanks for the great work! It is mentioned in the paper that the hyperparameter setup follows that of Half-wave Gaussian Quantization. That paper states that VGG-Small is used without the two fully connected layers, yet there is a nn.Linear(self.in_planes, 1024, bias=False) in the classifier, and the size 1024 obviously does not match the number of classes in CIFAR-10, which is 10. I am not sure whether this affects the final performance; I would appreciate it if you could clarify a bit. Thanks again.