Hyungjun-K1m / BinaryDuo

Torch-7 implementation of BinaryDuo (ICLR 2020).

Can a ternary model with half the model size perform clearly better than the normal binary one? #1

Open liyue2ppy opened 4 years ago

liyue2ppy commented 4 years ago

Hello! I'm very interested in this work and tried to reproduce it using Python and PyTorch.

When I ran my code on CIFAR-100, I found that the coupled ternary structure performed worse than the corresponding binary one. I use ResNet-18 with the resnet_sc_coupled architecture (64-90-180-360). The activation function is adapted from yours, and the weight binarization function is the same as in Bi-Real Net, which uses a sign function in the forward pass and a clamp function in the backward pass.
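For readers unfamiliar with the weight binarization described above, here is a minimal, framework-independent sketch of "sign in the forward pass, clamp in the backward pass". It is not the code from this issue or from either paper. The function names, the clipping threshold of 1.0, and the choice to map a weight of exactly 0 to +1 are all assumptions made for illustration. In PyTorch, the same idea would typically be wrapped in a custom `torch.autograd.Function`.

```python
def binarize_forward(weights):
    """Forward pass: map each real-valued weight to +1 or -1.

    Assumption: a weight of exactly 0 is mapped to +1.
    """
    return [1.0 if w >= 0.0 else -1.0 for w in weights]


def binarize_backward(weights, grad_output, clip=1.0):
    """Backward pass: treat sign(w) as clamp(w, -clip, clip).

    The incoming gradient passes through unchanged where |w| <= clip
    and is zeroed elsewhere. `clip=1.0` is an assumed threshold.
    """
    return [g if abs(w) <= clip else 0.0 for w, g in zip(weights, grad_output)]


# Hypothetical latent weights and a constant upstream gradient.
latent = [-1.7, -0.3, 0.0, 0.4, 2.1]
upstream = [0.5, 0.5, 0.5, 0.5, 0.5]

binary = binarize_forward(latent)            # values used in the convolution
grads = binarize_backward(latent, upstream)  # gradients applied to the latent weights
print(binary, grads)
```

Weights whose latent value has drifted outside the clipping range receive no gradient, so a mismatch in the threshold or in the weight initialization scale between two implementations can change training behavior.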

The accuracies of the coupled ternary network and its corresponding binary network (64-128-256-512) are 66.24% vs. 66.48%, which is not consistent with the claim in the paper that "using ternary activation and cutting the model size in half improved the performance of the network".

I am confused by this result and would like to know whether a ternary-activation model with half the model size can perform clearly better than the normal binary one, or whether fine-tuning from the decoupled model is very important.

Hyungjun-K1m commented 4 years ago

Hi,

Thank you for your interest in our work!

First of all, it's a good idea to test with the same model architecture on the same dataset and reproduce the numbers in our paper, to check whether any modules are implemented incorrectly. Also, according to the original ResNet paper (https://arxiv.org/abs/1512.03385), ResNet-18 is intended for ImageNet-sized datasets, and the authors used ResNet-20, 32, 44, 56, etc. for the CIFAR dataset. If you need to use ResNet-18 on a CIFAR dataset, make sure to preserve the spatial size of the feature maps in the early layers.
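To illustrate the point about spatial size, the sketch below traces the feature-map side length through a ResNet-18 using the standard convolution output-size formula. The ImageNet stem (7x7 stride-2 convolution followed by 3x3 stride-2 max pooling) follows the original ResNet paper. The CIFAR stem (a single 3x3 stride-1 convolution with no pooling) is a commonly used adaptation and is an assumption here, not something specified in this thread. The helper names are hypothetical.

```python
def conv_out(size, kernel, stride, padding):
    """Output side length of a conv/pool layer: floor((n + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


def final_map_size(input_size, stem):
    """Side length of the last-stage feature map of a ResNet-18-style network.

    `stem` is a list of (kernel, stride, padding) layers applied before stage 1.
    Stage 1 keeps the size; stages 2-4 each begin with a stride-2 3x3 conv.
    """
    size = input_size
    for kernel, stride, padding in stem:
        size = conv_out(size, kernel, stride, padding)
    for _ in range(3):
        size = conv_out(size, 3, 2, 1)
    return size


IMAGENET_STEM = [(7, 2, 3), (3, 2, 1)]  # 7x7/2 conv, then 3x3/2 max pool
CIFAR_STEM = [(3, 1, 1)]                # assumed CIFAR adaptation: 3x3/1 conv, no pool

print(final_map_size(224, IMAGENET_STEM))  # ImageNet input with the ImageNet stem
print(final_map_size(32, IMAGENET_STEM))   # CIFAR input with the unmodified stem
print(final_map_size(32, CIFAR_STEM))      # CIFAR input with the assumed CIFAR stem
```

With the unmodified stem, a 32x32 input is already reduced to 8x8 before the first residual stage, which leaves very little spatial information for the later stages.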

Also, we cannot claim that the coupled ternary model is always better than the baseline binary model, because model performance is difficult to measure reliably. Note that the coupled ternary model has half the number of parameters of the baseline binary model. And yes, the fine-tuning process is important for improving model performance. In our experiment (Section 6.1), the coupled ternary model achieved a 0.62% improvement, and fine-tuning it resulted in an additional 0.75% improvement.
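As a rough check of the "half the number of parameters" statement for the two width configurations mentioned in this thread, the sketch below counts only the 3x3 convolution weights of a ResNet-18-style network (two basic blocks per stage, two convolutions per block). It ignores the stem, the 1x1 shortcut convolutions, batch normalization, and the classifier, and it assumes the stem outputs as many channels as the first stage. The function name is hypothetical.

```python
def conv3x3_weights(widths, blocks_per_stage=2):
    """Count 3x3 conv weights for a ResNet-18-style stack of basic blocks.

    Assumptions: the stem outputs widths[0] channels, and each basic block
    has two 3x3 convolutions. Shortcut, stem, BN, and FC weights are ignored.
    """
    total = 0
    in_ch = widths[0]
    for out_ch in widths:
        for _ in range(blocks_per_stage * 2):
            total += 9 * in_ch * out_ch
            in_ch = out_ch
    return total


baseline = conv3x3_weights([64, 128, 256, 512])  # binary baseline widths
coupled = conv3x3_weights([64, 90, 180, 360])    # coupled ternary widths
print(baseline, coupled, round(coupled / baseline, 3))
```

Under these assumptions the narrower configuration has about half the 3x3 convolution weights of the baseline, since (90/128)^2 is roughly 0.49 for the stages whose widths change.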

I hope my answer helps your understanding. Thanks!