AI-Huang / AdderNet-tf

TensorFlow implementation for paper "AdderNet: Do We Really Need Multiplications in Deep Learning?"
BSD 3-Clause "New" or "Revised" License

No convergence - Training AdderNet with Cifar10 #1

Open shl-shawn opened 1 year ago

shl-shawn commented 1 year ago

Hi Kan,

Thank you for your brilliant work writing AdderNet in TensorFlow.

I learned from your code that you trained ResNet20-v1 and achieved 92.16% accuracy on CIFAR-10.

I tried to train the model with the following command, but found that it did not converge at all. The training loss and accuracy at epoch 100 are the same as those of the first 5 epochs; screenshots are attached below.

python train_addernet_cifar10.py --data_preprocessing "subtract_pixel_mean" --lr_schedule "cifar10_scheduler" --dataset "cifar10" --use_addernet --epochs 300 --batch_size 64

I would appreciate it if you could let me know what changes I should make.

Best regards, Shawn

Epoch 1-5: [screenshot 2022-12-15 00:32:48]

Epoch 100: [screenshot 2022-12-15 00:36:44]

AI-Huang commented 1 year ago

Hi Shawn,

Thanks for trying out my code. Please check the README.md, which states that this code repository has NOT been tested yet.

This also means your results match mine, which I haven't uploaded yet and am still working to fix.

My guesses for why the model does not converge are (see the sketch after this list):

  1. TensorFlow's SGDW optimizer is not the same as PyTorch's SGD with momentum;
  2. My implementation of the Adder2D layer, especially its gradient function, has bugs.
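Regarding point 2, here is a minimal sketch of the gradient scheme described in the AdderNet paper: the output is the negative L1 distance between inputs and filters, the filter gradient is the full-precision difference X - F (instead of its sign), and the input gradient is the HardTanh-clipped F - X. This is not the repository's Adder2D layer; `adder_similarity`, its shapes, and the 1x1 / fully connected setting are illustrative assumptions chosen only to keep the example short.

```python
import tensorflow as tf


@tf.custom_gradient
def adder_similarity(x, w):
    """Negative L1 distance between inputs and filters (AdderNet-style).

    x: (batch, in_features) input vectors (1x1 / fully connected case).
    w: (in_features, out_features) filters.
    Returns y: (batch, out_features) with
        y[b, t] = -sum_k |x[b, k] - w[k, t]|.
    """
    # Pairwise differences X - F, shape (batch, in_features, out_features).
    diff = tf.expand_dims(x, -1) - tf.expand_dims(w, 0)
    y = -tf.reduce_sum(tf.abs(diff), axis=1)

    def grad(dy):
        # dy: (batch, out_features) upstream gradient.
        dy_exp = tf.expand_dims(dy, 1)  # (batch, 1, out_features)
        # Paper's full-precision filter gradient: X - F (not sign(X - F)).
        dw = tf.reduce_sum(dy_exp * diff, axis=0)
        # Paper's clipped input gradient: HardTanh(F - X) = clip(F - X, -1, 1).
        dx = tf.reduce_sum(dy_exp * tf.clip_by_value(-diff, -1.0, 1.0), axis=2)
        return dx, dw

    return y, grad
```

On point 1, note that SGDW decouples weight decay from the gradient update, whereas PyTorch's `torch.optim.SGD` adds `weight_decay` to the gradient as an L2 term; a closer analogue in Keras would be `tf.keras.optimizers.SGD(momentum=0.9)` combined with L2 kernel regularization, though this is only a guess at the source of the mismatch.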

I'll try to figure out the problem when I have some time, thanks!