brjathu / deepcaps

Official Implementation of "DeepCaps: Going Deeper with Capsule Networks" paper (CVPR 2019).
MIT License

Hyperparameters for CIFAR10 #3

Closed chjw1475 closed 5 years ago

chjw1475 commented 5 years ago

Hi. Thank you for uploading your code. Could you let me know the hyperparameters for CIFAR10?

I tried

model, eval_model = DeepCapsNet(input_shape=x_train.shape[1:], n_class=y_train.shape[1], routings=args.routings) # for 64*64 batch_size = 32

to train on a single 1080 Ti GPU, but it does not seem to converge. Thank you.

brjathu commented 5 years ago

Hi, all the hyperparameters for CIFAR10 are set in the args class. Could you please tell me what your test accuracy was at the end?
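For reference, here is a minimal sketch of what such an args container could look like. Apart from batch_size = 256, which is mentioned later in this thread, the field names and default values below are assumptions for illustration, not the repo's actual settings:

```python
# Hypothetical args-style hyperparameter container for CIFAR10.
# Only batch_size = 256 comes from this thread; every other field
# name and default value is an assumption, not the repo's setting.
class args:
    epochs = 100          # soft-loss phase (see the hard_loss note below)
    batch_size = 256      # reduce (e.g. to 128) for a single 1080 Ti
    lr = 0.001            # assumed Adam learning rate
    lr_decay = 0.96       # assumed per-epoch decay factor
    lam_recon = 0.4       # assumed weight of the reconstruction loss
    routings = 3          # assumed number of dynamic-routing iterations
    save_dir = './results/cifar10'
```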

chjw1475 commented 5 years ago

When I tried batch_size = 32, the test accuracy was around 0.1 at epoch 36. I think it was because I did not adjust the learning rate according to the batch size.
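For illustration, a common heuristic for adjusting the learning rate with the batch size is linear scaling; the reference values below are assumptions, not settings taken from the repo:

```python
# Linear LR scaling heuristic (assumption, not from the repo):
# keep lr / batch_size roughly constant when changing the batch size.
base_lr, base_batch = 0.001, 256          # assumed reference values
new_batch = 128
new_lr = base_lr * new_batch / base_batch  # -> 0.0005
```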

The batch_size in the args class is 256, but when I try to train on a single 1080 Ti GPU, an out-of-memory error occurs. So the CIFAR-10 model cannot be trained on a single GPU, right? Instead, I tried batch_size = 128, and the test accuracy is around 0.897 at epoch 34.

brjathu commented 5 years ago

You don't necessarily have to keep the batch size at 256; 128 is fine. If you train for 100 epochs and then continue training with hard_loss, it converges to about 91%.
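For context, the standard capsule margin loss and a possible "hard" variant are sketched below. The exact margins and weighting used by the repo's hard_loss are assumptions here; only the two-phase idea (soft loss first, then a tighter loss) comes from this thread:

```python
from keras import backend as K

def margin_loss(y_true, y_pred, m_plus=0.9, m_minus=0.1, lam=0.5):
    # Standard capsule margin loss (Sabour et al.); y_pred holds capsule lengths.
    L = y_true * K.square(K.maximum(0., m_plus - y_pred)) + \
        lam * (1 - y_true) * K.square(K.maximum(0., y_pred - m_minus))
    return K.mean(K.sum(L, axis=1))

def hard_margin_loss(y_true, y_pred):
    # Assumed "hard" variant: tighter margins for the fine-tuning phase.
    return margin_loss(y_true, y_pred, m_plus=0.95, m_minus=0.05)
```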

We use 256 with four V100 GPUs in parallel to speed up training, but you can still train the model on a single GPU.
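One way to reproduce this four-GPU setup in Keras 2.x is multi_gpu_model; whether the repo parallelizes the model this way is an assumption:

```python
from keras.utils import multi_gpu_model  # Keras 2.x utility

# Data-parallel sketch (assumption: the repo may wrap the model differently).
# `model` is the DeepCapsNet training model and `margin_loss` the loss sketched above.
parallel_model = multi_gpu_model(model, gpus=4)
parallel_model.compile(optimizer='adam', loss=margin_loss, metrics=['accuracy'])
# With batch_size = 256 and 4 GPUs, each GPU sees 64 samples per step.
```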

chjw1475 commented 5 years ago

Yes, I got 90.6% test accuracy. Thanks a lot!

brjathu commented 5 years ago

Great. Thank you very much.