There are some solutions to compress the model size:
Scale the convolution kernel size, as in [1], [2], [3]. For example, you can replace a 3x3 convolution layer with a 1x3 convolution followed by a 3x1 convolution (see the first sketch after this list).
Use global average pooling instead of a fully connected layer, which effectively decreases the number of parameters (second sketch below).
Use model compression methods such as quantization [4] (third sketch below).
Use knowledge distillation [5] (last sketch below).
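A minimal PyTorch sketch of the kernel-factorization idea (the channel counts here are placeholders, not from any of the cited papers):

```python
import torch.nn as nn

# A standard 3x3 convolution: 64 * 64 * 9 = 36,864 weights.
conv3x3 = nn.Conv2d(64, 64, kernel_size=3, padding=1)

# The same receptive field as a 1x3 followed by a 3x1 convolution:
# 64 * 64 * 3 + 64 * 64 * 3 = 24,576 weights, about one third fewer.
factorized = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=(1, 3), padding=(0, 1)),
    nn.Conv2d(64, 64, kernel_size=(3, 1), padding=(1, 0)),
)
```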
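For global average pooling, a sketch assuming a 512-channel 7x7 final feature map and 10 output classes (again, placeholder numbers):

```python
import torch.nn as nn

# Fully connected head: flatten the 512x7x7 feature map.
# Weight count: 512 * 7 * 7 * 10 = 250,880.
fc_head = nn.Sequential(nn.Flatten(), nn.Linear(512 * 7 * 7, 10))

# Global-average-pooling head: pool each channel to 1x1, then classify.
# Weight count: 512 * 10 = 5,120 -- roughly 49x fewer parameters.
gap_head = nn.Sequential(
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(512, 10),
)
```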
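Deep compression [4] is a full pipeline of pruning, trained quantization, and Huffman coding. As a much simpler starting point, here is PyTorch's post-training dynamic quantization (not the method of [4], just a quick win):

```python
import torch
import torch.nn as nn

# A toy model standing in for your network.
model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 10))

# Store the weights of the listed module types as int8,
# roughly a 4x size reduction for those layers.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)
```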
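And a sketch of the distillation loss from [5]: the student matches the teacher's temperature-softened outputs in addition to the hard labels. The temperature T and mixing weight alpha are hyperparameters you would tune.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    # Soft-target term: KL divergence between softened distributions,
    # scaled by T*T so its gradient magnitude matches the hard term (see [5]).
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    # Hard-label term: ordinary cross-entropy against the ground truth.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```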
I have tried scaling the convolutions and using global average pooling; in my experiments this cost about 0.5-1% accuracy. If you can tolerate that drop, both are worth trying, especially global average pooling.
[1]. Jaderberg, Max, Andrea Vedaldi, and Andrew Zisserman. "Speeding up convolutional neural networks with low rank expansions." arXiv preprint arXiv:1405.3866, 2014.
[2]. Szegedy, Christian, et al. "Rethinking the inception architecture for computer vision." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016.
[3]. Iandola, Forrest N., et al. "SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size." arXiv preprint arXiv:1602.07360, 2016.
[4]. Han, Song, Huizi Mao, and William J. Dally. "Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding." arXiv preprint arXiv:1510.00149, 2015.
[5]. Hinton, Geoffrey, Oriol Vinyals, and Jeff Dean. "Distilling the knowledge in a neural network." arXiv preprint arXiv:1503.02531, 2015.