TimDettmers / sparse_learning

Sparse learning library and sparse momentum resources.

performance loss a little big #10

Open wuzhiyang2016 opened 4 years ago

wuzhiyang2016 commented 4 years ago

I have trained an alexnet-s model on CIFAR-10 for 100 epochs:

### dense model performance:

accuracy: 8322/10000 (83.220%)
conv layers sparsity:
layer features.0.weight sparsity: 0.002812213039485756
layer features.3.weight sparsity: 0.017063802083333357
layer features.6.weight sparsity: 0.01662303783275465
layer features.9.weight sparsity: 0.011704433111496937
layer features.12.weight sparsity: 0.25820583767361116
layer classifier.0.weight sparsity: 0.004718780517578125
layer classifier.3.weight sparsity: 0.004614830017089844
layer classifier.6.weight sparsity: 0.0021484375000000444

### sparse model performance:

accuracy: 8131/10000 (81.310%)
conv layers sparsity:
layer features.0.weight sparsity: 0.11059458218549123
layer features.3.weight sparsity: 0.9346875
layer features.6.weight sparsity: 0.9564841941550926
layer features.9.weight sparsity: 0.9704597378954475
layer features.12.weight sparsity: 0.2798902723524306
layer classifier.0.weight sparsity: 0.8953094482421875
layer classifier.3.weight sparsity: 0.9820785522460938
layer classifier.6.weight sparsity: 0.046191406249999956

The accuracy drops by nearly 2 percentage points.

So, is there something wrong with my setup?
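For reference, the per-layer sparsity values above are the fraction of exactly-zero entries in each weight tensor. A minimal PyTorch sketch of that check (the stand-in model below is illustrative only, not this repo's alexnet-s or its actual evaluation code):

```python
import torch


def layer_sparsity(model):
    """Return the fraction of zero-valued entries in each weight tensor."""
    stats = {}
    for name, param in model.named_parameters():
        if name.endswith('weight'):
            stats[name] = (param == 0).sum().item() / param.numel()
    return stats


# Stand-in model for illustration; substitute the trained alexnet-s checkpoint.
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 64, 3, padding=1),
    torch.nn.ReLU(),
    torch.nn.Flatten(),
    torch.nn.Linear(64 * 32 * 32, 10),
)

for name, frac in layer_sparsity(model).items():
    print(f'layer {name} sparsity: {frac}')
```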

TimDettmers commented 4 years ago

You need to train both networks a bit longer (250 epochs) to get better performance, but the gap between dense and sparse performance will remain if you use the default 5% weights. In the new version of our paper (to be released on Monday), we investigate how many weights are needed to reach dense performance for each network on CIFAR-10. AlexNet-s needs the most weights of all networks: 50%. So, statistically, with 50% weights you should reach dense performance. You can get good results with fewer weights and worse results with more weights, but the sparse mean will be in the same 95% confidence interval as the dense mean if you use 50% weights. You should still see a speedup of about 1.3x for dense convolution and 3.0x for sparse convolution with 50% weights.
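To make the 95% confidence interval comparison concrete, one way to check whether sparse and dense runs are statistically indistinguishable is to train each configuration with several seeds and compare the mean accuracies with their confidence intervals. A rough sketch, assuming you have collected the final test accuracies per seed (the numbers below are placeholders, not results from the paper):

```python
import statistics


def mean_ci95(values):
    """Mean and half-width of an approximate 95% confidence interval (normal approximation)."""
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return m, 1.96 * s / len(values) ** 0.5


# Placeholder final test accuracies from 5 seeds each (not real results).
dense_acc = [83.2, 83.5, 82.9, 83.1, 83.4]
sparse_acc = [82.8, 83.3, 83.0, 82.7, 83.2]  # e.g., sparse momentum with 50% weights

d_mean, d_ci = mean_ci95(dense_acc)
s_mean, s_ci = mean_ci95(sparse_acc)
print(f'dense:  {d_mean:.2f} +/- {d_ci:.2f}')
print(f'sparse: {s_mean:.2f} +/- {s_ci:.2f}')
```

If the sparse mean falls inside the dense interval (and vice versa), the two configurations are performing comparably in the sense described above.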