IntelLabs / distiller

Neural Network Distiller by Intel AI Lab: a Python package for neural network compression research. https://intellabs.github.io/distiller
Apache License 2.0
4.34k stars · 800 forks

Thinning with different dataset input sizes leads to different accuracy #479

Closed yama514 closed 4 years ago

yama514 commented 4 years ago

When doing filter removal (ResNet18), different dataset input sizes give different P/Rs. I tried with dataset: 'imagenet' and with my own dataset's input size (1, 3, 288, 512); the model thinned with my dataset's input size gets a higher P/R. It seems the dummy input does more than finding the data dependencies (#416). Could you explain more about this? Thanks.

nzmora commented 4 years ago

Hi @yama514,

Can you explain what you mean by P/R? Thanks Neta

yama514 commented 4 years ago

Sorry for the confusion. I am testing a resnet18 based detection network, P/R is the coco eval precision and recall on my test set.

nzmora commented 4 years ago

Hi @yama514 ,

I might be missing some information, because it seems obvious to me that different datasets will produce different P/Rs, regardless of any sparsity and thinning. The two datasets have different distributions of objects, classes, and examples. You know the datasets you've used:

  1. When you don't perform any pruning (sparsification) or thinning, how do your two datasets perform?
  2. If you only perform pruning (sparsification only, so you have channels with zeros, but you didn't physically delete them using thinning), how do your two datasets perform?

Cheers, Neta

yama514 commented 4 years ago

Hey Neta, thanks for the reply. I did the training (pruning) and testing all on the same dataset, my own. Only in the thinning process, when setting net_thinner in the YAML, did I set the dataset field to "imagenet" for one experiment, and manually set the input size to my dataset's image shape (1, 3, 288, 512) in code for the second experiment. I thought the size of the dummy input was only used for checking data flow. However, the two thinning experiments show different precisions and recalls on the same test set. Hope this helps describe my question. Thanks!

nzmora commented 4 years ago

That's interesting. You can see in the code (search for dataset) that the dataset is used only for creating a SummaryGraph (code).

A SummaryGraph is documented in the code, and in this issue. In short, the dummy_input is fed into a PyTorch model for tracing. The PyTorch trace is a representation of the forward-graph generated by our dummy_input. We then convert this graph into a distiller.SummaryGraph representation, which the thinning process uses to determine data dependencies between layers.
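To illustrate why thinning needs these traced dependencies (a hypothetical sketch in plain Python, not distiller's actual data structures or API): once the trace tells us which layer's output feeds which layer's input, physically removing output filters from one convolution forces every consumer of that tensor to drop the matching input channels.

```python
# Hypothetical sketch: propagate a filter-removal decision through a
# forward graph, the way thinning uses SummaryGraph dependencies.
# In distiller the edges come from tracing the model with a dummy input.
graph = {
    "conv1": ["conv2"],   # conv1's output tensor feeds conv2
    "conv2": ["conv3"],
    "conv3": [],
}
out_channels = {"conv1": 64, "conv2": 128, "conv3": 256}
in_channels = {"conv1": 3, "conv2": 64, "conv3": 128}

def thin(layer, kept_filters):
    """Shrink `layer` to `kept_filters` output channels and update the
    input-channel count of every consumer found in the forward graph."""
    out_channels[layer] = kept_filters
    for consumer in graph[layer]:
        in_channels[consumer] = kept_filters

thin("conv1", 48)  # remove 16 filters from conv1
print(out_channels["conv1"], in_channels["conv2"])  # 48 48
```

Note that nothing in this propagation depends on the spatial size of the dummy input; it only depends on which layers are connected, which is what the trace provides.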

Your input (1, 3, 288, 512) and ImageNet's (1, 3, 224, 224) have the same number of input channels (3), so the feature-extraction part of ResNet18 should not be affected (i.e. the Convolution layers don't require reconfiguration). The average-pooling module (nn.AdaptiveAvgPool2d((1, 1))) also helps maintain independence from the input size.
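The size-independence of the adaptive pooling step can be seen with a toy computation (plain Python standing in for nn.AdaptiveAvgPool2d((1, 1)), just for illustration): whatever the spatial dimensions, each channel collapses to a single value, so the layers after it always see the same number of features.

```python
# Toy global average pool, mimicking nn.AdaptiveAvgPool2d((1, 1)).
# A feature map here is a nested list: channels x height x width.
def global_avg_pool(fmap):
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
            for ch in fmap]

def make_fmap(c, h, w):
    return [[[1.0] * w for _ in range(h)] for _ in range(c)]

# Two different spatial sizes, same channel count: the pooled output
# has the same length either way, so downstream layers are unaffected.
a = global_avg_pool(make_fmap(8, 7, 7))    # e.g. from a 224x224 input
b = global_avg_pool(make_fmap(8, 9, 16))   # e.g. from a 288x512 input
print(len(a), len(b))  # 8 8
```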

Because the number of channels in each layer is independent of the input size (see above), the thinning should behave the same in both cases. You can create summaries of the two thinned models and compare their structural characteristics. Maybe you'll see a difference that explains the different results, but I don't expect so.
This leads me to suspect that the difference lies in how you pre-process each of these datasets. Cheers, Neta

yama514 commented 4 years ago

Thank you for the detailed explanation, Neta. Since the thinning is independent of the input size in this case, I will check the data, retrain the models, and compare the summaries.