Closed · yama514 closed this issue 4 years ago
Hi @yama514,
Can you explain what you mean by P/R? Thanks, Neta
Sorry for the confusion. I am testing a resnet18 based detection network, P/R is the coco eval precision and recall on my test set.
Hi @yama514 ,
I might be missing more information, because it seems obvious to me that different datasets will produce different P/Rs, regardless of any sparsity and thinning. The two datasets have different numbers of objects, classes, and example distributions. You know the datasets you've used.
Cheers, Neta
Hey Neta, thanks for the reply. I did the training (pruning) and testing on the same dataset, my own. The only difference was in the thinning step: when configuring `net_thinner` in the YAML file, I set the `dataset` field to `"imagenet"` for one experiment, and for the second experiment I manually set the dummy-input size in code to my dataset's image shape, (1, 3, 288, 512). I thought the size of the dummy input was only used for checking data flow, yet the two thinning experiments show different precisions and recalls on the same test set. Hope this helps describe my question. Thanks!
That's interesting. You can see in the code (search for `dataset`) that the `dataset` is used only for creating a SummaryGraph (code). A `SummaryGraph` is documented in the code, and in this issue.
In short, the `dummy_input` is fed into a PyTorch model for tracing. The PyTorch trace is a representation of the forward-graph generated by our `dummy_input`. We then convert this graph into a `distiller.SummaryGraph` representation. The thinning process uses this `distiller.SummaryGraph` representation to determine dependencies.
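The tracing step can be sketched in plain PyTorch (a minimal illustration, not Distiller's actual code; the toy model and shapes are hypothetical stand-ins for ResNet18):

```python
import torch
import torch.nn as nn

# Toy model standing in for a detection backbone (hypothetical).
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d((1, 1)),
    nn.Flatten(),
    nn.Linear(8, 2),
)
model.eval()

# The dummy input only defines the shapes flowing through the graph;
# its values are irrelevant to the traced structure.
dummy_input = torch.randn(1, 3, 288, 512)

# torch.jit.trace runs one forward pass and records the operations,
# producing a static forward-graph (the kind of trace Distiller then
# converts into its SummaryGraph to find layer dependencies).
traced = torch.jit.trace(model, dummy_input)
print(tuple(traced(dummy_input).shape))  # → (1, 2)
```

Because only shapes and operations are recorded, two dummy inputs with the same channel structure should yield structurally equivalent traces.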
Your input, (1, 3, 288, 512), and ImageNet's, (1, 3, 224, 224), have the same number of input channels (3), so the feature-extraction part of ResNet18 should not be affected (i.e. the convolution layers don't require reconfiguration). The average-pooling module (`nn.AdaptiveAvgPool2d((1, 1))`) also helps maintain independence from the input size.
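This input-size independence can be checked directly: whatever spatial size the feature extractor emits, the adaptive pool reduces it to 1x1, so the head sees the same number of features (a minimal check; the 512-channel count matches ResNet18's last block, and the spatial sizes are illustrative approximations of input/32):

```python
import torch
import torch.nn as nn

pool = nn.AdaptiveAvgPool2d((1, 1))

# Feature maps as they would roughly leave ResNet18's final conv block:
# 288/32 x 512/32 = 9 x 16 for the custom input, 224/32 = 7 x 7 for ImageNet.
feat_custom = torch.randn(1, 512, 9, 16)
feat_imagenet = torch.randn(1, 512, 7, 7)

# Both collapse to (1, 512, 1, 1): the layers after the pool never
# notice the different input resolutions.
print(tuple(pool(feat_custom).shape))    # → (1, 512, 1, 1)
print(tuple(pool(feat_imagenet).shape))  # → (1, 512, 1, 1)
```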
Because the number of channels in every layer is independent of the input size (see above), the thinning should behave the same. You can create summaries of the two models and compare their structural characteristics. Maybe you'll see a difference that explains the different results, but I don't think so.
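One quick way to do that comparison (a hypothetical helper, not a Distiller API) is to diff the per-layer channel counts of the two thinned models:

```python
import torch.nn as nn

def conv_channels(model):
    """Map each Conv2d's qualified name to its (in_channels, out_channels)."""
    return {name: (m.in_channels, m.out_channels)
            for name, m in model.named_modules()
            if isinstance(m, nn.Conv2d)}

def diff_channels(model_a, model_b):
    """Return only the conv layers whose channel counts differ."""
    a, b = conv_channels(model_a), conv_channels(model_b)
    return {name: (a.get(name), b.get(name))
            for name in set(a) | set(b)
            if a.get(name) != b.get(name)}
```

If thinning behaved identically in both experiments, `diff_channels` on the two thinned models should return an empty dict; any entry it does return points at a structural divergence worth investigating.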
And this leads me to suspect that the difference is in how you perform the pre-processing of each of these datasets.
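As a concrete illustration of how preprocessing alone could move P/R (a hypothetical example; the actual pipelines were not shared in this thread): resizing a 288x512 frame to ImageNet's square 224x224 changes the aspect ratio, distorting object and box geometry differently than a pipeline that keeps the native 288x512 shape:

```python
# Aspect-ratio distortion from resizing a 288x512 image to 224x224
# (pure arithmetic; no claim about the pipelines actually used here).
src_h, src_w = 288, 512
dst_h, dst_w = 224, 224

scale_h = dst_h / src_h  # vertical scale factor
scale_w = dst_w / src_w  # horizontal scale factor

# A ratio of 1.0 would preserve shapes; anything else squeezes or
# stretches boxes along one axis relative to the other.
distortion = scale_w / scale_h
print(distortion)  # → 0.5625
```

A horizontal-to-vertical scale ratio of 0.5625 means objects are squeezed to roughly half their width relative to their height, which a detector trained on undistorted images would likely score worse on.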
Cheers,
Neta
Thank you for the detailed explanation, Neta. Since the thinning is independent of the input size in this case, I will check the data, retrain the models, and compare the summaries.
When doing filter removal (ResNet18), different dataset input sizes give different P/Rs. I tried with `dataset: 'imagenet'` and with my own dataset's input size (1, 3, 288, 512); the thinned model using my dataset's input size gets a higher P/R. It seems the dummy input does more than finding the data dependencies (#416). Could you explain more about it? Thanks.