clovaai / wsolevaluation

Evaluating Weakly Supervised Object Localization Methods Right (CVPR 2020)

logic problem #40

Closed lingorX closed 3 years ago

lingorX commented 3 years ago

When generating a CAM, you use the ground-truth label to select the channel weights from the fully connected layer. But I think the model's predicted class is the right choice.

lingorX commented 3 years ago

https://github.com/clovaai/wsolevaluation/blob/master/inference.py

```python
for images, targets, image_ids in self.loader:
    image_size = images.shape[2:]
    images = images.cuda()
    # CAMs are computed for the ground-truth targets, not the predictions.
    cams = t2n(self.model(images, targets, return_cam=True))
```

https://github.com/clovaai/wsolevaluation/blob/master/wsol/vgg.py

```python
if return_cam:
    feature_map = x.detach().clone()
    # Select the FC weight row for each given label: (batch, channels).
    cam_weights = self.fc.weight[labels]
    # Weight each feature channel by its class weight, then average over channels.
    cams = (cam_weights.view(*feature_map.shape[:2], 1, 1)
            * feature_map).mean(1, keepdim=False)
    return cams
```
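
For reference, a minimal self-contained sketch (not from the repository; all names and shapes below are dummy stand-ins) of what that indexing computes:

```python
import torch

# Dummy shapes standing in for the repo's tensors (illustrative only).
batch, channels, h, w = 2, 512, 14, 14
num_classes = 10

feature_map = torch.randn(batch, channels, h, w)  # conv features x
fc_weight = torch.randn(num_classes, channels)    # self.fc.weight
labels = torch.tensor([3, 7])                     # one class index per image

cam_weights = fc_weight[labels]                   # (batch, channels)
cams = (cam_weights.view(batch, channels, 1, 1) * feature_map).mean(1)
print(cams.shape)                                 # torch.Size([2, 14, 14])
```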
coallaoh commented 3 years ago

Thank you for your comment.

I would say both choices are correct: (1) the GT label or (2) the predicted label. The two choices lead to two different WSOL evaluation metrics: (1) GT-known localization accuracy and (2) localization accuracy.

In our paper, we advocate the use of GT-known localization accuracy for the following reason (excerpt from Section 4.1):

> The localization accuracy [41] metric entangles classification and localization performances by counting the number of images where both tasks are performed correctly. We advocate the measurement of localization performance alone, as the goal of WSOL is to localize objects (§3.1) and not to classify images correctly. To this end, we only consider the score maps s_{ij} corresponding to the ground-truth classes in our analysis. Metrics based on such are commonly referred to as the GT-known metrics [25, 56, 57, 6].

So the code that uses the score map for the GT label works as intended.
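
For illustration, a minimal sketch (hypothetical; `logits` and the label tensors are dummy stand-ins, and the commented-out calls mirror the snippet above) of how the two metrics differ only in which class index is passed to the CAM computation:

```python
import torch

logits = torch.randn(2, 10)          # model's class scores for 2 images
gt_labels = torch.tensor([3, 7])     # ground-truth classes
pred_labels = logits.argmax(dim=1)   # top-1 predicted classes

# cams = model(images, gt_labels, return_cam=True)    # -> GT-known accuracy
# cams = model(images, pred_labels, return_cam=True)  # -> localization accuracy
```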

lingorX commented 3 years ago

Thank you for your attention.