frgfm / torch-cam

Class activation maps for your PyTorch models (CAM, Grad-CAM, Grad-CAM++, Smooth Grad-CAM++, Score-CAM, SS-CAM, IS-CAM, XGrad-CAM, Layer-CAM)
https://frgfm.github.io/torch-cam/
Apache License 2.0

No difference between GradCAM and XGradCAM #218

Closed lars-nieradzik closed 1 year ago

lars-nieradzik commented 1 year ago

Bug description

No matter the image, there is no difference between the two CAMs.

Code snippet to reproduce the bug

import torch
from torchvision.io.image import read_image
from torchvision.transforms.functional import normalize, resize, to_pil_image
from torchvision.models import resnet18
from torchcam.methods import XGradCAM, GradCAM

model = resnet18(pretrained=True).eval()
# Get your input
img = read_image("n02992211_cello.jpg")
# Preprocess it for your chosen model
input_tensor = normalize(resize(img, (224, 224)) / 255., [0.485, 0.456, 0.406], [0.229, 0.224, 0.225])

with XGradCAM(model) as cam_extractor:
  # Preprocess your data and feed it to the model
  out = model(input_tensor.unsqueeze(0))
  # Retrieve the CAM by passing the class index and the model output
  activation_map = cam_extractor(out.squeeze(0).argmax().item(), out)

with GradCAM(model) as cam_extractor:
  # Preprocess your data and feed it to the model
  out = model(input_tensor.unsqueeze(0))
  # Retrieve the CAM by passing the class index and the model output
  activation_map2 = cam_extractor(out.squeeze(0).argmax().item(), out)

print(torch.mean((activation_map[0] - activation_map2[0])**2))

Error traceback

tensor(2.5343e-16)

Environment

pip install from master branch.

frgfm commented 1 year ago

Hey @lars-nieradzik :wave:

Thanks for notifying me! I think you are correct, actually. From the paper, I implemented equations 7 and 8, but I didn't take equation 5 into account correctly.

I'm not entirely sure how to do this here! Perhaps the author @Fu0511 could enlighten us?

Here is how the Grad-CAM weight is computed:

grad.flatten(2).mean(-1)

and here is how the XGrad-CAM weight is computed:

(grad * act).flatten(2).sum(-1) / act.flatten(2).sum(-1).add(eps)
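
In other words (my reading of the two snippets, with `act` being the target feature map A^k and `grad` being the gradient of the class score Sc with respect to it):

$$
\alpha_k^c(\text{Grad-CAM}) = \frac{1}{Z} \sum_{i,j} \frac{\partial S^c}{\partial A^k_{ij}},
\qquad
\alpha_k^c(\text{XGrad-CAM}) = \sum_{i,j} \frac{A^k_{ij}}{\sum_{i',j'} A^k_{i'j'} + \epsilon} \, \frac{\partial S^c}{\partial A^k_{ij}}
$$

where Z is the number of spatial positions in the feature map.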

But how do we integrate equation 5 in there?

Cheers!

Fu0511 commented 1 year ago

Thank you for your attention to our work. Since resnet18 is a GAP-CNN, i.e. a network whose penultimate layer is a global average pooling (GAP) layer, it is natural that Grad-CAM and XGrad-CAM produce the same result on resnet18. As described in our paper, Grad-CAM and XGrad-CAM can be proved to be exactly the same in the case of GAP-CNNs (refer to Appendix C for the detailed proof). For the visualization of other models, such as VGG16, our XGrad-CAM outperforms Grad-CAM. The significance of our paper is that it provides a clear mathematical explanation, filling the gap in interpretability for CAM visualization methods. Hope that explanation helps.
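
Roughly, the argument goes as follows (see Appendix C for the rigorous version): in a GAP-CNN, the class score is a linear function of the spatially averaged feature maps,

$$
S^c = \sum_k w_k^c \cdot \frac{1}{Z} \sum_{i,j} A^k_{ij} + b^c
\quad\Longrightarrow\quad
\frac{\partial S^c}{\partial A^k_{ij}} = \frac{w_k^c}{Z} \ \text{ for every } (i,j).
$$

Since the gradient of each feature map is spatially constant, the arithmetic average used by Grad-CAM and the activation-weighted average used by XGrad-CAM both reduce to w_k^c / Z, so the two maps coincide (up to the epsilon in the denominator and floating-point error, which is why the printed MSE above is ~1e-16 rather than exactly 0).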

frgfm commented 1 year ago

Thanks a lot @Fu0511 :pray: One question though: the XGrad-CAM weight expression above implements equations 7 & 8 from the paper, assuming that Sc is the output score for class c. Is that correct? (cf. the first paragraph of your section 3.1) If that's the case, I think the implementation stands.

Now I remember, yes. I tried the same snippet as @lars-nieradzik with vgg16, and the difference is marginally bigger. So it looks like this makes sense.
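
For reference, the check is just the snippet above with the model swapped (reusing `input_tensor` from @lars-nieradzik's code; the exact value will depend on the image and pretrained weights):

import torch
from torchvision.models import vgg16
from torchcam.methods import XGradCAM, GradCAM

# Same preprocessing as in the original snippet, only the model changes
model = vgg16(pretrained=True).eval()

with XGradCAM(model) as cam_extractor:
  out = model(input_tensor.unsqueeze(0))
  xgrad_map = cam_extractor(out.squeeze(0).argmax().item(), out)

with GradCAM(model) as cam_extractor:
  out = model(input_tensor.unsqueeze(0))
  grad_map = cam_extractor(out.squeeze(0).argmax().item(), out)

# VGG16 does not end in a single GAP layer, so the two weightings are no longer
# mathematically equivalent and some difference between the maps is expected
print(torch.mean((xgrad_map[0] - grad_map[0]) ** 2))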

Fu0511 commented 1 year ago

Yes, that is right. In XGrad-CAM, the weight of each feature map is a weighted average of its gradients, obtained by solving an optimization problem, while Grad-CAM uses the plain arithmetic average.

frgfm commented 1 year ago

Thanks for the clarifications :pray: Closing the issue then; feel free to reopen if you encounter a problem, @lars-nieradzik :ok_hand: