Inquiry About Generating CAM Images with LViT

Studentpengyu commented 3 months ago

Dear Zihan,

I hope this message finds you well.

I have been following your work with great interest. Currently, I am attempting to replicate the interpretability study in your paper. However, I have encountered some challenges.

I am using the package 'pytorch_grad_cam' with the following code snippet:

'cam' requires the model's input to be a single tensor, whereas the LViT model requires two tensors (image, text) as input. I would like to ask how you generated the CAM figures in your work. Could you kindly provide some guidance or share the relevant code?

Thank you very much for your time and assistance.

Best regards, Pengyu Zhao

HUANGLIZI commented 3 months ago

You should modify the GradCAM source code so that it accepts two inputs.
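For readers with the same problem, here is a minimal sketch of what a two-input Grad-CAM can look like. It is not the code used for the paper and it does not patch 'pytorch_grad_cam'. It is a small hand-written Grad-CAM built on a PyTorch forward hook. The names `ToyTwoInputSegNet` and `grad_cam_two_inputs` are hypothetical, the toy model only stands in for a network called as model(image, text), and using the sum of the output logits as the target score is an assumption that you may want to replace with a class- or region-specific score.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyTwoInputSegNet(nn.Module):
    """Hypothetical stand-in for a model called as model(image, text)."""

    def __init__(self, text_dim=8):
        super().__init__()
        self.conv = nn.Conv2d(3, 4, kernel_size=3, padding=1)
        self.text_proj = nn.Linear(text_dim, 4)
        self.head = nn.Conv2d(4, 1, kernel_size=1)

    def forward(self, image, text):
        feat = F.relu(self.conv(image))
        # Modulate image features with the mean-pooled text embedding.
        gate = self.text_proj(text.mean(dim=1))[:, :, None, None]
        return self.head(feat * gate)


def grad_cam_two_inputs(model, target_layer, image, text):
    """Grad-CAM heatmaps for a model that takes (image, text)."""
    store = {}

    def save_activation(module, inputs, output):
        # Keep the target layer's output so we can differentiate w.r.t. it.
        store["act"] = output

    handle = target_layer.register_forward_hook(save_activation)
    try:
        logits = model(image, text)  # both inputs are passed through
        score = logits.sum()         # assumed target: sum of all logits
        grads = torch.autograd.grad(score, store["act"])[0]
    finally:
        handle.remove()

    acts = store["act"].detach()
    # Channel weights: gradients averaged over the spatial dimensions.
    weights = grads.mean(dim=(2, 3), keepdim=True)
    cam = F.relu((weights * acts).sum(dim=1, keepdim=True))
    # Upsample to the input resolution and rescale each map to [0, 1].
    cam = F.interpolate(cam, size=image.shape[-2:], mode="bilinear",
                        align_corners=False)
    cam_min = cam.amin(dim=(2, 3), keepdim=True)
    cam_max = cam.amax(dim=(2, 3), keepdim=True)
    return ((cam - cam_min) / (cam_max - cam_min + 1e-8)).squeeze(1)


model = ToyTwoInputSegNet().eval()
image = torch.randn(2, 3, 16, 16)   # batch of 2 RGB images
text = torch.randn(2, 5, 8)         # batch of 2 sequences of 5 token embeddings
cam = grad_cam_two_inputs(model, model.conv, image, text)
print(cam.shape)  # torch.Size([2, 16, 16])
```

With the real model, you would pass the actual network, one of its convolutional layers as `target_layer`, and the image and text tensors prepared the same way as for inference. Which layer gives the most informative maps is a choice you would need to check yourself.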

Studentpengyu commented 3 months ago

Thank you for your response. I have obtained the expected results for the LViT model and cited it correctly. I appreciate all your help along the way.

HUANGLIZI / LViT

Inquiry About Generating CAM Images with LViT #45