Closed. Studentpengyu closed this issue 3 months ago.
You should modify the GradCAM source code so that it accepts two inputs (image and text) instead of a single tensor.
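For readers who land here with the same problem, below is a minimal sketch of one way to do that. It is not the code used for the paper's figures and does not modify 'pytorch_grad_cam' itself; it reimplements the Grad-CAM computation with plain PyTorch hooks so the model can be called with two tensors. `ToyTwoInputSeg` is a hypothetical stand-in for a two-input model (it is not LViT), `grad_cam_two_inputs` is an illustrative helper name, and using the sum of the output logits as the backward target is an assumption that you may want to replace with a class- or region-specific score.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyTwoInputSeg(nn.Module):
    """Hypothetical stand-in for a model whose forward takes (image, text)."""

    def __init__(self, text_dim=8):
        super().__init__()
        self.conv = nn.Conv2d(3, 4, kernel_size=3, padding=1)
        self.text_proj = nn.Linear(text_dim, 4)
        self.head = nn.Conv2d(4, 1, kernel_size=1)

    def forward(self, image, text):
        feat = F.relu(self.conv(image))                           # (B, 4, H, W)
        gate = self.text_proj(text).unsqueeze(-1).unsqueeze(-1)   # (B, 4, 1, 1)
        return self.head(feat * gate)                             # (B, 1, H, W)


def grad_cam_two_inputs(model, target_layer, image, text):
    """Grad-CAM for a model that is called as model(image, text)."""
    store = {}

    def save_activation(module, inputs, output):
        # Keep the layer output and capture its gradient during backward.
        store["act"] = output
        output.register_hook(lambda grad: store.__setitem__("grad", grad))

    handle = target_layer.register_forward_hook(save_activation)
    try:
        model.zero_grad()
        logits = model(image, text)
        # Assumption: the scalar target is the sum of all output logits.
        logits.sum().backward()
    finally:
        handle.remove()

    act, grad = store["act"], store["grad"]
    # Channel weights are the spatially averaged gradients.
    weights = grad.mean(dim=(2, 3), keepdim=True)
    cam = F.relu((weights * act).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=image.shape[-2:], mode="bilinear",
                        align_corners=False).detach()
    # Scale each map to [0, 1] for visualization.
    cam_min = cam.amin(dim=(2, 3), keepdim=True)
    cam_max = cam.amax(dim=(2, 3), keepdim=True)
    return (cam - cam_min) / (cam_max - cam_min + 1e-8)


torch.manual_seed(0)
model = ToyTwoInputSeg().eval()
image = torch.randn(2, 3, 16, 16)   # dummy image batch
text = torch.randn(2, 8)            # dummy text features
cam = grad_cam_two_inputs(model, model.conv, image, text)
print(cam.shape)  # torch.Size([2, 1, 16, 16])
```

For a real model, pass the actual text tensor and pick `target_layer` as one of the later convolutional or feature blocks; which layer gives meaningful maps depends on the architecture.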
Thank you for your response. I have obtained the expected results for the LViT model and cited it correctly. I appreciate all your help along the way.
Dear Zihan,
I hope this message finds you well.
I have been following your work with great interest. Currently, I am attempting to replicate the interpretability study in your paper. However, I have encountered some challenges.
I am using the package 'pytorch_grad_cam' with the following code snippet:
'cam' requires the model's input to be a single tensor, whereas the LViT model requires two tensors (image, text) as input. I would like to ask how you generated the CAM figures in your work. Could you kindly provide some guidance or share the relevant code?
Thank you very much for your time and assistance.
Best regards, Pengyu Zhao