mANDm1412 opened this issue 11 months ago
Hi, I can provide some information on how to get those visualizations.

For Eigen-CAM, I apply the following function to the 2D feature map from the vision backbone and then visualize the result with `show_cam_on_image` from pytorch-grad-cam:
```python
import numpy as np

def get_2d_projection(activation_batch):
    # TBD: use pytorch batch svd implementation
    activation_batch[np.isnan(activation_batch)] = 0
    projections = []
    for activations in activation_batch:
        # Flatten the spatial dimensions: (C, H, W) -> (H*W, C)
        reshaped_activations = (activations).reshape(
            activations.shape[0], -1).transpose()
        # Centering before the SVD seems to be important here,
        # otherwise the image returned is negative.
        reshaped_activations = reshaped_activations - \
            reshaped_activations.mean(axis=0)
        U, S, VT = np.linalg.svd(reshaped_activations, full_matrices=True)
        # Project the activations onto the first principal component.
        projection = reshaped_activations @ VT[0, :]
        projection = projection.reshape(activations.shape[1:])
        projection = np.abs(projection)
        # Normalize to [0, 1] for visualization.
        max_v, min_v = np.max(projection), np.min(projection)
        if max_v != min_v:
            projection = (projection - min_v) / (max_v - min_v)
        projections.append(projection)
    return np.float32(projections)
```
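Putting it together could look roughly like the sketch below. The names `feature_map`, `rgb_img`, and the output file are placeholders, not from the original code; `feature_map` is assumed to already be a NumPy array of shape (C, H, W) taken from the vision backbone, and `rgb_img` a float32 image in [0, 1] with shape (H_img, W_img, 3):

```python
import cv2
import numpy as np
from pytorch_grad_cam.utils.image import show_cam_on_image

# Hypothetical example: feature_map is the backbone activation (C, H, W) as a
# NumPy array; rgb_img is the input image as float32 in [0, 1], (H_img, W_img, 3).
eigen_cam = get_2d_projection(feature_map[None, ...])[0]          # (H, W), in [0, 1]
eigen_cam = cv2.resize(eigen_cam, (rgb_img.shape[1], rgb_img.shape[0]))
overlay = show_cam_on_image(rgb_img, eigen_cam, use_rgb=True)     # uint8 overlay image
cv2.imwrite('eigen_cam.png', cv2.cvtColor(overlay, cv2.COLOR_RGB2BGR))
```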
For Grad-CAM, I use the GradCAM implementation in pytorch-grad-cam with the customized `Target` below. Note that the pytorch-grad-cam repo only supports a single input tensor, so I kept the other inputs, such as velocity and conditioning, as attributes of the model so they can be used during the forward pass, and updated them before calling Grad-CAM. Something like: `model.velocity = velocity`, then `gradcam(model, image)`.
```python
import torch
from torch.distributions import Beta

class Target:
    def __init__(self, gt):
        # Supervision distribution built from the ground-truth Beta parameters.
        self.dist_sup = Beta(gt['action_mu'].cuda(), gt['action_sigma'].cuda())

    def __call__(self, model_output):
        model_output = model_output.unsqueeze(0)
        mu = model_output[:, :2]
        sigma = model_output[:, 2:]
        dist_pred = Beta(mu, sigma)
        # Negative KL divergence between the supervision and predicted
        # distributions, averaged over the two action dimensions.
        kl_div = torch.distributions.kl_divergence(self.dist_sup, dist_pred)
        return -1 * (torch.mean(kl_div[:, 0]) * 0.5 + torch.mean(kl_div[:, 1]) * 0.5)
```
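Wiring this up could look something like the sketch below; the target layer, the attribute names, and the `gt` / `image` / `rgb_img` variables are placeholder assumptions and depend on the actual model:

```python
from pytorch_grad_cam import GradCAM
from pytorch_grad_cam.utils.image import show_cam_on_image

# Hypothetical wiring: keep the non-image inputs as model attributes so the
# forward pass only needs the image tensor, as pytorch-grad-cam expects.
model.velocity = velocity            # e.g. current speed tensor
model.conditioning = conditioning    # e.g. target point / command tensor

# Assumed target layer: the last block of the image encoder.
target_layers = [model.vision_backbone[-1]]
cam = GradCAM(model=model, target_layers=target_layers)

# gt holds the ground-truth Beta parameters used by Target.
targets = [Target(gt)]
grayscale_cam = cam(input_tensor=image, targets=targets)[0]   # (H, W), in [0, 1]

# rgb_img: float32 image in [0, 1] at the same resolution as the network input.
overlay = show_cam_on_image(rgb_img, grayscale_cam, use_rgb=True)
```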
For the visualization in Fig. 2, you can simply resize `wp_att` and visualize it during model inference.
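As a rough sketch (assuming `wp_att` is a single-channel attention map captured as a torch tensor during the forward pass, and `rgb_img` is the corresponding camera image as float32 in [0, 1]):

```python
import cv2
import numpy as np
from pytorch_grad_cam.utils.image import show_cam_on_image

# Hypothetical: wp_att is a torch tensor of shape (H, W) saved during inference.
att = wp_att.detach().cpu().numpy().astype(np.float32)
att = (att - att.min()) / (att.max() - att.min() + 1e-8)      # normalize to [0, 1]
att = cv2.resize(att, (rgb_img.shape[1], rgb_img.shape[0]))   # match image resolution
overlay = show_cam_on_image(rgb_img, att, use_rgb=True)
cv2.imwrite('wp_att.png', cv2.cvtColor(overlay, cv2.COLOR_RGB2BGR))
```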
Hi @mANDm1412, were you able to reproduce the Fig 2 results of the supplementary material?
Hi, I would greatly appreciate it if you could share the code you used for the visualizations in Figures 2 and 3 of the supplementary material (trajectory-guided attention maps, Grad-CAM, and Eigen-CAM). Thank you very much in advance!