OpenDriveLab / TCP

[NeurIPS 2022] Trajectory-guided Control Prediction for End-to-end Autonomous Driving: A Simple yet Strong Baseline.
Apache License 2.0

Figures 2 and 3 of the supplementary material #43

Open mANDm1412 opened 11 months ago

mANDm1412 commented 11 months ago

Hi, could you share the code you used for the visualizations in Figures 2 and 3 of the supplementary material (the trajectory-guided attention maps, Grad-CAM, and Eigen-CAM)? I would greatly appreciate it. Thank you very much in advance!

penghao-wu commented 11 months ago

Hi, I can provide some information on how to get those visualizations. For Eigen-CAM, I apply the following function to the 2D feature map from the vision backbone and then visualize the result with the show_cam_on_image function from pytorch-grad-cam (a short usage sketch follows the function).

import numpy as np


def get_2d_projection(activation_batch):
    # TBD: use pytorch batch svd implementation
    activation_batch[np.isnan(activation_batch)] = 0
    projections = []
    for activations in activation_batch:
        reshaped_activations = (activations).reshape(
            activations.shape[0], -1).transpose()
        # Centering before the SVD seems to be important here,
        # Otherwise the image returned is negative
        reshaped_activations = reshaped_activations - \
            reshaped_activations.mean(axis=0)
        U, S, VT = np.linalg.svd(reshaped_activations, full_matrices=True)
        projection = reshaped_activations @ VT[0, :]
        projection = projection.reshape(activations.shape[1:])
        projection = np.abs(projection)
        max_v, min_v = np.max(projection), np.min(projection)
        if max_v != min_v:
            projection = (projection - min_v) / (max_v - min_v)

        projections.append(projection)
    return np.float32(projections)
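
To overlay the projection on the input frame, something like the following should work (a minimal sketch, not the original script; features and rgb_image are placeholder names, assuming features is the backbone feature map as a numpy array of shape (1, C, H, W) and rgb_image is the input frame as float32 in [0, 1]):

import cv2
from pytorch_grad_cam.utils.image import show_cam_on_image

# Assumed inputs: `features` is a (1, C, H, W) numpy feature map,
# `rgb_image` is a float32 RGB image in [0, 1] of shape (H_img, W_img, 3).
cam = get_2d_projection(features)[0]                       # (H, W), already in [0, 1]
cam = cv2.resize(cam, (rgb_image.shape[1], rgb_image.shape[0]))
overlay = show_cam_on_image(rgb_image, cam, use_rgb=True)  # uint8 RGB heat-map overlay
cv2.imwrite("eigen_cam.png", cv2.cvtColor(overlay, cv2.COLOR_RGB2BGR))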

For Grad-CAM, I use the GradCAM implementation from pytorch-grad-cam with a customized target (the Target class below). Note that pytorch-grad-cam only passes a single input tensor to the model, so I keep the other inputs, such as the velocity and the conditioning information, as attributes of the model, update them before calling Grad-CAM, and read them inside forward. Something like model.velocity = velocity followed by gradcam(model, image); see the sketch after the Target class.

import torch
from torch.distributions import Beta


class Target:
    def __init__(self, gt):
        # Ground-truth (supervision) action distribution.
        self.dist_sup = Beta(gt['action_mu'].cuda(), gt['action_sigma'].cuda())

    def __call__(self, model_output):
        # model_output is the action head output: [mu_0, mu_1, sigma_0, sigma_1].
        model_output = model_output.unsqueeze(0)
        mu = model_output[:, :2]
        sigma = model_output[:, 2:]
        dist_pred = Beta(mu, sigma)
        kl_div = torch.distributions.kl_divergence(self.dist_sup, dist_pred)
        # Negate so that Grad-CAM highlights regions that reduce the KL divergence.
        return -1 * (torch.mean(kl_div[:, 0]) * 0.5 + torch.mean(kl_div[:, 1]) * 0.5)
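
Putting it together, the call could look roughly like this (a sketch, not the exact script: the attribute names and the target layer are assumptions about how you wrap the TCP model, and the wrapped forward() is assumed to return the concatenated [mu, sigma] action output):

from pytorch_grad_cam import GradCAM

# Assumed attribute names: extra inputs stored on the model and read inside forward().
model.velocity = velocity
model.target_point = target_point

# Assumed target layer: the last conv block of the image backbone.
cam = GradCAM(model=model, target_layers=[model.perception.layer4])
grayscale_cam = cam(input_tensor=image, targets=[Target(gt)])[0]  # (H, W) map in [0, 1]

The resulting map can then be overlaid with show_cam_on_image in the same way as in the Eigen-CAM sketch above.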

For the visualization in Fig. 2, you can simply resize the wp_att attention map produced during model inference and visualize it, for example as in the sketch below.
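
Roughly (a sketch with assumed names: wp_att is the trajectory-guided attention map from the forward pass, reshaped to a (1, 1, h, w) tensor, and rgb_image is the input frame as float32 in [0, 1]):

import cv2
from pytorch_grad_cam.utils.image import show_cam_on_image

att = wp_att[0, 0].detach().cpu().numpy()                  # (h, w) attention map (assumed shape)
att = (att - att.min()) / (att.max() - att.min() + 1e-8)   # normalize to [0, 1]
att = cv2.resize(att, (rgb_image.shape[1], rgb_image.shape[0]))
overlay = show_cam_on_image(rgb_image, att, use_rgb=True)
cv2.imwrite("wp_att.png", cv2.cvtColor(overlay, cv2.COLOR_RGB2BGR))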

anantagrg commented 2 weeks ago

Hi @mANDm1412, were you able to reproduce the Fig 2 results of the supplementary material?