autonomousvision / transfuser

[PAMI'23] TransFuser: Imitation with Transformer-Based Sensor Fusion for Autonomous Driving; [CVPR'21] Multi-Modal Fusion Transformer for End-to-End Autonomous Driving
MIT License

Need some help about understanding the code #204

Closed Oliverwang11 closed 6 months ago

Oliverwang11 commented 6 months ago

Hi,

Thanks again for your contribution!

After reading the paper, I took a look at the code, especially the GPT class, and found something I am a little confused about.

  1. The paper says the input image is down-sampled to 5 x 22 x C and the LiDAR to 8 x 8 x C. If I understand correctly, at inference the batch size B in your comments should be one? And why is the image input size B*4*seq_len, C, H, W in your comments? Where does the number 4 come from? Maybe I misunderstood something.

    def forward(self, image_tensor, lidar_tensor, velocity):
        """
        Args:
            image_tensor (tensor): B*4*seq_len, C, H, W
            lidar_tensor (tensor): B*seq_len, C, H, W
            velocity (tensor): ego-velocity
        """

Best wishes! Thanks again!

Kait0 commented 6 months ago

I think it's just a typo.
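
For reference, if the 4 is indeed a typo, both inputs would share the leading dimension B*seq_len. Below is a minimal Python sketch of that shape bookkeeping. The function names are hypothetical, and the 5 x 22 and 8 x 8 grids are the resolutions the question quotes from the paper as I read them. B = 1 and seq_len = 1 are only an assumed example. None of this is taken from the repository code.

```python
# Sketch only: plain shape arithmetic, not the TransFuser implementation.
# All names and default values here are illustrative assumptions.

def leading_dim(batch_size, seq_len):
    """Leading dimension of a tensor shaped (B*seq_len, C, H, W)."""
    return batch_size * seq_len


def tokens_per_sample(seq_len, img_grid=(5, 22), lidar_grid=(8, 8)):
    """Tokens per sample if each feature map is pooled to its grid
    and every grid cell becomes one C-dimensional token."""
    img_tokens = img_grid[0] * img_grid[1]        # image grid cells
    lidar_tokens = lidar_grid[0] * lidar_grid[1]  # LiDAR grid cells
    return seq_len * (img_tokens + lidar_tokens)


if __name__ == "__main__":
    B, seq_len = 1, 1  # assumed example: one sample, one frame
    print("leading dim:", leading_dim(B, seq_len))
    print("tokens per sample:", tokens_per_sample(seq_len))
```

Under these assumptions the image and LiDAR tensors have the same first dimension, so no factor of 4 is needed to make the shapes line up.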