Question about the difference between VoxFormer-S and VoxFormer-T

NVlabs / VoxFormer

Official PyTorch implementation of VoxFormer [CVPR 2023 Highlight]

Other

1.07k stars, 87 forks

Issue #14

Closed: hly2990 closed this issue 1 year ago

hly2990 commented 1 year ago

Hi~ I'd like to know the difference between VoxFormer-S and VoxFormer-T. Their model sizes look identical, so what is the purpose of using sequential frames?

Abde951 commented 1 year ago

VoxFormer-S uses only one image to predict the depth map, which will be less accurate than the depth map generated in VoxFormer-T, where 5 images are used (the current frame and the 4 previous frames). A more accurate depth map means more accurate voxel queries, which directly affects model performance. So the difference does not come from the size of the model, only from the accuracy of the 3D occupancy grid from which the voxel queries are proposed.
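To make the chain from depth map to 3D occupancy to voxel queries concrete, here is a toy sketch (not the repository's stage-1 code). It assumes a simple pinhole camera, back-projects a depth map into 3D points, and marks the voxels those points fall into as the positions that would seed voxel queries. The function names `depth_to_points` and `occupied_voxels`, the intrinsics, and the grid settings are all hypothetical values chosen for illustration.

```python
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project an H x W depth map to camera-frame 3D points (pinhole model)."""
    h, w = depth.shape
    v, u = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")  # pixel rows, cols
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    points = np.stack([x, y, depth], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]  # keep only pixels with a valid (positive) depth

def occupied_voxels(points, origin, voxel_size, grid_shape):
    """Indices of voxels containing at least one point; these would seed voxel queries."""
    idx = np.floor((points - np.asarray(origin)) / voxel_size).astype(int)
    inside = np.all((idx >= 0) & (idx < np.asarray(grid_shape)), axis=1)
    return {tuple(int(c) for c in row) for row in idx[inside]}

# Toy 2x2 image with hypothetical intrinsics and grid settings.
intrinsics = dict(fx=1.0, fy=1.0, cx=0.5, cy=0.5)
grid = dict(origin=(-2.0, -2.0, 0.0), voxel_size=1.0, grid_shape=(4, 4, 4))

occ_true = occupied_voxels(depth_to_points(np.full((2, 2), 2.0), **intrinsics), **grid)
occ_noisy = occupied_voxels(depth_to_points(np.full((2, 2), 2.6), **intrinsics), **grid)
print(sorted(occ_true))   # [(1, 1, 2), (1, 3, 2), (3, 1, 2), (3, 3, 2)]
print(sorted(occ_noisy))  # [(0, 0, 2), (0, 3, 2), (3, 0, 2), (3, 3, 2)]
```

In this toy setup, a 0.6 m depth error shifts the back-projected points sideways enough that three of the four selected voxels change, which is the sense in which depth accuracy feeds into query accuracy.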

RoboticsYimingLi commented 1 year ago

The difference lies in stage-2 rather than stage-1. In VoxFormer-S, the voxel queries interact only with the current frame, while in VoxFormer-T they interact with both the current frame and the previous frames.
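A rough way to see how the two variants can share the same model size while differing in stage-2 is the toy sketch below. It swaps in plain dense single-head cross-attention for illustration (VoxFormer is described as using deformable cross-attention, so this is a deliberate simplification). Voxel queries attend either to the tokens of the current frame only ("S"-style) or to the tokens of the current frame plus 4 previous frames ("T"-style). The helpers `init_params` and `cross_attend`, the channel width, and the token counts are hypothetical.

```python
import numpy as np

def init_params(channels, rng):
    """Hypothetical query/key/value projections; shapes depend only on the channel width."""
    return {name: rng.standard_normal((channels, channels)) / np.sqrt(channels)
            for name in ("wq", "wk", "wv")}

def cross_attend(queries, frame_feats, params):
    """Dense single-head cross-attention from voxel queries to the tokens of the given frames.

    queries: (Q, C) voxel query embeddings; frame_feats: list of (N_t, C) arrays, one per frame.
    """
    kv = np.concatenate(frame_feats, axis=0)            # pool tokens from every frame passed in
    q, k, v = queries @ params["wq"], kv @ params["wk"], kv @ params["wv"]
    scores = q @ k.T / np.sqrt(q.shape[1])              # (Q, total tokens)
    scores -= scores.max(axis=1, keepdims=True)         # numerical stability for softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)       # softmax over all tokens of all frames
    return weights @ v, weights

rng = np.random.default_rng(0)
channels, num_queries, tokens_per_frame = 16, 8, 32
params = init_params(channels, rng)
queries = rng.standard_normal((num_queries, channels))
frames = [rng.standard_normal((tokens_per_frame, channels)) for _ in range(5)]  # frames[0] = current

out_s, w_s = cross_attend(queries, frames[:1], params)  # "S"-style: current frame only
out_t, w_t = cross_attend(queries, frames, params)      # "T"-style: current + 4 previous frames
num_params = sum(p.size for p in params.values())
print(w_s.shape, w_t.shape, num_params)  # (8, 32) (8, 160) 768
```

Because the same projection weights are applied to every token regardless of which frame it came from, the parameter count stays at 768 whether one frame or five are attended to; only the number of tokens each query can draw information from changes.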