autonomousvision / transfuser

[PAMI'23] TransFuser: Imitation with Transformer-Based Sensor Fusion for Autonomous Driving; [CVPR'21] Multi-Modal Fusion Transformer for End-to-End Autonomous Driving
MIT License

Questions about evaluation method and network initialization #58

Closed gitped closed 2 years ago

gitped commented 2 years ago

Hello,

1. In your TransFuser paper you report the mean and std dev over 9 runs of each method (3 training seeds, each seed evaluated 3 times). Does this mean that you changed the REPETITIONS value in the run_evaluation.sh script to REPETITIONS=3, or did you evaluate each model 3 separate times with REPETITIONS=1?

2. I understand that each model is trained with a random seed, leading to some variance in the results. If I wanted to make the models more reproducible and deterministic, where in the code can I set fixed training seeds or modify the network initialization method?

ap229997 commented 2 years ago
  1. We evaluated 3 separate times with REPETITIONS=1 so we could parallelize across multiple GPUs for faster evaluation; REPETITIONS=3 works as well.

  2. You can modify train.py of the desired model to include the following at the very beginning (you can set the seed value as per your choice):

    ```python
    import random

    import numpy as np
    import torch

    def seed(seed=42):
        np.random.seed(seed)
        random.seed(seed)
        torch.manual_seed(seed)
        torch.cuda.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
        # Make cuDNN pick deterministic kernels (may be slower).
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False

    seed()
    ```
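As a quick sanity check of the idea, here is a minimal sketch using only `random` and NumPy (the `torch.manual_seed` calls behave analogously for PyTorch's RNGs): re-seeding before sampling makes the draws identical across runs.

```python
import random

import numpy as np

def seeded_draws(seed):
    # Re-seed both RNGs, then sample once from each.
    random.seed(seed)
    np.random.seed(seed)
    return random.random(), np.random.rand()

# Two runs with the same seed produce identical samples.
assert seeded_draws(42) == seeded_draws(42)
# A different seed gives different samples.
assert seeded_draws(42) != seeded_draws(7)
```

Note that full determinism on GPU additionally depends on the cuDNN flags above, since some CUDA kernels are nondeterministic by default.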
gitped commented 2 years ago

Thank you. Also, do you happen to know the time complexities of CILRS, AIM, and the 3 fusion models you test?

ap229997 commented 2 years ago

What exactly do you mean by time complexities (train time, eval time, or algorithmic complexity)?

gitped commented 2 years ago

I mean their algorithmic complexities in big O notation, such as O(n^2).

ap229997 commented 2 years ago

Let N be the number of tokens (for H×W grid data such as images or a LiDAR BEV, N = H*W).

  1. TransFuser: O(N^2), since standard quadratic self-attention is used (newer transformer variants like Linformer have better complexity and could be used instead)
  2. Geometric fusion: O(N), since the image-LiDAR correspondences can be precomputed (this also depends on how tensor indexing is implemented in PyTorch)
  3. Late fusion: O(N)
  4. CILRS, AIM: O(N), with a smaller constant factor than the fusion methods
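To make the O(N^2) point concrete, here is a toy NumPy sketch of single-head self-attention (an illustration, not the TransFuser implementation): the score matrix alone has shape (N, N), so both time and memory grow quadratically in the token count N = H*W.

```python
import numpy as np

def self_attention(x):
    """Toy single-head self-attention; x has shape (N, d)."""
    d = x.shape[1]
    scores = x @ x.T / np.sqrt(d)                # (N, N) -> the O(N^2) term
    scores -= scores.max(axis=1, keepdims=True)  # stabilize softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ x                           # (N, d)

h, w, d = 8, 8, 16
tokens = np.random.randn(h * w, d)   # N = H*W = 64 tokens
out = self_attention(tokens)
print(out.shape)  # (64, 16); the intermediate score matrix was (64, 64)
```

Doubling H and W quadruples N, so the attention cost grows sixteen-fold, which is why linear-attention variants become attractive at higher grid resolutions.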