Hi,
I noticed that in the provided BEVDepth configs, `key_idxes` is set to `[-1]` (see `bev_depth_lss_r50_256x704_128x128_24e_2key.py`). Does that mean these models use temporal fusion with one previous frame by default? From the paper, my understanding is that multi-frame fusion is only used for the test-benchmark results. Am I missing something?
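To make the question concrete, here is a minimal sketch of how I am reading the setting. I am assuming that each entry in `key_idxes` is an offset relative to the current key frame, so `-1` means "the previous key frame". The helper `frames_used` is hypothetical and is not taken from the BEVDepth code.

```python
# Hypothetical illustration of how I read `key_idxes`; this is NOT the
# BEVDepth implementation, only my assumption about its semantics.
from typing import List


def frames_used(current_idx: int, key_idxes: List[int]) -> List[int]:
    """Return the absolute frame indices fused for the frame at current_idx.

    Assumption: each entry of key_idxes is an offset relative to the
    current key frame, so -1 refers to the previous key frame.
    """
    return [current_idx] + [current_idx + offset for offset in key_idxes]


key_idxes = [-1]  # value set in bev_depth_lss_r50_256x704_128x128_24e_2key.py
print(frames_used(10, key_idxes))       # [10, 9]
print(len(frames_used(10, key_idxes)))  # 2
```

If that reading is right, two key frames are fused, which would match the `_2key` suffix in the config name.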