Hello! I'm confused by part of the code. For example, in the temporal cross-attention of the future decoder, why are the coordinates of the future BEV queries in the historical frames computed and used as reference points? The key and value are the current BEV features, right?