Thanks for your work!
It seems that scene_level_offset_embed is used to represent the trajectory positions. When the rotation transform is applied, which coordinate system are those positions in? Also, why is only a translation transform required to update scene_level_ego_embed across modalities?