Thank you for your work. While reading the code alongside the paper, I noticed some differences between Figure 5 in the paper and the `MotionTransformerDecoder` class in the code.
In the paper, the four folds each receive a positional encoding and an MLP, and are then concatenated together and fed to the transformer.
But in the code, the four folds are split into `dynamic_query_embed` and `static_intention_embed`. How do these correspond? The `in_query_fuser` layer also does not seem to appear in the paper.
How can I map the code onto the paper's description of MotionFormer?
```python
dynamic_query_embed = self.dynamic_embed_fuser(torch.cat(
    [agent_level_embedding, scene_level_offset_embedding, scene_level_ego_embedding],
    dim=-1))  # the predicted goal point from the previous layer
# fuse static and dynamic intention embedding
query_embed_intention = self.static_dynamic_fuser(torch.cat(
    [static_intention_embed, dynamic_query_embed], dim=-1))  # (B, A, P, D)
# fuse intention embedding with query embedding
query_embed = self.in_query_fuser(torch.cat([query_embed, query_embed_intention], dim=-1))
```
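To make the question concrete, here is a minimal self-contained sketch of the three fusion steps quoted above. The tensor sizes `(B, A, P, D)` and the `Linear`/`ReLU` fuser definitions are my own assumptions for illustration only; they are not taken from the actual UniAD config.

```python
import torch
import torch.nn as nn

# Assumed dimensions: batch, agents, intention modes, embedding dim.
B, A, P, D = 2, 3, 6, 16

# Hypothetical fuser MLPs; the real fusers in the repo may be deeper/different.
dynamic_embed_fuser = nn.Sequential(nn.Linear(3 * D, D), nn.ReLU())
static_dynamic_fuser = nn.Sequential(nn.Linear(2 * D, D), nn.ReLU())
in_query_fuser = nn.Sequential(nn.Linear(2 * D, D), nn.ReLU())

# Dummy stand-ins for the query components in the snippet above.
agent_level_embedding = torch.randn(B, A, P, D)
scene_level_offset_embedding = torch.randn(B, A, P, D)
scene_level_ego_embedding = torch.randn(B, A, P, D)
static_intention_embed = torch.randn(B, A, P, D)
query_embed = torch.randn(B, A, P, D)

# Step 1: fuse the three goal-conditioned (dynamic) embeddings.
dynamic_query_embed = dynamic_embed_fuser(torch.cat(
    [agent_level_embedding, scene_level_offset_embedding,
     scene_level_ego_embedding], dim=-1))

# Step 2: fuse the static intention embedding with the dynamic one.
query_embed_intention = static_dynamic_fuser(torch.cat(
    [static_intention_embed, dynamic_query_embed], dim=-1))

# Step 3: fuse the intention embedding with the query embedding itself.
query_embed = in_query_fuser(torch.cat(
    [query_embed, query_embed_intention], dim=-1))

print(query_embed.shape)  # torch.Size([2, 3, 6, 16])
```

So rather than one big concatenation of all four folds before the transformer (as Figure 5 suggests), the code fuses them pairwise in three stages, which is the discrepancy I am asking about.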