f1yfisher / DriveDreamer2


HD map generated conditioned on traj or the other way around? #3

Closed: dichencd closed this issue 2 days ago

dichencd commented 1 week ago

Hi, thanks for the great work! I have one question regarding the BEV trajectory and hdmap generation. In the paper, the BEV trajectories are generated first, and hdmap is then generated using conditional diffusion model. I wonder whether there are any reasons behind this choice. Why not generate hdmap first and then generate agents adhering to road constraints? Thank you very much!

f1yfisher commented 6 days ago

Thank you very much for your interest in our project!

In principle, the reverse order is feasible, since our goal is to decouple the HDMap from the agent trajectories. However, generating agent trajectories conditioned on the road structure raises two issues, one about our motivation and one about implementation difficulty:

  1. Regarding our motivation, we care most about trajectories in corner cases and less about the road structure itself; the road structure only needs to meet basic requirements.
  2. Decoding each vehicle's position at every timestep from a BEV trajectory map is difficult, whereas the road structure is static in three-dimensional space and therefore easier to process. For a generated agent trajectory, it is hard to tell which end is the starting point and which is the endpoint. Gradient colors could encode the direction of motion, but that adds complexity to both model training and decoding. And when two agents' trajectories intersect, it is hard to separate them accurately during post-processing.
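
The decoding problem in point 2 can be illustrated with a toy sketch. This is not the DriveDreamer-2 implementation: the function names, the 8x8 grid, and the time-based value encoding are all assumptions made for illustration. It shows that a plain binary raster cannot distinguish a trajectory from its reverse, that a time-increasing value (a stand-in for gradient colors) restores the ordering, and that two intersecting agents drawn on one map can no longer be separated cleanly.

```python
def rasterize_binary(traj, size=8):
    """Mark every grid cell the trajectory visits; ordering is discarded."""
    grid = [[0] * size for _ in range(size)]
    for x, y in traj:
        grid[y][x] = 1
    return grid


def rasterize_gradient(traj, size=8, grid=None):
    """Write a value that grows with time (hypothetical 'gradient color')."""
    if grid is None:
        grid = [[0.0] * size for _ in range(size)]
    n = len(traj)
    for t, (x, y) in enumerate(traj):
        grid[y][x] = (t + 1) / n  # 0.0 is reserved for empty cells
    return grid


def decode_gradient(grid):
    """Recover waypoints by sorting occupied cells by their stored value."""
    cells = [
        (v, x, y)
        for y, row in enumerate(grid)
        for x, v in enumerate(row)
        if v > 0
    ]
    return [(x, y) for _, x, y in sorted(cells)]


forward = [(1, 1), (2, 2), (3, 3)]
backward = forward[::-1]
crossing = [(1, 3), (2, 2), (3, 1)]  # shares cell (2, 2) with `forward`

# Binary raster: forward and backward are indistinguishable.
same_mask = rasterize_binary(forward) == rasterize_binary(backward)

# Gradient raster: direction is recoverable for a single agent.
decoded_forward = decode_gradient(rasterize_gradient(forward))
decoded_backward = decode_gradient(rasterize_gradient(backward))

# Two agents on one map: the shared cell is overwritten and the
# decoded list mixes both agents, so neither trajectory is recovered.
shared = rasterize_gradient(forward)
shared = rasterize_gradient(crossing, grid=shared)
mixed = decode_gradient(shared)

print(same_mask, decoded_forward == forward, decoded_backward == backward)
print(mixed)
```

Even in this toy setting, the single-agent case only works because no cell is visited twice; a trajectory that revisits a cell, or a second agent crossing it, overwrites the stored value.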

dichencd commented 5 days ago

Thank you very much for your reply!

I wonder whether the implementation difficulties mean that doing conditional diffusion p(x|y) is hard when the condition y is complicated. Or, in other words, diffusion models may not generalize well when the conditional information is too complex (I assume the trajectories are simpler conditional information compared to the hd-map).

f1yfisher commented 4 days ago

Yes. Beyond that issue, decoding the agent trajectory sequences from the BEV trajectory map is itself difficult.

dichencd commented 2 days ago

Thanks!