Short summary: Predict the distribution over multiple possible future paths of people as they move through various visual scenes.
Details
Method: Supervised deep neural networks
Input: The ground-truth location history and a set of video frames preprocessed by a semantic segmentation model.
Output: Predicted locations of a single agent for multiple steps into the future.
"Forking Paths" dataset
The first 3D simulation dataset reconstructed from real-world scenarios, complemented with a variety of human trajectory continuations for multi-future person trajectory prediction.
New model "Multiverse"
The coarse location decoder's output: Heatmap over the 2D grid
The fine location decoder's output: Vector offset within each grid cell.
At inference, a diverse beam search strategy selects the top-K trajectories.
SOTA on VIRAT/ActEV benchmark
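To make the coarse and fine decoder outputs concrete, here is a minimal sketch of how a heatmap and per-cell offsets could be combined into one predicted location for a single time step. It is not the authors' implementation: the function name `decode_location`, the cell sizes, and the convention that each offset is measured in pixels from the cell center are assumptions made for illustration.

```python
def decode_location(heatmap, offsets, cell_w, cell_h):
    """Combine a coarse heatmap and per-cell offsets into one (x, y) point.

    heatmap: H x W nested list of scores over the 2D grid.
    offsets: H x W nested list of (dx, dy) pairs, one per grid cell.
             Assumption: offsets are in pixels, relative to the cell center.
    """
    rows, cols = len(heatmap), len(heatmap[0])

    # Coarse step: pick the grid cell with the highest score.
    best_row, best_col = max(
        ((r, c) for r in range(rows) for c in range(cols)),
        key=lambda rc: heatmap[rc[0]][rc[1]],
    )

    # Fine step: start at the center of that cell and apply its offset.
    center_x = (best_col + 0.5) * cell_w
    center_y = (best_row + 0.5) * cell_h
    dx, dy = offsets[best_row][best_col]
    return center_x + dx, center_y + dy


# Toy 2 x 3 grid with 32 x 32 pixel cells (hypothetical values).
heatmap = [
    [0.05, 0.10, 0.05],
    [0.10, 0.60, 0.10],
]
offsets = [[(0.0, 0.0)] * 3 for _ in range(2)]
offsets[1][1] = (4.0, -2.0)

location = decode_location(heatmap, offsets, cell_w=32, cell_h=32)
print(location)
```

Taking only the single best cell gives one future; keeping several high-scoring cells per step instead is what lets a beam search over the heatmap produce multiple candidate trajectories.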
Comment: The paper does not explicitly model human-to-human interaction, but it takes semantic segmentation images as input in addition to the location history.
Paper: http://openaccess.thecvf.com/content_CVPR_2020/papers/Liang_The_Garden_of_Forking_Paths_Towards_Multi-Future_Trajectory_Prediction_CVPR_2020_paper.pdf
GitHub: https://github.com/JunweiLiang/Multiverse
Project page: https://next.cs.cmu.edu/multiverse/