agrimgupta92 / sgan

Code for "Social GAN: Socially Acceptable Trajectories with Generative Adversarial Networks", Gupta et al., CVPR 2018
MIT License

The constant velocity model outperforms sgan? #44

Open flclain opened 5 years ago

flclain commented 5 years ago

From this paper: The Simpler the Better: Constant Velocity for Pedestrian Motion Prediction. https://arxiv.org/abs/1903.07933

"Because S-GAN and SoPhie were evaluated by drawing 20 samples and taking the predicted trajectory with the minimum error for evaluation, we added an extended version OUR-S of our CVM for comparability. For OUR-S we add additional angular noise to its predicted direction, which we draw from N (0, σ 2 ) with σ = 25 ◦ and evaluate its error in the same fashion."

From this you can see that the constant velocity model outperforms sgan once we draw 20 samples.
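For reference, here is a minimal numpy sketch of that sampling scheme as I read it: one constant-velocity extrapolation per sample, rotated by angular noise drawn from N(0, σ²) with σ = 25°, scored by best-of-20 ADE. The function names are mine, and drawing the noise once per sample (rather than per time step) is my assumption based on the paper's wording.

```python
import numpy as np

def cvm_samples(obs_traj, pred_len=12, n_samples=20, sigma_deg=25.0, rng=None):
    """Constant velocity predictions with angular noise on the heading.

    obs_traj: (obs_len, 2) array of observed (x, y) positions.
    Returns: (n_samples, pred_len, 2) array of sampled future trajectories.
    """
    rng = np.random.default_rng() if rng is None else rng
    v = obs_traj[-1] - obs_traj[-2]                      # last observed velocity
    samples = np.empty((n_samples, pred_len, 2))
    for i in range(n_samples):
        theta = np.deg2rad(rng.normal(0.0, sigma_deg))   # angular noise ~ N(0, sigma^2)
        c, s = np.cos(theta), np.sin(theta)
        v_rot = np.array([c * v[0] - s * v[1], s * v[0] + c * v[1]])
        steps = np.arange(1, pred_len + 1)[:, None] * v_rot[None, :]
        samples[i] = obs_traj[-1] + steps                # extrapolate at constant velocity
    return samples

def min_ade(samples, gt_future):
    """Best-of-N ADE: average displacement error of the closest sample."""
    errors = np.linalg.norm(samples - gt_future[None], axis=-1).mean(axis=1)
    return errors.min()
```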

Just like @AyeshaNirma said in https://github.com/agrimgupta92/sgan/issues/8: "Your and your colleagues' own TrajNet challenge does not allow this kind of evaluation. Here is a quick test: take a training set, cluster 10 trajectories, and make a linear model (not even a regressor). During test time, assign each individual the "best" trajectory. You will see how well Social GAN performs w.r.t. a linear motion model. You will have your answer as to whether 20 is a small number or not."
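A rough sketch of that test, under my reading of it (the k-means clustering of relative displacements and the helper names are my assumptions, not what @AyeshaNirma actually ran):

```python
import numpy as np
from sklearn.cluster import KMeans

def fit_modes(train_futures, n_modes=10):
    # train_futures: (N, pred_len, 2) ground-truth futures, expressed as
    # displacements relative to each pedestrian's last observed position.
    flat = train_futures.reshape(len(train_futures), -1)
    return KMeans(n_clusters=n_modes, n_init=10).fit(flat).cluster_centers_

def best_mode_ade(modes, last_obs, gt_future):
    # Oracle assignment: pick the mode with minimum ADE against the ground
    # truth -- the same "best of N" trick used to evaluate S-GAN.
    preds = last_obs[None, None, :] + modes.reshape(len(modes), -1, 2)
    ade = np.linalg.norm(preds - gt_future[None], axis=-1).mean(axis=1)
    return ade.min()
```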

I am wondering whether it is fair to use this kind of evaluation. @agrimgupta92

RebornHugo commented 5 years ago

@flclain I have the same question. The evaluation seems unfair. What do you think of the constant velocity model's results? Are they reasonable?

TToTMooN commented 5 years ago

That paper only shows that the current evaluation metrics (ADE, FDE) are not good enough to tell whether a model captures interaction and environment information. I don't find the rest of their analysis convincing. It's not about the method; it's about the evaluation.

Most of the time, when no interaction happens, people just keep a constant velocity, so sampling straight lines within a range of directions and picking the best one achieves high performance under these metrics. And most test cases fall into this situation if we simply sample sequences from the dataset.

The key conclusion to draw from that paper may be that we need better metrics for interactive prediction, metrics that ensure a model which only handles non-interactive conditions cannot reach a relatively high score.

AyeshaNirma commented 5 years ago

@TToTMooN This is not completely true. The paper quite clearly shows, quantitatively and qualitatively, that as of now neural-network-based approaches have failed to learn a representation capable of forecasting linear or non-linear trajectories better than a constant velocity model. It has nothing to do with the evaluation metric. A linear model is a baseline, really, and a weak one at that. If an approach, with all its hyper-parameters and model complexity, cannot outperform a linear baseline, then something in NN-based approaches needs to improve; we need to think about this as a community.

Secondly, and more fundamentally, the principal issue is not predicting multiple tracks but how to choose the best track. Surely using ground truth is not the way to do it. Imagine an object detection pipeline predicting N proposals per pixel on a feature map and using the ground-truth box to select the best one at test time. Would that be a fair evaluation by any standard?