abduallahmohamed / Social-STGCNN

Code for "Social-STGCNN: A Social Spatio-Temporal Graph Convolutional Neural Network for Human Trajectory Prediction" CVPR 2020
MIT License

Questions about ADE and FDE #31

Closed. d-zh closed this issue 4 years ago.

d-zh commented 4 years ago

Hi, I have read some other issues about this, but I am still confused about how ADE and FDE are calculated in Social-STGCNN. In test.py, I found this code:

```python
for n in range(num_of_objs):
    # keep only the best (minimum) error over the sampled predictions for pedestrian n
    ade_bigls.append(min(ade_ls[n]))
    fde_bigls.append(min(fde_ls[n]))
```

I can't figure out why Social-STGCNN takes the minimum over the sampled results instead of the average. I think the average may be more convincing. Thank you very much!
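To make the question concrete, here is a minimal sketch of per-sample ADE/FDE and of the difference between the best-of-K rule used in test.py and the average proposed above. It assumes each trajectory is a list of (x, y) tuples over the prediction horizon and that K samples are drawn per pedestrian (20 in this thread). `ade`, `fde`, and `best_and_mean` are hypothetical helpers, not functions from the repository.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Traj = List[Point]

def ade(pred: Traj, gt: Traj) -> float:
    """Average displacement error: mean L2 distance over all predicted time steps."""
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)

def fde(pred: Traj, gt: Traj) -> float:
    """Final displacement error: L2 distance at the last predicted time step."""
    return math.dist(pred[-1], gt[-1])

def best_and_mean(samples: List[Traj], gt: Traj) -> dict:
    """Compare best-of-K (the rule in test.py) with the average over all K samples."""
    ades = [ade(s, gt) for s in samples]
    fdes = [fde(s, gt) for s in samples]
    return {
        "minADE": min(ades), "minFDE": min(fdes),
        "meanADE": sum(ades) / len(ades), "meanFDE": sum(fdes) / len(fdes),
    }

gt = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
straight = list(gt)                          # one sampled mode matches the ground truth
turn = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]  # a second, equally plausible mode
print(best_and_mean([straight, turn], gt))
# → {'minADE': 0.0, 'minFDE': 0.0, 'meanADE': 1.0, 'meanFDE': 1.5}
```

With one sample on the ground truth and a second mode elsewhere, best-of-K reports 0.0, while the average reports 1.0 (ADE) and 1.5 (FDE) because it also scores the mode that did not happen.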

abduallahmohamed commented 4 years ago

Hi @d-zh,

Thanks for asking. Let me add a bit of history here.

How do we evaluate a predicted distribution against a ground-truth point, e.g. a predicted bivariate Gaussian distribution against an (x, y) point? One might use the Mahalanobis distance for this, which is a valid solution. What about an arbitrary distribution, such as the output of a GAN-based method? One solution might be to fit a mixture of Gaussians to the predictions coming from the GAN, which requires detecting an optimal number of mixture components, and then use an advanced Mahalanobis distance metric (there is a paper published in 1999 about this, but I can't recall it exactly) to quantify the distance. Finally, there are ordinary methods that predict the point directly, for which a normal L2 metric is enough.

To summarize, there are 3 cases with respect to the ground-truth point:

1) Models that predict a point: L2 is enough.
2) Models that predict a Gaussian distribution: the Mahalanobis distance is enough.
3) Models that predict an arbitrary distribution, such as GANs: you need to fit a mixture of Gaussians with an optimal number of components, then use an advanced Mahalanobis distance to measure.

The issue is that no common metric is available to compare works across these 3 categories. For example, SR-LSTM falls under 1), Social-LSTM under 2), and Social-STGCNN (ours) and Social-GAN under 3). The historical solution was to use the minimum over 20 samples, which was introduced by Social-LSTM. Later on, all works followed the same evaluation method and the same settings, so the reported numbers all follow the same rule. The average is not a good fit because in case 2) the solution is multi-modal, so taking the average is unfair.
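The first two cases described above (point predictions scored with L2, Gaussian predictions scored with the Mahalanobis distance) can be sketched as follows. This assumes the Gaussian is parameterized by (mu_x, mu_y, sigma_x, sigma_y, rho), a common parameterization for bivariate Gaussian trajectory outputs; the function names are hypothetical, and the GAN/mixture case 3) is not covered.

```python
import math

def l2(pred_xy, gt_xy):
    """Case 1): point prediction, plain Euclidean (L2) error."""
    return math.dist(pred_xy, gt_xy)

def mahalanobis_bivariate(gt_xy, mu_x, mu_y, sigma_x, sigma_y, rho):
    """Case 2): Mahalanobis distance of a ground-truth point from N(mu, Sigma), with
    Sigma = [[sx^2, rho*sx*sy], [rho*sx*sy, sy^2]] (assumed parameterization)."""
    if sigma_x <= 0 or sigma_y <= 0 or not -1.0 < rho < 1.0:
        raise ValueError("need sigma_x, sigma_y > 0 and |rho| < 1")
    # standardize each coordinate, then apply the closed-form 2x2 inverse covariance
    zx = (gt_xy[0] - mu_x) / sigma_x
    zy = (gt_xy[1] - mu_y) / sigma_y
    d2 = (zx * zx + zy * zy - 2.0 * rho * zx * zy) / (1.0 - rho * rho)
    return math.sqrt(d2)

# hypothetical prediction: wide along x (sigma_x = 2.0), narrow along y (sigma_y = 0.5)
print(l2((0.0, 0.0), (2.0, 0.0)))                                  # → 2.0
print(mahalanobis_bivariate((2.0, 0.0), 0.0, 0.0, 2.0, 0.5, 0.0))  # → 1.0
print(mahalanobis_bivariate((0.0, 2.0), 0.0, 0.0, 2.0, 0.5, 0.0))  # → 4.0
```

Both ground-truth points are 2.0 away from the predicted mean in L2, but because the predicted Gaussian is wide along x and narrow along y, the Mahalanobis distance rates the first point as far more consistent with the prediction. With unit variances and rho = 0 the two metrics coincide.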

I hope this clarifies the point. I'm also open to any work that aims to develop a common metric for this issue.

d-zh commented 4 years ago

Thanks for the reply. It's thorough and clear.

But I have another question about the metric in Social-STGCNN. In some applications, such as autonomous driving and surveillance systems (mentioned in the paper), we don't know the ground truth in advance, so the minimum cannot be computed. In this common scenario, all possible trajectories inferred by the network (e.g. the 20 samples) are equally plausible, so I think they should all contribute to the calculation of ADE and FDE.

abduallahmohamed commented 4 years ago

If you don't have the ground truth, then both ADE and FDE are meaningless. In that case, you might want to measure the entropy somehow, or rely on an expert's judgment to evaluate the algorithm's performance. On the other hand, the output of Social-STGCNN is a distribution, and trajectory prediction is the component right before path planning in a self-driving car pipeline. You can find similar works from Uber ATG and Waymo in which they predict a distribution, which is multi-modal in the case of cars.
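The reply leaves "measure the entropy somehow" open. One possible reading, which is our assumption and not something test.py does, is to take the closed-form differential entropy of a predicted bivariate Gaussian (sigma_x, sigma_y, rho) at each future time step, H = 1 + ln(2*pi) + 0.5 * ln(det(Sigma)) with det(Sigma) = sigma_x^2 * sigma_y^2 * (1 - rho^2), and average it over the horizon. It needs no ground truth: lower values mean a more concentrated prediction. The function names and the 12-step horizon below are hypothetical.

```python
import math

def bivariate_gaussian_entropy(sigma_x, sigma_y, rho):
    """Differential entropy (nats) of a 2-D Gaussian:
    H = 1 + ln(2*pi) + 0.5 * ln(det(Sigma)), det(Sigma) = sx^2 * sy^2 * (1 - rho^2)."""
    if sigma_x <= 0 or sigma_y <= 0 or not -1.0 < rho < 1.0:
        raise ValueError("need sigma_x, sigma_y > 0 and |rho| < 1")
    det = (sigma_x * sigma_y) ** 2 * (1.0 - rho * rho)
    return 1.0 + math.log(2.0 * math.pi) + 0.5 * math.log(det)

def mean_trajectory_entropy(params_per_step):
    """Average entropy over the predicted horizon; no ground truth is used.
    params_per_step: hypothetical list of (sigma_x, sigma_y, rho), one per future step."""
    return sum(bivariate_gaussian_entropy(*p) for p in params_per_step) / len(params_per_step)

print(round(bivariate_gaussian_entropy(1.0, 1.0, 0.0), 3))  # → 2.838
confident = [(0.2, 0.2, 0.0)] * 12  # hypothetical tight predictions over 12 steps
uncertain = [(1.5, 1.0, 0.3)] * 12  # hypothetical wide predictions over 12 steps
print(mean_trajectory_entropy(confident) < mean_trajectory_entropy(uncertain))  # → True
```

Note that entropy rewards confidence, not correctness: a model can be confidently wrong, so a score like this complements ADE/FDE on labeled data rather than replacing them.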

d-zh commented 4 years ago

Now I have a deeper understanding of this. Thanks again for your reply.