Closed jakubmicorek closed 3 years ago
I follow the same setting as the original implementation. The same question was asked here: https://github.com/StevenLiuWen/ano_pred_cvpr2018/issues/27 From my point of view, it is acceptable to normalize the scores of the 12 videos in this way.
However, I would still argue this isn't correct for evaluating such a system. Suppose you want to use such a system in production at the campus where the videos were recorded at. You would need someone to adapt the threshold all the time depending on what's happening.
Take, for example, three types of video sequences: 1) no people (background only), 2) a few people, and 3) many people walking through the frame.
Clips of type 1 will probably produce a smaller error than type 2, and clips of type 2 a smaller error than type 3, even for normal behavior, because there is more foreground and motion to reconstruct. Since anomalies occur in each of the 12 test videos and each clip is normalized to [0, 1] individually, the overall larger errors of type 3 are pushed down into the range of types 1 and 2.
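To illustrate the effect, here is a minimal sketch with made-up error values (not the repository's actual evaluation code): per-clip min-max normalization gives every clip the full [0, 1] range regardless of its absolute error level, so the higher normal-frame errors of a crowded clip are pushed down to the same range as an empty-background clip.

```python
import numpy as np

# Hypothetical per-frame anomaly scores (higher = more anomalous).
# clip_a: background only (small errors), clip_c: crowded (large errors).
# In both clips the third frame is the anomaly.
clip_a = np.array([0.10, 0.12, 0.30, 0.11])
clip_c = np.array([0.50, 0.55, 0.90, 0.52])

def minmax(x):
    """Min-max normalize scores to [0, 1]."""
    return (x - x.min()) / (x.max() - x.min())

# Per-clip normalization (each clip scaled independently):
per_clip = np.concatenate([minmax(clip_a), minmax(clip_c)])

# Global normalization over all clips at once:
global_norm = minmax(np.concatenate([clip_a, clip_c]))

# Under per-clip normalization both clips span the full [0, 1] range,
# so the crowded clip's large normal errors (e.g. 0.55 -> 0.125) end up
# in the same range as the empty clip's, even though in absolute terms
# they exceed the empty clip's anomalous frame (0.30).
print(per_clip)
print(global_norm)
```

Under global normalization the crowded clip's normal frames keep their higher absolute level, which is exactly the information the per-clip scheme discards.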
Yes, I agree with you. In a real production environment, a uniform threshold would have to be set carefully for the different events occurring in the same scene. This model still has some limitations.
Isn't the evaluation calculation wrong?
Considering Ped2: why are the scores normalized between 0 and 1 for each of the 12 video clips individually? This doesn't seem correct, as the camera view is the same in all of the clips.