Closed jakubmicorek closed 3 years ago
I follow the same setting as the original implementation. The same question was asked here: https://github.com/StevenLiuWen/ano_pred_cvpr2018/issues/27 From my point of view, it is acceptable to normalize the scores of the 12 videos in this way.
However, I would still argue this isn't correct for evaluating such a system. Suppose you want to use such a system in production at the campus where the videos were recorded at. You would need someone to adapt the threshold all the time depending on what's happening.
Take, for example, three types of video sequences: 1) no people (background only), 2) a few people, and 3) many people walking through the frame.
Clips of type 1 will probably produce a smaller error than type 2, and clips of type 2 a smaller error than type 3, even for normal behavior, because there is more foreground and motion to reconstruct. Since anomalies occur in each of the 12 test videos and each clip is normalized to [0, 1] individually, the overall larger errors of type 3 are pushed down into the range of types 1 and 2.
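To illustrate the effect, here is a minimal sketch with made-up error values (not the repository's actual evaluation code): per-clip min-max normalization gives every clip the full [0, 1] range regardless of its absolute error level, so the higher normal-frame errors of a crowded clip are pushed down to the same range as an empty-background clip.

```python
import numpy as np

# Hypothetical per-frame anomaly scores (higher = more anomalous).
# clip_a: background only (small errors), clip_c: crowded (large errors).
# In both clips the third frame is the anomaly.
clip_a = np.array([0.10, 0.12, 0.30, 0.11])
clip_c = np.array([0.50, 0.55, 0.90, 0.52])

def minmax(x):
    """Min-max normalize scores to [0, 1]."""
    return (x - x.min()) / (x.max() - x.min())

# Per-clip normalization (each clip scaled independently):
per_clip = np.concatenate([minmax(clip_a), minmax(clip_c)])

# Global normalization over all clips at once:
global_norm = minmax(np.concatenate([clip_a, clip_c]))

# Under per-clip normalization both clips span the full [0, 1] range,
# so the crowded clip's large normal errors (e.g. 0.55 -> 0.125) end up
# in the same range as the empty clip's, even though in absolute terms
# they exceed the empty clip's anomalous frame (0.30).
print(per_clip)
print(global_norm)
```

Under global normalization the crowded clip's normal frames keep their higher absolute level, which is exactly the information the per-clip scheme discards.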
Yes, I agree with you. In a real production environment, a uniform threshold would have to be set carefully for the different events occurring in the same scene. This model still has some limitations.
Isn't the evaluation calculation wrong?
Considering Ped2: why are the scores normalized between 0 and 1 for each of the 12 video clips individually? This doesn't seem correct, as the camera view is the same in all of the clips.