Cogito2012 / UString

[ACM MM 2020] Uncertainty-based Traffic Accident Anticipation
MIT License

What are all these green bounding boxes? #4

Closed by monjurulkarim 3 years ago

monjurulkarim commented 3 years ago

In the result video, what are all these green bounding boxes? Shouldn't bounding boxes appear only on detected objects? I saw many boxes in empty space where there are no objects. Are these false detections?

Cogito2012 commented 3 years ago

@monjurulkarim They are the top-K region proposals ranked by detection score. In this work, we sample the top-K bounding boxes as region proposals, where K is set to 19 by default. This follows the same protocol as DSA-RNN (ACCV'16). If boxes were placed only on detected objects, the number of boxes would vary from frame to frame, which is not practical for model learning.

monjurulkarim commented 3 years ago

@Cogito2012, thank you for your reply. I understand what you did. Could you please explain a little more what you meant by "If the bounding boxes are only on detected objects, the number of boxes will not be fixed"? Did you mean that each frame needs a fixed number, K, of proposals?

Cogito2012 commented 3 years ago

@monjurulkarim Right. In my paper, I proposed using a GCN to learn object relations for accident anticipation. If the number of bounding boxes is not fixed, the graph structure changes dynamically as nodes are added or removed. In this case, graph convolution is not applicable.
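
To illustrate the fixed-size requirement described above, here is a minimal NumPy sketch. It is not taken from the repository: the helper name stack_frames and the random boxes are hypothetical, and only K = 19 comes from this thread. It shows that per-frame box arrays can be stacked into a single (T, K, 4) array only when every frame has exactly K boxes.

```python
import numpy as np

K = 19  # number of proposals per frame, as stated in this thread


def stack_frames(boxes_per_frame):
    """Stack per-frame box arrays into one (T, K, 4) array.

    Hypothetical helper: raises ValueError if any frame does not
    contain exactly K boxes, since the graph would change size.
    """
    for t, boxes in enumerate(boxes_per_frame):
        if boxes.shape != (K, 4):
            raise ValueError(
                f"frame {t} has shape {boxes.shape}, expected {(K, 4)}"
            )
    return np.stack(boxes_per_frame, axis=0)


rng = np.random.default_rng(0)

# Five frames, each with exactly K boxes: stacking works.
fixed = [rng.random((K, 4)) for _ in range(5)]
stacked = stack_frames(fixed)
print(stacked.shape)
```

With a variable number of boxes per frame (for example 3 in one frame and 7 in the next), the same call raises an error, which is the practical problem that a fixed K avoids.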

monjurulkarim commented 3 years ago

@Cogito2012 I got it. Thank you.

monjurulkarim commented 3 years ago

@Cogito2012 Where in the code can I find the following equation from your paper?

[image: equation from the paper]

Cogito2012 commented 3 years ago

@monjurulkarim It's at line 304 of DataLoader.py. Since our graph is fully connected under this equation, I simply implemented it in the data loading step.

monjurulkarim commented 3 years ago

@Cogito2012 Could you please explain what these two numbers indicate?

image

Cogito2012 commented 3 years ago

@monjurulkarim The det entry of the feature file holds the object detection results for the whole video. Its last dimension has 6 columns: [x1, y1, x2, y2, score, class_label]. You can refer to line 65 of demo.py to see how they are obtained.

To compute the graph edges, we only need the bounding box coordinates (the first 4 columns), so detections[i, :, :4] is used as input.
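
As a rough sketch of how the first four columns can be turned into edges of a fully connected graph: the equation from the paper is not reproduced in this thread, so the weighting below (a normalised exponential of the negative distance between box centres) is an assumption for illustration, not the code at line 304 of DataLoader.py. The function name center_distance_edges and the random detections array are hypothetical.

```python
import numpy as np


def center_distance_edges(boxes):
    """Return a (K, K) edge-weight matrix for a fully connected graph.

    boxes: (K, 4) array with columns [x1, y1, x2, y2].
    The exp(-distance) weighting is an assumed example, not
    confirmed by the thread.
    """
    # Box centres, shape (K, 2)
    centers = np.stack(
        [(boxes[:, 0] + boxes[:, 2]) / 2.0,
         (boxes[:, 1] + boxes[:, 3]) / 2.0],
        axis=1,
    )
    # Pairwise Euclidean distances between centres, shape (K, K)
    diff = centers[:, None, :] - centers[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    weights = np.exp(-dist)
    # Normalise so that all pairwise weights sum to 1
    return weights / weights.sum()


rng = np.random.default_rng(0)
# Stand-in for the det entry: (frames, boxes, 6 columns)
detections = rng.random((3, 19, 6))

i = 0
boxes = detections[i, :, :4]  # keep only x1, y1, x2, y2
edges = center_distance_edges(boxes)
print(edges.shape)
```

Here the score and class_label columns are simply dropped by the slice, which matches the point above that only the coordinates are needed for the edges.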

monjurulkarim commented 3 years ago

@Cogito2012 Thank you for the clarification!

monjurulkarim commented 3 years ago

@Cogito2012 I just checked inside the feature file. Why is the class score always zero in all frames?

Cogito2012 commented 3 years ago

@monjurulkarim That's just because the object detector's output threshold is set low in order to get enough bounding boxes. This class score is not used in the algorithm, so you can ignore it.

monjurulkarim commented 3 years ago

@Cogito2012 Thank you for the reply. If the scores of all the objects are zero, how did you select 19 objects in each frame?

Cogito2012 commented 3 years ago

@monjurulkarim You can refer to our bbox_sampling function at line 46 of demo.py. Given the detected bounding boxes, we do not rely on the score but use a sampling strategy. If fewer than 19 boxes are detected, we randomly select the remaining boxes from the top-N entries of the detected boxes.

As for your concern: if you find a frame in which all objects have zero scores, it means the detection performance is not good enough for that frame. You can certainly train a better object detector for your dataset.
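
The padding idea described above could look roughly like the following. This is a sketch under stated assumptions, not the bbox_sampling function from demo.py: the name pad_boxes, the default top_n = 5, and the assumption that detections arrive sorted by descending score are all hypothetical.

```python
import numpy as np


def pad_boxes(dets, k=19, top_n=5, rng=None):
    """Return exactly k detections, shape (k, 6).

    dets: (M, 6) array, assumed sorted by descending score.
    If M >= k, the first k rows are kept. Otherwise the missing
    rows are drawn at random (with replacement) from the first
    top_n detections.
    """
    rng = np.random.default_rng() if rng is None else rng
    if len(dets) == 0:
        raise ValueError("need at least one detection to sample from")
    if len(dets) >= k:
        return dets[:k]
    pool = dets[:min(top_n, len(dets))]
    extra_idx = rng.integers(0, len(pool), size=k - len(dets))
    return np.concatenate([dets, pool[extra_idx]], axis=0)


rng = np.random.default_rng(0)
few = rng.random((7, 6))    # fewer than 19 detections
many = rng.random((25, 6))  # more than 19 detections

padded = pad_boxes(few, rng=rng)
trimmed = pad_boxes(many, rng=rng)
print(padded.shape, trimmed.shape)
```

Because the extra rows are copies of real detections, some boxes repeat within a frame, but every frame ends up with the same number of graph nodes.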

monjurulkarim commented 2 years ago

@Cogito2012 Hello, I hope you are doing well! I have a question: why did you select exactly 19 candidate objects, rather than some other number (e.g., 10 or 30)?

Cogito2012 commented 2 years ago

@monjurulkarim This number simply follows the prior work DSA-RNN, where 19 objects + 1 full frame are used as the feature representation of each frame. I don't think there is a special reason for this number; it is just an empirical value.

monjurulkarim commented 2 years ago

@Cogito2012 Thank you!