daniel-bogdoll / unknown_objects_roads

Repository of the paper Multimodal Detection of Unknown Objects on Roads for Autonomous Driving at IEEE SMC
https://arxiv.org/abs/2205.01414

2D anomalous detection #2

Closed luoyuchenmlcv closed 2 years ago

luoyuchenmlcv commented 2 years ago

Dear Daniel,

Great work! I have been reading your paper recently, but I am a bit confused about the 2D novelty detection stage. In the paper, you use CLIP for zero-shot learning, but later you mention that you use DETR pre-trained on COCO for non-anomalous detection and the CLIP softmax score to detect anomalous objects. I am quite confused about the pipeline and the roles that CLIP and DETR play in 2D anomalous object detection. Could you briefly explain how it works? Thanks a lot!

Best, Luoyu

daniel-bogdoll commented 2 years ago

Hey, thanks for asking! I'm happy to help :) Here's the section from the paper:

We used the classes with the highest frequency in the Waymo dataset as known classes. For that, we applied the Detr [54] 2D detector, pre-trained on the COCO [55] dataset, to detect objects in the 2D images. We took the classes corresponding to the top 99% of the detected objects and ended up with the following 13 classes as the classes not considered anomalies for our approach: car, traffic light, person, truck, bus, fire hydrant, bicycle, handbag, backpack, parking meter, stop sign, umbrella, and motorcycle

So, DETR was only used for the selection of those classes (we did not use ground truth) and is not part of the main pipeline :)
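The class-selection step described above boils down to counting DETR detections per class and keeping the most frequent classes until they cover 99% of all detections. Here is a minimal sketch of that idea; the function name, the toy labels, and the coverage parameter are my own illustration, not code from the repository:

```python
from collections import Counter

def select_known_classes(detections, coverage=0.99):
    """Keep the most frequent classes until they cover `coverage`
    of all detections; everything rarer is not a 'known' class."""
    counts = Counter(detections).most_common()  # sorted by frequency, descending
    total = sum(c for _, c in counts)
    known, covered = [], 0
    for cls, c in counts:
        if covered / total >= coverage:
            break
        known.append(cls)
        covered += c
    return known

# Toy detection labels standing in for DETR outputs over many frames:
labels = ["car"] * 90 + ["person"] * 8 + ["truck"] * 1 + ["kite"] * 1
print(select_known_classes(labels))  # ['car', 'person', 'truck'] — 'kite' is too rare
```

On the real Waymo detections this procedure yields the 13 classes listed in the quoted paragraph; rare classes fall outside the 99% mass and are therefore not treated as known.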

luoyuchenmlcv commented 2 years ago

Thanks for the clarification! Could you explain what is done with the clusters that CenterPoint++ fails to classify?

From your paper and your explanation, my understanding is that they are projected onto the 2D images and should be classified as one of the common classes:

car, traffic light, person, truck, bus, fire hydrant, bicycle, handbag, backpack, parking meter, stop sign, umbrella, and motorcycle

or as anomalous. The anomaly decision is made based on the CLIP softmax score.

BTW, I am not quite familiar with CLIP. Is it a classifier (since the 2D bounding boxes have already been determined at this stage, and you feed the contents of each 2D bounding box into CLIP), or a detector that works directly on the full 2D image?

daniel-bogdoll commented 2 years ago

Fig. 2 shows it quite nicely: if CenterPoint cannot detect clustered objects, they are passed to CLIP to see whether the visual semantics help to classify the object. If so, everything's fine; if not, an anomaly hypothesis is created :)
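The decision rule at the end of that pipeline can be sketched without the actual CLIP weights: CLIP scores each image crop against text prompts for the known classes, the similarity logits are passed through a softmax, and if even the best-matching known class scores below a threshold, an anomaly hypothesis is raised. The threshold value and function names below are hypothetical, chosen for illustration only:

```python
import numpy as np

def anomaly_hypothesis(logits, threshold=0.5):
    """Softmax over known-class similarity logits (as CLIP would produce
    for one crop). Flag an anomaly hypothesis when even the best known
    class stays below the confidence threshold."""
    z = np.exp(logits - np.max(logits))  # numerically stable softmax
    probs = z / z.sum()
    best = int(np.argmax(probs))
    return best, float(probs[best]), bool(probs[best] < threshold)

# A crop that clearly matches one known class vs. one where no class stands out:
known = anomaly_hypothesis(np.array([8.0, 1.0, 0.5]))    # confident match
unknown = anomaly_hypothesis(np.array([1.1, 1.0, 0.9]))  # flat, uncertain scores
print(known[2], unknown[2])  # False True
```

The sharp logit distribution yields a dominant softmax score (no anomaly), while the flat distribution of an unknown object leaves every known class below the threshold, triggering the anomaly hypothesis.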