Closed by luoyuchenmlcv 2 years ago
Hey, thanks for asking! I'm happy to help :) Here's the section from the paper:
We used the classes with the highest frequency in the Waymo dataset as known classes. For that, we applied the Detr [54] 2D detector, pre-trained on the COCO [55] dataset, to detect objects in the 2D images. We took the classes corresponding to the top 99% of the detected objects and ended up with the following 13 classes as the classes not considered anomalies for our approach: car, traffic light, person, truck, bus, fire hydrant, bicycle, handbag, backpack, parking meter, stop sign, umbrella, and motorcycle
So, DETR was only used to select those classes (we did not use ground truth) and is not part of the main pipeline :)
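The class selection described in the quoted section can be sketched roughly as follows. This is a minimal illustration, not the authors' code: it assumes "top 99%" means the most frequent classes whose detections together cover 99% of all detected objects, and the function name `select_known_classes` and the `coverage` parameter are hypothetical.

```python
from collections import Counter


def select_known_classes(detected_labels, coverage=0.99):
    """Return the most frequent classes that together cover `coverage`
    of all detections (assumed reading of "top 99%")."""
    counts = Counter(detected_labels)
    total = sum(counts.values())
    known, covered = [], 0
    # Walk classes from most to least frequent until the target share is reached.
    for label, n in counts.most_common():
        if covered >= coverage * total:
            break
        known.append(label)
        covered += n
    return known


# Toy labels standing in for the 2D detector's output on the camera images.
labels = ["car"] * 95 + ["person"] * 4 + ["kite"] * 1
print(select_known_classes(labels, coverage=0.98))
```

With real data, `detected_labels` would be the class names predicted by the 2D detector over the dataset's camera images; rare classes that fall outside the coverage threshold are left out of the known set.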
Thanks for the clarification! Could you explain what is done with the clusters that CenterPoint++ fails to classify?
From your paper and your explanation, my understanding is that they are projected onto the 2D images and then classified either as one of the common classes:
car, traffic light, person, truck, bus, fire hydrant, bicycle, handbag, backpack, parking meter, stop sign, umbrella, and motorcycle
or as the anomalous class. Whether an object is an anomaly is decided from the CLIP softmax score.
BTW, I am not very familiar with CLIP. Is it a classifier (since the 2D bounding boxes have already been determined at this stage, and you feed the contents of each 2D bounding box into CLIP), or purely a detector that works directly on the full 2D image?
Fig. 2 shows it quite nicely: if CenterPoint cannot detect a clustered object, the object is passed to CLIP to see whether the visual semantics help to classify it. If so, everything's fine; if not, an anomaly hypothesis is created :)
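The fallback logic described in this reply can be sketched as below. This is only an illustration under stated assumptions: the thread says the CLIP softmax score is used, but the exact decision rule is not given, so the `threshold` value, the function names, and the use of the maximum softmax probability are all hypothetical. The logits stand in for CLIP's image-text similarity scores for one cropped object against the known class names.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def classify_cluster(detector_label, clip_logits, class_names, threshold=0.5):
    """Return a known class name, or "anomaly" if neither stage is confident.

    detector_label: label from the 3D detector, or None if it found nothing.
    clip_logits: placeholder scores of the image crop against each class name.
    threshold: assumed cutoff on the highest softmax probability.
    """
    if detector_label is not None:
        return detector_label  # the 3D detector already recognized the object
    probs = softmax(clip_logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] >= threshold:
        return class_names[best]  # visual semantics resolved the object
    return "anomaly"  # no confident match: create an anomaly hypothesis


names = ["car", "person", "bus"]
print(classify_cluster(None, [5.0, 0.0, 0.0], names))
print(classify_cluster(None, [0.0, 0.0, 0.0], names))
```

In this reading CLIP acts as a zero-shot classifier on an already-localized image region rather than as a detector on the full image.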
Dear Daniel,
Great work! I have been reading your paper recently, but I am a bit confused about the 2D novel detection stage. In the paper, you use CLIP for zero-shot learning, but later you mention that you use DETR pre-trained on COCO to detect non-anomalous objects and the CLIP softmax score to detect anomalous ones. I am unsure how the pipeline fits together and what roles CLIP and DETR play in 2D anomalous object detection. Could you briefly explain how it works? Thanks a lot!
Best, Luoyu