facebookresearch / dinov2

PyTorch code and models for the DINOv2 self-supervised learning method.
Apache License 2.0

multi-object retrieval #175

Open wendy0527 opened 10 months ago

wendy0527 commented 10 months ago

As far as I know, most current image retrieval work targets images that contain a single object or only a few objects. In autonomous driving, however, a single picture is multi-object: it can contain people, cars, fences, buildings, trees, etc. Using only the [cls_token] embedding loses a lot of per-object information. Are there any suggestions for multi-object retrieval?

ChaosSamKo commented 7 months ago

I guess you could first run instance segmentation and then compute an embedding for each segmented instance. This takes one forward pass per instance in the image, so there may be a more efficient way. Also, global context information is lost when each segmented instance is embedded individually.
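One possible way around both costs, sketched here as an assumption and not something proposed or tested in this thread, is to run the backbone once on the full image and average the patch tokens that fall under each instance mask. As far as I know, the DINOv2 models expose per-patch features through `forward_features(...)["x_norm_patchtokens"]`; the sketch below does not call the model and takes those features as a plain array. The helper names `pool_instance_embeddings` and `retrieval_score` are hypothetical, the masks are assumed to be already downsampled to the patch grid, and the mean-of-best-match score is just one simple choice for comparing two sets of instance embeddings.

```python
import numpy as np


def pool_instance_embeddings(patch_tokens, instance_masks, eps=1e-8):
    """Average the patch tokens under each instance mask (hypothetical helper).

    patch_tokens:   (H, W, D) per-patch features from ONE forward pass.
    instance_masks: (K, H, W) boolean array, one mask per instance, assumed
                    to be already downsampled to the patch grid.
    Returns a (K, D) array of L2-normalised instance embeddings.
    """
    h, w, d = patch_tokens.shape
    tokens = patch_tokens.reshape(h * w, d)
    masks = instance_masks.reshape(len(instance_masks), h * w).astype(tokens.dtype)

    # Masked mean: sum the tokens inside each mask, divide by the mask area.
    counts = masks.sum(axis=1, keepdims=True)
    pooled = masks @ tokens / np.maximum(counts, 1.0)

    # Normalise so that dot products are cosine similarities.
    # An empty mask yields an all-zero embedding.
    norms = np.linalg.norm(pooled, axis=1, keepdims=True)
    return pooled / np.maximum(norms, eps)


def retrieval_score(query_embeddings, candidate_embeddings):
    """Mean over query instances of the best cosine match in the candidate."""
    sims = query_embeddings @ candidate_embeddings.T
    return float(sims.max(axis=1).mean())
```

Because every patch token has already attended to the whole image, the pooled embeddings should keep some of the global context that per-crop embedding discards, although how well this works for retrieval in driving scenes would need to be checked empirically.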