YoungSean / NIDS-Net

NIDS-Net: A unified framework for novel instance detection and segmentation
MIT License

1 vs 1 or 1 vs n? #4

Closed xiexie123 closed 3 months ago

xiexie123 commented 3 months ago

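Great work. I had a quick look at the figures in the repo. My understanding is that the method computes the feature similarity between the template and the query, and takes the max as the final match. Based on that understanding, I have two questions:

  1. In test_NIDS_one_shot_demo, the code applies no threshold after taking the max of sim_mat. In other words, it assumes a priori that the template is present in the query image.
  2. In the demo, the 'toy' prompt already matches the target on its own. So I changed the code to query='objects' with thr=0.1, which gives 22 bboxes. Computing sim_mat and taking the max (the default, 1 vs 1) gives the correct result. But looking at sim_mat, the similarity with the other green toy is also high (0.7135). Most use cases are 1 vs n, for example one dining table with four dining chairs that all look the same. Without a prior in that case (such as taking topk=4), there will be missed or false detections. Has this problem been considered?

I am not sure whether my understanding is correct. Corrections are welcome.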
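To make the question concrete, here is a minimal sketch of the matching step as described above: a cosine-similarity matrix between template and proposal embeddings, followed by a max. The embeddings, the helper name `cosine_sim_matrix`, and the matrix orientation are assumptions for illustration; they are not taken from the NIDS-Net code.

```python
import numpy as np

def cosine_sim_matrix(templates, proposals):
    """Cosine similarity; rows are templates, columns are proposals."""
    t = templates / np.linalg.norm(templates, axis=1, keepdims=True)
    p = proposals / np.linalg.norm(proposals, axis=1, keepdims=True)
    return t @ p.T

# Hypothetical 2-D embeddings: one template, three object proposals.
template_emb = np.array([[1.0, 0.0]])
proposal_emb = np.array([[0.9, 0.1], [0.7, 0.7], [0.0, 1.0]])

sim_mat = cosine_sim_matrix(template_emb, proposal_emb)

# 1 vs 1: take the single best proposal, with no threshold applied.
best = int(sim_mat[0].argmax())

# A fixed top-k needs the instance count in advance; with k=2 the
# second pick here has similarity of only about 0.71.
top2 = np.argsort(-sim_mat[0])[:2]
```

With a bare max, some proposal is always returned, even when the template is absent, which is the concern raised in question 1.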

YoungSean commented 3 months ago

test_NIDS_one_shot_demo is only a very simplified version. The method does not need to assume that the template is present in the query image. If the template may be absent, set a similarity threshold: when no object proposal reaches it, all proposals are treated as background objects.

The "toy" prompt in the demo is only there to show that the prompt does not have to be "object". In a specific application, a different prompt can be used to narrow the scope. For example, in an anime scene, a "person" prompt can speed up the detection of anime characters. When there are multiple identical objects (1 vs n), the situation is similar to the BOP segmentation datasets: use argmax to assign each object proposal to its closest object and apply a threshold, with no need for topk. If the scene contains very similar but different objects and the threshold is not high, false detections may occur.
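The per-proposal argmax with a threshold might look like the following sketch. The function name `assign_proposals`, the similarity values, and the threshold of 0.5 are illustrative assumptions, not values from the repository.

```python
import numpy as np

def assign_proposals(sim_mat, thr):
    """Assign each proposal to its most similar object, or -1 (background).

    sim_mat has shape (num_proposals, num_objects).
    """
    best_obj = sim_mat.argmax(axis=1)
    best_score = sim_mat.max(axis=1)
    return np.where(best_score >= thr, best_obj, -1)

# Hypothetical scores: object 0 is a chair, object 1 is a table.
sim_mat = np.array([
    [0.82, 0.10],  # chair instance 1
    [0.80, 0.12],  # chair instance 2
    [0.79, 0.15],  # chair instance 3
    [0.81, 0.09],  # chair instance 4
    [0.20, 0.85],  # the table
    [0.30, 0.25],  # clutter, below the threshold
])

labels = assign_proposals(sim_mat, thr=0.5)
print(labels.tolist())  # [0, 0, 0, 0, 1, -1]
```

Because the decision is made per proposal rather than per template, all four chairs are kept without knowing their count in advance, and a scene with no matching object yields only background labels.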

In the 1 vs 1 case, where the scene contains at most one instance of each template, stable matching can be used to handle multiple objects with high similarity.
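One way to realize the stable matching idea is the Gale-Shapley algorithm with similarity as the preference score on both sides. This is a sketch under that assumption; the thread does not say which variant NIDS-Net uses, and `stable_match` and the scores below are hypothetical.

```python
import numpy as np

def stable_match(sim_mat):
    """Gale-Shapley with templates proposing to object proposals.

    sim_mat has shape (num_templates, num_proposals).
    Returns a dict mapping template index -> proposal index.
    """
    n_t, n_p = sim_mat.shape
    prefs = np.argsort(-sim_mat, axis=1)  # best proposal first, per template
    next_choice = [0] * n_t               # next preference each template tries
    holder = {}                           # proposal -> template holding it
    free = list(range(n_t))
    while free:
        t = free.pop()
        if next_choice[t] >= n_p:
            continue  # every proposal tried; template stays unmatched
        p = int(prefs[t, next_choice[t]])
        next_choice[t] += 1
        cur = holder.get(p)
        if cur is None:
            holder[p] = t
        elif sim_mat[t, p] > sim_mat[cur, p]:
            holder[p] = t      # proposal prefers the new template
            free.append(cur)
        else:
            free.append(t)     # rejected; try the next preference later
    return {t: p for p, t in holder.items()}

# Two similar templates both score highest on proposal 0.
sim = np.array([[0.90, 0.71],
                [0.88, 0.60]])
print(stable_match(sim))  # {0: 0, 1: 1}
```

A plain per-template argmax would send both templates to proposal 0; the matching gives each proposal to at most one template. A similarity threshold can still be applied to the matched pairs afterwards.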