IDEA-Research / Grounded-Segment-Anything

Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
https://arxiv.org/abs/2401.14159
Apache License 2.0

When the image contains no object matching the prompt, the largest object is detected as the prompted object #409

Open tarepanda1024 opened 10 months ago

tarepanda1024 commented 10 months ago

The input prompts were sun / cat / dog, etc. The boxes drawn in all of the images below are wrong:

[Images: three detection results with incorrect boxes]

This is the original image: [Image: 91ac6a31f5c5354578133315dab708ed]

tarepanda1024 commented 10 months ago

@rentainhe, can you help me or give me some advice?

rentainhe commented 10 months ago

There does appear to be an issue with how the Grounding-DINO model handles negative examples, i.e. prompts for objects that are not present in the image. This may be due to the model's weights, so it might be worth trying better weights to see whether that alleviates the problem.

tarepanda1024 commented 10 months ago

Thanks, I will try another set of model weights.

tarepanda1024 commented 10 months ago

Sorry, could you please confirm whether you mean replacing the model or adjusting the parameters in GroundingDINO_SwinB.cfg.py?

Do I need to change the models below, or adjust the config?

[Images: two screenshots of the model files and the config]

tarepanda1024 commented 10 months ago

My text prompt is "1cat ." and detection failed on all of the photos below.

[Images: five photos on which detection failed]
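The prompt format may matter here. The GroundingDINO README examples write text prompts as lowercase English category names separated by " . " (e.g. "chair . person . dog ."). Below is a minimal sketch of normalizing category names into that form. `build_caption` is a hypothetical helper, not part of GroundingDINO. Stripping the leading digit from "1cat" is also an assumption about what the prompt was meant to say.

```python
import re


def build_caption(categories: list[str]) -> str:
    """Join category names into a "name . name ." style caption.

    Assumption: this follows the separator convention used in the GroundingDINO
    README examples; it is a hypothetical helper, not a library function.
    """
    cleaned: list[str] = []
    for name in categories:
        # Lowercase and drop a leading count such as the "1" in "1cat" (assumption).
        name = re.sub(r"^\d+\s*", "", name.strip().lower())
        # Collapse inner whitespace and strip stray separators.
        name = re.sub(r"\s+", " ", name).strip(" .")
        if name and name not in cleaned:
            cleaned.append(name)
    return " . ".join(cleaned) + " ." if cleaned else ""


print(build_caption(["1cat"]))               # cat .
print(build_caption(["Sun", "cat", "dog"]))  # sun . cat . dog .
```

Duplicates and empty names are dropped, so the caption only lists each category once.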

NormanBeta commented 10 months ago

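From my own experience using Grounding-DINO for open-set object detection, here is what I have learned:

  1. Test as many text prompts as you can, and write them in natural, idiomatic English (even I can barely tell what "1cat" means). The boxes are usually quite accurate, but they may not match the text.
  2. You can raise the box threshold, but if the text threshold is set too high, the predicted phrase gets cut off (words are dropped from it).
  3. Filter out cases where the box covers too much of the image.
  4. Add heuristic joint filtering, e.g. clothing should always come with a human face.
  5. Zero-shot open-set detection inevitably produces false positives. At scale the accuracy is only a little over 60%, and the rest still needs to be double-checked.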
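Tips 2 to 4 above can be illustrated with a minimal post-filtering sketch. This is not Grounding DINO's API. The `Detection` container, the helper names, and the `max_area_ratio` value are hypothetical. The sketch assumes boxes normalized to [0, 1] in (cx, cy, w, h) form, with one score per caption token. A box's score is taken here as its maximum token score (an assumption), and the 0.35 / 0.25 defaults are only illustrative starting points.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    # Hypothetical container: box is (cx, cy, w, h) normalized to [0, 1];
    # token_scores holds one score per caption token for this box.
    box: tuple[float, float, float, float]
    token_scores: list[float]


def extract_phrase(tokens: list[str], scores: list[float], text_threshold: float) -> str:
    """Keep only the caption tokens whose score exceeds text_threshold.

    With a text_threshold that is too high, part of a multi-word phrase is
    dropped (the truncation effect from tip 2).
    """
    return " ".join(tok for tok, s in zip(tokens, scores) if s > text_threshold)


def filter_detections(
    dets: list[Detection],
    tokens: list[str],
    box_threshold: float = 0.35,
    text_threshold: float = 0.25,
    max_area_ratio: float = 0.8,
) -> list[tuple[str, tuple[float, float, float, float]]]:
    kept = []
    for det in dets:
        # Tip 2: box score assumed to be the max token score; a higher
        # box_threshold removes more low-confidence boxes.
        if max(det.token_scores) <= box_threshold:
            continue
        # Tip 3: drop boxes covering most of the image, which is what
        # "the largest object" false positives tend to look like.
        _, _, w, h = det.box
        if w * h > max_area_ratio:
            continue
        phrase = extract_phrase(tokens, det.token_scores, text_threshold)
        if phrase:
            kept.append((phrase, det.box))
    return kept


def require_cooccurrence(kept, label: str, required: str):
    """Tip 4: drop `label` detections unless a `required` detection is also present."""
    if any(phrase == required for phrase, _ in kept):
        return kept
    return [(phrase, box) for phrase, box in kept if phrase != label]


# Example: caption "black cat ." split into two hypothetical word tokens.
tokens = ["black", "cat"]
dets = [
    Detection((0.5, 0.5, 1.0, 1.0), [0.3, 0.5]),  # whole-image box
    Detection((0.3, 0.4, 0.2, 0.3), [0.2, 0.6]),  # plausible cat box
    Detection((0.7, 0.7, 0.1, 0.1), [0.1, 0.2]),  # low-confidence box
]
print(filter_detections(dets, tokens))
print(filter_detections(dets, tokens, text_threshold=0.1))
print(require_cooccurrence([("clothes", (0.5, 0.6, 0.3, 0.4))], "clothes", "face"))
```

With `text_threshold=0.25` only the middle box survives, and its phrase is cut down from "black cat" to "cat" because the "black" token scores 0.2. This is the truncation effect described in tip 2. Lowering the threshold to 0.1 restores the full phrase. The whole-image box is removed by the area filter even though its score passes the box threshold.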
tarepanda1024 commented 10 months ago

OK, thanks!