FoundationVision / GLEE

[CVPR2024 Highlight] GLEE: General Object Foundation Model for Images and Videos at Scale
https://glee-vision.github.io/
MIT License

The expression prompt mode only outputs one object? #4

Open GallonDeng opened 10 months ago

GallonDeng commented 10 months ago

In the Hugging Face demo, the expression prompt mode only outputs one object, even when the image contains multiple instances of the same object?

0xNOY commented 10 months ago

Yes, in the demo, only one object is output in the expression prompt mode. To output multiple objects, you need to change the value of the variable topK_instance in line 174 of app/app.py.
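
For readers unfamiliar with what a top-K setting does, here is a minimal sketch of the idea. It is not the code from app/app.py: the function name `select_top_k` and the example scores are hypothetical, and it only assumes that each candidate object has a confidence score and that the K highest-scoring candidates are kept.

```python
from typing import List, Sequence


def select_top_k(scores: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k highest-scoring candidates, best first.

    Hypothetical helper for illustration only; not part of GLEE.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    # Sort candidate indices by score, highest first, then keep the first k.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


# Made-up confidence scores for four candidate objects.
scores = [0.12, 0.91, 0.45, 0.88]

top1 = select_top_k(scores, 1)  # keeps a single object
top3 = select_top_k(scores, 3)  # keeps three objects
```

With K set to 1 only the best-scoring candidate survives, which matches the single-object behaviour described above; a larger K keeps more candidates.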

GallonDeng commented 10 months ago

@0xNOY Thanks very much. So I should change topK_instance to, e.g., 10, which would return the top 10 objects?

GallonDeng commented 10 months ago

Could you please explain the grounding mode and the category mode in the tokenization stage in more detail? It seems that we can input multiple categories at once in a prompt, but only one expression per prompt. Could both modes work the same way?

wjf5203 commented 7 months ago

@AllenDun Thank you for your interest! In our setting, if the input is an expression, the model will only find the single most relevant object, as topK=1 is a fixed parameter set by design. If you want to find multiple objects at the same time, you can try using the 'categories' -> 'custom-list' mode, which supports a list of arbitrary category names separated by commas. In this mode, you can freely set the TopK, and you can input multiple categories at once.
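
As an illustration of the comma-separated input described above, here is a minimal sketch of how such a string could be split into category names. The function `parse_category_list` is hypothetical and is not taken from the GLEE demo; it only assumes that names are separated by commas and that surrounding whitespace should be ignored.

```python
from typing import List


def parse_category_list(text: str) -> List[str]:
    """Split a comma-separated string into clean category names.

    Hypothetical helper for illustration only; not part of GLEE.
    """
    # Trim whitespace around each name and drop empty entries
    # (for example, those produced by a trailing comma).
    names = [part.strip() for part in text.split(",")]
    return [name for name in names if name]


categories = parse_category_list("person, dog, traffic light,")
```

Multi-word names such as "traffic light" stay intact because the split happens only on commas.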