FoundationVision / GLEE

[CVPR2024 Highlight]GLEE: General Object Foundation Model for Images and Videos at Scale
https://glee-vision.github.io/
MIT License

About visual prompts #7

Open KKKSQJ opened 8 months ago

KKKSQJ commented 8 months ago

Hello, GLEE is a great piece of work. I have a few questions about some details of the algorithm that I'd like to ask you about. If you have time, I'd appreciate a reply. Thanks!

  1. When using points as visual prompts, does GLEE support negative clicks? Can it refine the segmentation of a target with multiple clicks, the way SAM does?
  2. Looking at your code, it seems a point is converted into a box before being used as a prompt. Why is this done? I couldn't find an explanation in your paper.
  3. Can the topk_instance returned in visual prompt mode only be 1? Can it segment multiple parts of an occluded target? Thanks!
wjf5203 commented 5 months ago

Thank you for your interest in GLEE!

  1. Unfortunately, GLEE has not been trained for multi-turn interaction with visual prompts, so it cannot refine its segmentation results based on multiple clicks. However, you can obtain different segmentation results by drawing different shapes in the demo app.
  2. We need to sample some features on the image based on the visual prompt for self-attention. To extract more information, we expand a point into a small box to standardize this process.
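As a minimal sketch of the point-to-box expansion described above: the idea is that a single click gives too small a region to sample features from, so the point is padded into a small box (clamped to the image bounds) before feature sampling. The function name and the `half_size` padding value below are illustrative assumptions, not GLEE's actual implementation.

```python
def point_to_box(x, y, img_w, img_h, half_size=8):
    """Expand a click point (x, y) into a small square box clamped to the
    image bounds. Returns (x0, y0, x1, y1) in pixel coordinates.
    `half_size` is a hypothetical padding value for illustration."""
    x0 = max(0, x - half_size)
    y0 = max(0, y - half_size)
    x1 = min(img_w, x + half_size)
    y1 = min(img_h, y + half_size)
    return (x0, y0, x1, y1)

# A click well inside a 640x480 image yields a 16x16 box around it;
# a click near the corner is clipped to stay within the image.
print(point_to_box(100, 50, 640, 480))  # (92, 42, 108, 58)
print(point_to_box(2, 3, 640, 480))     # (0, 0, 10, 11)
```

The resulting box can then be treated exactly like a box-type visual prompt, so points and boxes share one feature-sampling code path.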
  3. In visual prompt mode, the top-k can only be 1. If an occluded object is split into two visible parts, both parts are still represented by a single object query; otherwise, they would be predicted as two separate objects.