CASIA-IVA-Lab / FastSAM

Fast Segment Anything

Can it only detect (and segment) with a limited set of prompts when using text mode? #62

Closed beefsoup18 closed 1 year ago

beefsoup18 commented 1 year ago

I tried some other text prompts, such as "black eyes", "wood", and "sand", in text mode with the sample picture on Hugging Face, but the results were wrong. Could this be because the sample prompts such as "yellow dog" and "black dog" appeared in the training dataset, while the others did not?

berry-ding commented 1 year ago

Hello @beefsoup18, thank you for your attention. We use CLIP as the model for text prompt processing, and we have not fine-tuned it on specific categories. You can try more images. However, since FastSAM relies on local information when processing text prompts, the absence of context may indeed lead to a decline in recognition performance for certain categories. If you want better recognition results, you can try our FastSAM + GroundingDINO: https://github.com/IDEA-Research/Grounded-Segment-Anything/tree/main/EfficientSAM/FastSAM
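For reference, this is roughly how a text prompt is run with the FastSAM Python API: a minimal sketch following the usage pattern in the repo's README, where the weight path, image path, device, and prompt string are placeholders to adapt to your setup.

```python
from fastsam import FastSAM, FastSAMPrompt

# Placeholder paths/device; adjust to your environment.
IMAGE_PATH = './images/dogs.jpg'
DEVICE = 'cpu'

# Load FastSAM weights (downloaded separately from the repo's model links).
model = FastSAM('./weights/FastSAM.pt')

# Stage 1: all-instance segmentation over the whole image.
everything_results = model(
    IMAGE_PATH, device=DEVICE, retina_masks=True, imgsz=1024, conf=0.4, iou=0.9
)

# Stage 2: prompt-guided selection. The text prompt is matched against each
# candidate mask with CLIP, so prompts describing small, low-context regions
# (e.g. "black eyes") may score poorly compared to whole-object prompts.
prompt_process = FastSAMPrompt(IMAGE_PATH, everything_results, device=DEVICE)
ann = prompt_process.text_prompt(text='the black dog')
prompt_process.plot(annotations=ann, output_path='./output/black_dog.jpg')
```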

an-yongqi commented 1 year ago

We will close this issue for now to keep the issue tracker organized. However, if the problem persists or if you have any further questions, please feel free to comment here or open a new issue. We value your input and are happy to assist further.