Closed beefsoup18 closed 1 year ago
Hello @beefsoup18 , thank you for your interest. We use CLIP as the model for text-prompt processing, and we have not fine-tuned it on specific categories, so you can try more images. However, because FastSAM relies on local crop information when processing text prompts, the absence of global context can indeed degrade recognition for certain categories. If you want better recognition results, you can try our FastSAM + GroundingDINO: https://github.com/IDEA-Research/Grounded-Segment-Anything/tree/main/EfficientSAM/FastSAM
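For reference, the text mode works roughly like this: each candidate mask's image crop is embedded (with CLIP), compared against the text embedding, and the mask with the highest cosine similarity is selected. Below is a minimal sketch of just that selection step, using made-up toy vectors rather than real CLIP outputs:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def best_mask(text_emb, mask_embs):
    """Index of the mask whose crop embedding is closest to the text embedding."""
    sims = [cosine(text_emb, m) for m in mask_embs]
    return max(range(len(sims)), key=sims.__getitem__)

# Toy example: three candidate mask embeddings; the text embedding is
# closest to mask 1, so that mask is returned.
text_emb = [0.1, 0.9, 0.2]
mask_embs = [[0.9, 0.1, 0.0], [0.2, 0.8, 0.3], [0.0, 0.1, 0.9]]
print(best_mask(text_emb, mask_embs))  # → 1
```

Because each crop is scored in isolation, a prompt that depends on surrounding context (e.g. "black eyes" inside a face) can match poorly, which is the limitation mentioned above.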
We will close this issue for now to keep the issue tracker organized. However, if the problem persists or if you have any further questions, please feel free to comment here or open a new issue. We value your input and are happy to assist further.
I tried some other text prompts such as "black eyes", "wood", and "sand" in text mode with the sample picture on Hugging Face, but got wrong results. Is it possibly because sample prompts such as "yellow dog" and "black dog" appeared in the prompts of the training dataset?