Open 25benjaminli opened 2 months ago
Hi all, I am trying to build a pipeline to train without prompts, using only the default sparse & dense embeddings plus the image embedding. For some reason, the resulting segmentation performs poorly compared to the demos. Please note that this code differs from the inference code provided in the repository because my ultimate intention is to train the model. I am also using the approach discussed in #138 to support flexible image sizes.

Please see this screenshot of the resulting segmentation mask overlaid on the original image.

Thanks for the help!

If this is the result without any additional training, I think that's expected. Without prompts, the models generate very scattered/non-specific outputs.

In case it's of any use, a user on the SAM v1 issue tracker had a blog post explaining how they set up a promptless version of the model. It's a different approach (training a custom decoder), but it may be a useful reference.
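For reference, a promptless forward pass of the kind described above can be sketched with the original SAM v1 API (the `segment_anything` package), since that is the model the linked blog post targets. The checkpoint path and dummy input tensor below are placeholders, not from the thread; for training you would drop `torch.no_grad()` and backpropagate through the decoder:

```python
import torch
from segment_anything import sam_model_registry

# Assumed checkpoint path -- substitute your own weights file.
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b.pth")
sam.eval()

# Dummy preprocessed batch; SAM v1 expects 1024x1024 inputs by default.
image = torch.randn(1, 3, 1024, 1024)

with torch.no_grad():
    image_embeddings = sam.image_encoder(image)  # (1, 256, 64, 64)

    # Passing no prompts yields empty sparse embeddings and the
    # learned "no mask" dense embedding -- the "default" embeddings
    # mentioned in the question.
    sparse_emb, dense_emb = sam.prompt_encoder(
        points=None, boxes=None, masks=None
    )

    low_res_masks, iou_pred = sam.mask_decoder(
        image_embeddings=image_embeddings,
        image_pe=sam.prompt_encoder.get_dense_pe(),
        sparse_prompt_embeddings=sparse_emb,
        dense_prompt_embeddings=dense_emb,
        multimask_output=False,
    )

# Upsample the low-resolution logits back to the original image size.
masks = sam.postprocess_masks(
    low_res_masks, input_size=(1024, 1024), original_size=(1024, 1024)
)
```

Note that with empty sparse embeddings the decoder has nothing to anchor on, which is consistent with the scattered outputs mentioned in the reply below; the blog-post approach of training a custom decoder head on top of the frozen image embeddings sidesteps this.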