facebookresearch / sam2

The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
Apache License 2.0

Plans to update model weights. #363

Open GeorgiaA opened 1 week ago

GeorgiaA commented 1 week ago

Hello,

Are there any plans to update the model weights for the SAM2 models in this repo to match the weights used in the online demo?

I spoke with someone from Meta at ECCV 2024 last week, who said the SAM2 model used in the online demo is trained on the Meta dataset plus many open-source datasets, whereas the model released on GitHub is trained only on the Meta dataset.

I have found that the online demo works well at segmenting some underwater data that I'm using, but when using the notebooks provided in the repo, the SAM2 model cannot segment the objects of interest. I am interested in zero-shot segmentation approaches.

Thanks.

chayryali commented 5 days ago

Hi @GeorgiaA, there are no plans to release models trained on additional data (e.g. academic datasets), so that the released models can remain compliant with the Apache 2.0 license.

Are you able to share any examples? What prompt and model size/version are you using?

GeorgiaA commented 1 day ago

Hi @chayryali ah ok that makes sense.

Here is an example image/video that I am working with: Screenshot 2024-10-14 at 15 36 12

I am using the subpipe dataset. This is after the image has been preprocessed to make the picture clearer. I am interested in segmenting the dark grey part of the image (it is a subsea pipe).

When I add a short video to the SAM2 demo online and give it a couple of points, it does well at selecting the pipe in the image. Screenshot 2024-10-14 at 15 36 57

However, when I try the video_predictor_example.ipynb notebook supplied in this repo, using the sam2.1_hiera_l.yaml model config with the sam2.1_hiera_large.pt checkpoint weights, and pass in a couple of points, it fails to segment the pipe. Screenshot 2024-10-14 at 15 42 43

I am looking for a zero-shot approach, both because I don't have the computational power required to fine-tune SAM or SAM2 and because I want to build a pipeline that can segment many different underwater objects; at the moment I only have data for underwater pipes. Do you have any suggestions? It would be greatly appreciated!
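For reference, the calls I'm making look roughly like this (a minimal sketch following the repo's video_predictor_example.ipynb; the click coordinates and paths below are placeholders, not my actual values):

```python
import numpy as np

# Placeholder prompt: two positive clicks on the pipe in frame 0.
# Coordinates are (x, y) in pixels and are made up for illustration.
points = np.array([[210.0, 350.0], [480.0, 360.0]], dtype=np.float32)
labels = np.array([1, 1], dtype=np.int32)  # 1 = positive click, 0 = negative

def run(video_dir: str):
    # Imported lazily so the prompt setup above can be inspected
    # without sam2 installed.
    import torch
    from sam2.build_sam import build_sam2_video_predictor

    predictor = build_sam2_video_predictor(
        "configs/sam2.1/sam2.1_hiera_l.yaml",
        "checkpoints/sam2.1_hiera_large.pt",
    )
    state = predictor.init_state(video_path=video_dir)
    predictor.add_new_points_or_box(
        state, frame_idx=0, obj_id=1, points=points, labels=labels
    )
    # Propagate the prompt through the video and binarize each frame's logits.
    masks_per_frame = {}
    with torch.inference_mode():
        for frame_idx, obj_ids, mask_logits in predictor.propagate_in_video(state):
            masks_per_frame[frame_idx] = (mask_logits[0] > 0.0).cpu().numpy()
    return masks_per_frame
```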

heyoeyo commented 18 hours ago

For what it's worth, the large model does seem to be able to segment the given image if a similar 2-point prompt is given:

segment_example

However, it appears as the last mask output and isn't necessarily the highest ranked by predicted IoU (especially with the v2.1 model, which predicts a significantly lower IoU than v2). The mask can be cleaned up by re-running the same image repeatedly through the video-processing part of the model (see issue #352). That also seems to produce a mask with a high stability score, which may be a way to help automate mask selection: the 'good' mask ends up with a high stability score after repeated encoding of the image as a video (admittedly a somewhat odd processing pipeline, to be fair).
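To make the stability-score idea concrete, here's a small NumPy sketch of the same calculation the original SAM repo uses (`calculate_stability_score` in its automatic-mask-generation utilities): the score is the area ratio of the mask binarized at a raised vs. lowered threshold, so a mask whose boundary barely moves under threshold shifts scores near 1. The toy logits below are made up for illustration:

```python
import numpy as np

def stability_score(logits, mask_threshold=0.0, offset=1.0):
    # Area at the high threshold divided by area at the low threshold.
    # Confident masks change little between the two cutoffs -> score near 1.
    high = (logits > (mask_threshold + offset)).sum()
    low = (logits > (mask_threshold - offset)).sum()
    return high / max(low, 1)

# Two candidate mask outputs: one with confident logits, one whose
# logits hover just above the mask threshold.
stable = np.full((8, 8), -10.0)
stable[2:6, 2:6] = 10.0
unstable = np.full((8, 8), -10.0)
unstable[2:6, 2:6] = 0.5

scores = [stability_score(m) for m in (stable, unstable)]
best = int(np.argmax(scores))  # picks the confident mask
```

Selecting the candidate with the highest stability score (rather than the highest predicted IoU) is the kind of automated pick described above.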