facebookresearch / segment-anything

The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.

Finetuning #5

Open · kdcd opened this issue 1 year ago

kdcd commented 1 year ago

Are there any plans to release scripts for fine-tuning the model?

Also, you did such great work! Thank you very much!

codybum commented 1 year ago

Information on fine tuning would be great.

austinmw commented 1 year ago

+1, I'd love to be able to fine-tune to improve performance on extremely difficult tiny-object tasks, for example segmenting vehicles in geospatial imagery.

[example image: vehicles in geospatial imagery]

penguingiraffe2 commented 1 year ago

This thread is referenced as the answer for similar questions, but I don't think there is an answer here for transfer learning.

jindameias commented 1 year ago

Looking forward to fine-tuning.

BenSpex commented 1 year ago

I would love to be able to fine tune the model for specific datasets as well.

hu-po commented 1 year ago

Do we wait for Meta to provide a training/fine-tuning script? Or should the open source hivemind write it?

TimWGY commented 1 year ago

Has anyone tried the idea of what might be called "point prompt engineering", i.e., training a separate model that learns where to place positive and negative prompt points, such that these points prompt SAM to select target objects from a custom dataset?

Or we can just summarize strategies and best practices in terms of placing positive and negative prompt points/prompt boxes, similar to how GPT/DALLE users summarize the best ways to write prompts.

I wonder whether this could be one way to fine-tune the SAM model when only a limited amount of annotation is available. Happy to discuss more if anyone wants to work together and try it out.
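
(For anyone exploring this idea, here is a minimal sketch of how positive and negative point prompts enter SAM through the repo's own SamPredictor API; the checkpoint path, the image array, and the click coordinates are placeholders, and a learned "prompt-placement" model would be what produces the coordinates.)

    import numpy as np
    from segment_anything import sam_model_registry, SamPredictor

    # Load a pre-trained SAM model (checkpoint path is a placeholder).
    sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
    predictor = SamPredictor(sam)
    predictor.set_image(image)  # image: HxWx3 uint8 RGB numpy array

    # One positive click (label 1) on the target, one negative click (label 0)
    # on the background.
    point_coords = np.array([[320, 240], [100, 80]])
    point_labels = np.array([1, 0])

    masks, scores, logits = predictor.predict(
        point_coords=point_coords,
        point_labels=point_labels,
        multimask_output=True,
    )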

openvino-book commented 1 year ago

+1, looking forward to fine-tuning the SAM model on a custom dataset. :)

hu-po commented 1 year ago

I am attempting some fine-tuning in this repo. Perhaps people can find use in it. The biggest thing I figured out is that you have to break the Sam model up into its components in order for there to be a gradient path for fine-tuning.
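
(For readers wondering what "breaking up the model" looks like in practice, here is a minimal sketch of a decoder-only fine-tuning step built from SAM's components. It processes one preprocessed image with N box prompts per step; image_tensor, box_torch, gt_masks, and loss_fn are placeholders.)

    import torch

    # Freeze the image encoder and prompt encoder; train only the mask decoder.
    for p in sam.image_encoder.parameters():
        p.requires_grad = False
    for p in sam.prompt_encoder.parameters():
        p.requires_grad = False

    with torch.no_grad():
        # (1, 3, 1024, 1024) preprocessed image -> (1, 256, 64, 64) embedding
        image_embedding = sam.image_encoder(image_tensor)
        sparse_emb, dense_emb = sam.prompt_encoder(
            points=None, boxes=box_torch, masks=None)  # box_torch: (N, 4) xyxy

    low_res_masks, iou_pred = sam.mask_decoder(
        image_embeddings=image_embedding,
        image_pe=sam.prompt_encoder.get_dense_pe(),
        sparse_prompt_embeddings=sparse_emb,
        dense_prompt_embeddings=dense_emb,
        multimask_output=False,
    )
    loss = loss_fn(low_res_masks, gt_masks)  # gradients flow through the decoder only
    loss.backward()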

dlod-openvino commented 1 year ago

> After some messing around I have gotten preliminary fine-tuning to work on my fork. The code is still super messy and early, but perhaps people can find use in it. The biggest thing I figured out is that you have to break up the Sam model into its components in order for there to be a gradient path for fine-tuning.

Could you please recommend the minimum hardware configuration for fine-tuning SAM, e.g., 4 x 4090?

hu-po commented 1 year ago

> Could you please recommend the minimum hardware configuration for fine-tuning SAM, e.g., 4 x 4090?

I can get the smallest pre-trained model (vit_b) running with a batch size of 1 in <5 GB of GPU memory, but I think fine-tuning with those settings would take forever.
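
(One way to make decoder-only fine-tuning cheap on a single small GPU is to precompute the image embeddings once and train against the cache, since the ViT encoder dominates compute and memory; a rough sketch, where dataset_images is a placeholder iterable of preprocessed (1, 3, 1024, 1024) tensors.)

    import torch

    # One-time pass: cache encoder outputs so decoder-only fine-tuning never
    # has to run (or hold activations for) the heavy image encoder.
    embeddings = {}
    sam.image_encoder.eval()
    with torch.no_grad():
        for name, image_tensor in dataset_images:
            embeddings[name] = sam.image_encoder(image_tensor).cpu()
    torch.save(embeddings, "image_embeddings.pt")  # reuse across epochs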

codybum commented 1 year ago

I have access to 4 x A100s with 80 GB if you want me to test something.

JunMa11 commented 1 year ago

Hi @hu-po,

Thank you very much for sharing the fine-tuning code. Would it be possible for you to give guidance on how to prepare a custom dataset (e.g., data format and folder structure)?

hu-po commented 1 year ago

> Hi @hu-po,
>
> Thank you very much for sharing the fine-tuning code. Would it be possible for you to give guidance on how to prepare a custom dataset (e.g., data format and folder structure)?

Thank me when I get it to work 😭 this is more complicated than anticipated.

shakesBeardZ commented 1 year ago

+1, interested in fine-tuning it for coral reef images.

AMInnovationTeam commented 1 year ago

+1, interested in fine-tuning it for segmenting cracks on roads.

maskani-moh commented 1 year ago

+1 🙌

ariannaravera commented 1 year ago

+1 interested in fine-tuning!

harry-s-grewal commented 1 year ago

+1, I'd like to do some vehicle detection on low quality images!

javiermcebrian commented 1 year ago

+1, interested in fine-tuning the prompt encoder or mask decoder!

francescodisalvo05 commented 1 year ago

+1! I would be interested in fine-tuning the model for medical image analysis

travishsu commented 1 year ago

I'm curious whether it is possible to point out an unknown object that has not been learned (as in anomaly detection) via a text prompt if I fine-tune with custom data.

imandrealombardo commented 1 year ago

+1!

satpalsr commented 1 year ago

CC: @ericmintun @nikhilaravi

Kenneth-X commented 1 year ago

Hi @hu-po, nice work sharing the fine-tuning script. Is "FragmentDataset" the dataset officially released at https://segment-anything.com/dataset/index.html?

hu-po commented 1 year ago

> Hi @hu-po, nice work sharing the fine-tuning script. Is "FragmentDataset" the dataset officially released at https://segment-anything.com/dataset/index.html?

No, it's a custom dataset for x-ray data of scroll fragments: https://www.kaggle.com/competitions/vesuvius-challenge-ink-detection/data

bhpfelix commented 1 year ago

I have fine-tuning starter code for COCO instance segmentation format data, with some basic functionality, at this repo. Hope it helps!
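
(For anyone assembling their own COCO-format pipeline, a minimal sketch of pulling masks and box prompts out of a COCO annotation file with pycocotools; the annotation path is a placeholder.)

    import numpy as np
    from pycocotools.coco import COCO

    coco = COCO("annotations/instances_train.json")  # placeholder path
    img_id = coco.getImgIds()[0]
    anns = coco.loadAnns(coco.getAnnIds(imgIds=img_id))

    masks, boxes = [], []
    for ann in anns:
        masks.append(coco.annToMask(ann))   # HxW binary mask
        x, y, w, h = ann["bbox"]            # COCO boxes are xywh
        boxes.append([x, y, x + w, y + h])  # SAM box prompts are xyxy
    boxes = np.array(boxes)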

alex-encord commented 1 year ago

Hey, we wrote a blog post outlining some of the key steps to fine-tune SAM via the mask decoder, particularly describing which functions from SAM to use to pre- and post-process the data so that it's in good shape for fine-tuning.
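
(The pre-/post-processing steps look roughly like this with the repo's own utilities; a sketch, where image, box, original_size, low_res_masks, and device are placeholders.)

    import torch
    from segment_anything.utils.transforms import ResizeLongestSide

    transform = ResizeLongestSide(sam.image_encoder.img_size)  # longest side -> 1024

    image_resized = transform.apply_image(image)  # image: HxWx3 uint8 numpy array
    image_tensor = torch.as_tensor(image_resized, device=device)
    image_tensor = image_tensor.permute(2, 0, 1)[None, :, :, :]  # 1x3xHxW
    input_size = image_tensor.shape[-2:]
    x = sam.preprocess(image_tensor)  # normalize and pad to 1024x1024

    # Box prompts must be mapped into the resized coordinate frame as well.
    box_torch = torch.as_tensor(
        transform.apply_boxes(box, original_size), dtype=torch.float, device=device)

    # After the decoder, map low-res mask logits back to the original resolution.
    masks = sam.postprocess_masks(low_res_masks, input_size, original_size)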

codybum commented 1 year ago

> Hi @hu-po, nice work sharing the fine-tuning script. Is "FragmentDataset" the dataset officially released at https://segment-anything.com/dataset/index.html?
>
> No, it's a custom dataset for x-ray data of scroll fragments: https://www.kaggle.com/competitions/vesuvius-challenge-ink-detection/data

I know those guys!

750563720 commented 1 year ago

I am a student also looking forward to the release of the fine-tuning code, to complete my academic paper; I would be very grateful if it were released.

ZhuJD-China commented 1 year ago

+1, thanks; looking forward to fine-tuning.

ulsha commented 1 year ago

@codybum @750563720 @ZhuJD-China I tried the tutorial @alex-encord posted; it seems to work well. It would be easier with a Colab notebook, though. Don't know if that's in the works?

Edit: Looks like Colab was added - thank you!

JunMa11 commented 1 year ago

Colab: https://colab.research.google.com/drive/1F6uRommb3GswcRlPZWpkAQRMVNdVH7Ww?usp=sharing

cnsystem commented 1 year ago

It would be interesting to see SAM fine-tuned on an autonomous driving dataset.

RCfun commented 1 year ago

I'm trying to fine-tune on a semantic segmentation task without bounding boxes, using only labels. I resized the image and label to 1024x1024 and use the following code:

    # Assign data
    img, label, o_img_size, n_img_size = batch['image'], batch['label'], batch['original_image_size'], batch['image_size']

    # Map to variables (torch.autograd.Variable is a no-op in PyTorch >= 0.4)
    img = Variable(img)
    label = Variable(label)

    # Get embeddings (image encoder and prompt encoder kept frozen)
    with torch.no_grad():
        image_embedding = self.model.image_encoder(img)
        sparse_embeddings, dense_embeddings = self.model.prompt_encoder(
            points=None, boxes=None, masks=label.float())

    # Get predictions
    low_res_masks, iou_predictions = self.model.mask_decoder(
        image_embeddings=image_embedding,
        image_pe=self.model.prompt_encoder.get_dense_pe(),
        sparse_prompt_embeddings=sparse_embeddings,
        dense_prompt_embeddings=dense_embeddings,
        multimask_output=True,
    )
    upscaled_masks = self.model.postprocess_masks(low_res_masks, n_img_size, o_img_size)

I modify the decoder head like this:

    net = sam_model_registry[args.sam_model_type](checkpoint=args.ckpt)
    d = net.mask_decoder
    net.mask_decoder = MaskDecoder(transformer_dim=d.transformer_dim, transformer=d.transformer, num_multimask_outputs=args.num_classes)

    return net

However, I get an error from the decoder embedding fusion:

      File "segment_anything/modeling/mask_decoder.py", line 127, in predict_masks
        src = src + dense_prompt_embeddings
    RuntimeError: The size of tensor a (64) must match the size of tensor b (256) at non-singleton dimension 3

In this case, my sparse_embeddings has shape [B, 0, 256] and dense_embeddings has shape [B, 256, 256, 256]. Does anyone have ideas?
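
(A likely cause, judging from the prompt encoder's mask_downscaling, which reduces mask prompts 4x to match the 64x64 image embedding: mask prompts are expected at 256x256, not 1024x1024, so a 1024x1024 label yields 256x256 dense embeddings and the 64-vs-256 mismatch above. A sketch of the fix, assuming label has shape (B, 1, H, W):)

    import torch.nn.functional as F

    # Resize the label to the 256x256 the prompt encoder expects; its
    # mask_downscaling then produces (B, 256, 64, 64) dense embeddings.
    mask_input = F.interpolate(
        label.float(), size=(256, 256), mode="bilinear", align_corners=False)
    sparse_embeddings, dense_embeddings = self.model.prompt_encoder(
        points=None, boxes=None, masks=mask_input)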

rmokady commented 1 year ago

> I'm trying to do a fine-tuning on the semantic segmentation task without bbox, but only labels. […]

I opened a new issue #277

WusterHappy commented 1 year ago

> I'm trying to do a fine-tuning on the semantic segmentation task without bbox, but only labels. […]

I fine-tune SAM the same way, using the label as the mask prompt input and then getting the predicted output. I compute loss(ground_truth, output), but the loss does not decline; it stays constant. Can you give me some advice? I would appreciate it!

luca-medeiros commented 1 year ago

Made a repo for fine-tuning SAM: a PoC to see whether fine-tuning SAM with bounding boxes as prompts would increase the IoU or improve the quality of the masks in general. One can use a COCO-format dataset to fine-tune SAM for a specific task where SAM does not perform well (e.g., segmenting text on documents) and then use that model with interactive prompts just like SAM. https://github.com/luca-medeiros/lightning-sam
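
(For reference, one common loss choice for this kind of mask fine-tuning is binary cross-entropy plus soft Dice on the predicted logits; a generic sketch, not necessarily what lightning-sam uses.)

    import torch
    import torch.nn.functional as F

    def dice_loss(pred_logits, target, eps=1.0):
        # Soft Dice on sigmoid probabilities; target is a 0/1 float mask.
        pred = torch.sigmoid(pred_logits)
        num = 2 * (pred * target).sum(dim=(-2, -1)) + eps
        den = pred.sum(dim=(-2, -1)) + target.sum(dim=(-2, -1)) + eps
        return 1 - (num / den).mean()

    def seg_loss(pred_logits, target):
        bce = F.binary_cross_entropy_with_logits(pred_logits, target)
        return bce + dice_loss(pred_logits, target)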

mydcxiao commented 1 year ago

following

tommiekerssies commented 1 year ago

following

Tobyzai commented 1 year ago

+1

chinmay5 commented 1 year ago

+1

MaxMatti commented 1 year ago

Guys, you can just hit the subscribe button on the right to get notifications for just this thread without annoying everybody who is already subscribed :angry:

JunMa11 commented 1 year ago

Hi @kdcd @codybum @austinmw @penguingiraffe2 @jindameias @BenSpex @hu-po @openvino-book @dlod-openvino @shakesBeardZ @AMInnovationTeam @maskani-moh @ariannaravera @harry-s-grewal @javiermcebrian @francescodisalvo05 @travishsu @imandrealombardo @Kenneth-X @bhpfelix @alex-encord @750563720 @ZhuJD-China @RCfun @Tobyzai @MaxMatti

We provide a step-by-step tutorial on fine-tuning SAM on 2D and 3D medical image datasets. It requires less than 10 GB of GPU memory. We hope it is useful.

https://github.com/bowang-lab/MedSAM#model-training-video-tutorial

dshlai commented 1 year ago

Also found this one, which uses adapters for efficient fine-tuning: https://github.com/tianrun-chen/SAM-Adapter-PyTorch
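
(Adapters in this spirit are small bottleneck modules inserted into the frozen backbone so that only a few parameters train; a generic sketch of the technique, not the exact code from that repo.)

    import torch.nn as nn

    class Adapter(nn.Module):
        """Bottleneck adapter: down-project, nonlinearity, up-project, residual."""
        def __init__(self, dim, bottleneck=64):
            super().__init__()
            self.down = nn.Linear(dim, bottleneck)
            self.act = nn.GELU()
            self.up = nn.Linear(bottleneck, dim)
            nn.init.zeros_(self.up.weight)  # start as identity for stable training
            nn.init.zeros_(self.up.bias)

        def forward(self, x):
            return x + self.up(self.act(self.down(x)))

    # Typical use: insert an Adapter after the attention/MLP sublayers of each
    # frozen ViT block and optimize only the adapter parameters.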

dorbodwolf commented 1 year ago

thank you all for sharing

EnchiridionHero commented 1 year ago

@aadilmehdis The model worked pretty well once I was able to figure out the UI. It was not very intuitive to work through, though (it took a while to train the model, and I encountered some bugs). Is the fine-tuning code open source? I would be interested to know what losses/hyperparameters you used.

A few issues I faced:

mikiotada commented 1 year ago

Following up on this, I found the methods section of @JunMa11's manuscript https://arxiv.org/pdf/2304.12306.pdf useful for getting an idea of how to fine-tune SAM.

> We provide a step-by-step tutorial on fine-tuning SAM on 2D and 3D medical image datasets. […]

andyoung009 commented 1 year ago

+1, interested in fine-tuning it for some domain-specific datasets.

Jeff1933 commented 1 year ago

@EnchiridionHero Hi, I want to know how to upload images to https://app.instalabel.ai. Should I upload both the original and annotated images at the same time? I have read the operating instructions, but I still don't understand.