facebookresearch / segment-anything

The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
Apache License 2.0

Finetuning #5

Open kdcd opened 1 year ago

kdcd commented 1 year ago

Are there any plans to release scripts for fine-tuning the model?

Also, you did such great work! Thank you very much!

codybum commented 1 year ago

Information on fine tuning would be great.

austinmw commented 1 year ago

+1, I'd love to be able to fine-tune to improve performance on extremely difficult tiny-object tasks, for example segmenting vehicles in geospatial images:

(image attachment: vehicles in geospatial imagery)

penguingiraffe2 commented 1 year ago

This thread is referenced as the answer for similar questions, but I don't think there is an answer here for transfer learning.

jindameias commented 1 year ago

Look forward to finetuning

BenSpex commented 1 year ago

I would love to be able to fine tune the model for specific datasets as well.

hu-po commented 1 year ago

Do we wait for Meta to provide a training/fine-tuning script? Or should the open source hivemind write it?

TimWGY commented 1 year ago

Has anyone tried the idea of what might be called "point prompt engineering"? That is, training a separate model that learns where to put positive and negative prompt points, such that these points prompt SAM to select target objects from a custom dataset.

Or we can just summarize strategies and best practices in terms of placing positive and negative prompt points/prompt boxes, similar to how GPT/DALLE users summarize the best ways to write prompts.

I wonder whether this could be one way to fine-tune the SAM model when only a limited amount of annotation is available. Happy to discuss more if anyone wants to work together and try it out.
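For anyone who wants to try this: the mechanics of placing positive/negative points are already exposed by SamPredictor, so a learned point-placement model would only need to emit the coordinates and labels below. A minimal sketch (the checkpoint and image paths are placeholders):

    import numpy as np
    import cv2
    from segment_anything import sam_model_registry, SamPredictor

    # Placeholder paths; substitute whatever checkpoint/image you have locally.
    sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
    predictor = SamPredictor(sam)

    image = cv2.cvtColor(cv2.imread("example.jpg"), cv2.COLOR_BGR2RGB)
    predictor.set_image(image)

    # A point-placement policy (learned or heuristic) would produce these:
    point_coords = np.array([[320, 240], [50, 60]])  # (x, y) in pixels
    point_labels = np.array([1, 0])                  # 1 = positive, 0 = negative

    masks, scores, _ = predictor.predict(
        point_coords=point_coords,
        point_labels=point_labels,
        multimask_output=True,  # three candidate masks; pick the best by score
    )
    best_mask = masks[np.argmax(scores)]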

openvino-book commented 1 year ago

+1, looking forward to fine-tuning the SAM model on a custom dataset. :)

hu-po commented 1 year ago

I am attempting some fine-tuning in this repo. Perhaps people can find use in it. The biggest thing I figured out is that you have to break the SAM model up into its components in order for there to be a gradient path for fine-tuning.
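Concretely, "breaking it up" means calling the three submodules yourself instead of going through SamPredictor, and running the frozen parts under no_grad. A rough sketch of the idea (checkpoint path and dummy tensors are stand-ins; shapes are for the released vit_b):

    import torch
    from segment_anything import sam_model_registry

    sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")

    image = torch.randn(1, 3, 1024, 1024)             # stand-in preprocessed image
    boxes = torch.tensor([[100., 100., 400., 400.]])  # one xyxy box prompt

    # Frozen parts run under no_grad; the mask decoder stays on the gradient path.
    with torch.no_grad():
        image_embeddings = sam.image_encoder(image)   # 1x256x64x64
        sparse_emb, dense_emb = sam.prompt_encoder(points=None, boxes=boxes, masks=None)

    low_res_masks, iou_pred = sam.mask_decoder(
        image_embeddings=image_embeddings,
        image_pe=sam.prompt_encoder.get_dense_pe(),
        sparse_prompt_embeddings=sparse_emb,
        dense_prompt_embeddings=dense_emb,
        multimask_output=False,
    )                                                 # low_res_masks: 1x1x256x256

    gt = torch.zeros_like(low_res_masks)              # stand-in ground-truth mask
    loss = torch.nn.functional.binary_cross_entropy_with_logits(low_res_masks, gt)
    loss.backward()                                   # gradients reach only the decoder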

dlod-openvino commented 1 year ago

> After some messing around I have gotten preliminary fine-tuning to work on my fork. The code is still super messy and early, but perhaps people can find use in it. The biggest thing I figured out is that you have to break the SAM model up into its components in order for there to be a gradient path for fine-tuning.

Could you please recommend a minimum hardware configuration for fine-tuning SAM? e.g. 4090 x 4?

hu-po commented 1 year ago

> Could you please recommend a minimum hardware configuration for fine-tuning SAM? e.g. 4090 x 4?

I can run the smallest pre-trained model (vit_b) with a batch size of 1 in <5GB of GPU memory, but I think fine-tuning with those settings would take forever.
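If memory is the constraint, one option (my own approach, nothing official) is to freeze everything except the mask decoder, since the ViT image encoder dominates both memory and compute:

    import torch
    from segment_anything import sam_model_registry

    sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")

    # Freeze the ViT image encoder (the bulk of the parameters) and the
    # prompt encoder; optimize only the lightweight mask decoder.
    for p in sam.image_encoder.parameters():
        p.requires_grad_(False)
    for p in sam.prompt_encoder.parameters():
        p.requires_grad_(False)

    optimizer = torch.optim.Adam(sam.mask_decoder.parameters(), lr=1e-4)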

codybum commented 1 year ago

I have access to 4 x A100s w/ 80GB if you want me to test something.

JunMa11 commented 1 year ago

hi @hu-po ,

Thank you very much for sharing the fine-tuning code. Would it be possible for you to give guidance on how to prepare a customized dataset (e.g., data format and folder structure)?

hu-po commented 1 year ago

> hi @hu-po ,
>
> Thank you very much for sharing the fine-tuning code. Would it be possible for you to give guidance on how to prepare a customized dataset (e.g., data format and folder structure)?

Thank me when I get it to work 😭 this is more complicated than anticipated.

shakesBeardZ commented 1 year ago

+1, interested in fine-tuning it for coral reef images.

AMInnovationTeam commented 1 year ago

+1, interested in fine-tuning it for crack detection on roads.

maskani-moh commented 1 year ago

+1 🙌

ariannaravera commented 1 year ago

+1 interested in fine-tuning!

harry-s-grewal commented 1 year ago

+1, I'd like to do some vehicle detection on low quality images!

javiermcebrian commented 1 year ago

+1, interested in fine-tuning the prompt encoder or mask decoder!

francescodisalvo05 commented 1 year ago

+1! I would be interested in fine-tuning the model for medical image analysis

travishsu commented 1 year ago

I'm curious whether it is possible to point out an unknown object that has not been learned (as in anomaly detection) via a text prompt if I fine-tune with custom data.

imandrealombardo commented 1 year ago

+1!

satpalsr commented 1 year ago

CC: @ericmintun @nikhilaravi

Kenneth-X commented 1 year ago

@hu-po Hi, nice work sharing the fine-tuning script. Is "FragmentDataset" the dataset officially released at https://segment-anything.com/dataset/index.html?

hu-po commented 1 year ago

> @hu-po Hi, nice work sharing the fine-tuning script. Is "FragmentDataset" the dataset officially released at https://segment-anything.com/dataset/index.html?

No, it's a custom dataset for x-ray data of scroll fragments: https://www.kaggle.com/competitions/vesuvius-challenge-ink-detection/data

bhpfelix commented 1 year ago

I have fine-tuning starter code for COCO instance segmentation format data, with some basic functionality, at this repo. Hope it helps!

alex-encord commented 1 year ago

Hey, we wrote a blog post outlining some of the key steps to fine-tune SAM via the mask decoder, in particular describing which functions from SAM to use to pre/post-process the data so that it's in good shape for fine-tuning.
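For anyone skipping the post, the pre-processing side boils down to the ResizeLongestSide transform plus sam.preprocess. A sketch (assuming an HxWx3 uint8 RGB numpy image and an xyxy box; the checkpoint path is a placeholder):

    import numpy as np
    import torch
    from segment_anything import sam_model_registry
    from segment_anything.utils.transforms import ResizeLongestSide

    sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
    transform = ResizeLongestSide(sam.image_encoder.img_size)  # 1024 for released models

    def prepare(image: np.ndarray, box_xyxy: np.ndarray):
        """Resize an HxWx3 uint8 RGB image for SAM and scale a box to match."""
        resized = transform.apply_image(image)                  # longest side -> 1024
        tensor = torch.as_tensor(resized).permute(2, 0, 1)[None].float()
        tensor = sam.preprocess(tensor)                         # normalize + pad to 1024x1024
        box = transform.apply_boxes(box_xyxy, image.shape[:2])  # scale coords the same way
        return tensor, torch.as_tensor(box, dtype=torch.float)

On the way out, sam.postprocess_masks undoes the resize and padding so the masks land back in original image coordinates.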

codybum commented 1 year ago

> > @hu-po Hi, nice work sharing the fine-tuning script. Is "FragmentDataset" the dataset officially released at https://segment-anything.com/dataset/index.html?
>
> No, it's a custom dataset for x-ray data of scroll fragments: https://www.kaggle.com/competitions/vesuvius-challenge-ink-detection/data

I know those guys!

750563720 commented 1 year ago

I am a student, and I am also looking forward to the release of the fine-tuning code to complete my academic paper. I would be very grateful if it were released.

ZhuJD-China commented 1 year ago

+1, thanks for the fine-tuning work.

ulsha commented 1 year ago

@codybum @750563720 @ZhuJD-China I tried the tutorial @alex-encord posted; it seems to be working well. It would be easier with a Colab notebook, though. Don't know if that's in the works?

Edit: Looks like Colab was added - thank you!

JunMa11 commented 1 year ago

Colab:

https://colab.research.google.com/drive/1F6uRommb3GswcRlPZWpkAQRMVNdVH7Ww?usp=sharing

cnsystem commented 1 year ago

It would be interesting to see SAM fine-tuned on an autonomous driving dataset.

RCfun commented 1 year ago

I'm trying to fine-tune on the semantic segmentation task without bounding boxes, using only label masks. I resized the images and labels to 1024x1024 and use the following code:

    # Assign data
    img, label, o_img_size, n_img_size = batch['image'], batch['label'], batch['original_image_size'], batch['image_size']

    # Get embeddings (encoders run without gradients)
    with torch.no_grad():
        image_embedding = self.model.image_encoder(img)
        sparse_embeddings, dense_embeddings = self.model.prompt_encoder(
            points=None, boxes=None, masks=label.float())

    # Get predictions
    low_res_masks, iou_predictions = self.model.mask_decoder(
        image_embeddings=image_embedding,
        image_pe=self.model.prompt_encoder.get_dense_pe(),
        sparse_prompt_embeddings=sparse_embeddings,
        dense_prompt_embeddings=dense_embeddings,
        multimask_output=True,
    )
    upscaled_masks = self.model.postprocess_masks(low_res_masks, n_img_size, o_img_size)

I modify the decoder head like this:

    net = sam_model_registry[args.sam_model_type](checkpoint=args.ckpt)
    d = net.mask_decoder
    net.mask_decoder = MaskDecoder(transformer_dim=d.transformer_dim, transformer=d.transformer, num_multimask_outputs=args.num_classes)

    return net

However, I get an error from the decoder embedding fusion:

    File "segment_anything/modeling/mask_decoder.py", line 127, in predict_masks
    src = src + dense_prompt_embeddings
    RuntimeError: The size of tensor a (64) must match the size of tensor b (256) at non-singleton dimension 3

In this case, my sparse_embeddings has shape [B, 0, 256] and dense_embeddings has shape [B, 256, 256, 256]. Does anyone have ideas?
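Edit: the numbers are consistent with the mask prompt resolution being the culprit. The prompt encoder expects mask inputs at 4x the 64x64 image-embedding grid, i.e. 256x256, and downscales them 4x internally; feeding a 1024x1024 label yields [B, 256, 256, 256] dense embeddings that cannot be added to the 64x64 image embeddings. Resizing the mask prompt first should resolve the shape mismatch (a sketch within the snippet above):

    import torch.nn.functional as F

    # The prompt encoder's expected mask input is 4x the image-embedding grid
    # (4 * 64 = 256 for the released models); it then downscales by 4x.
    # Assumes label has shape [B, 1, 1024, 1024].
    mask_input = F.interpolate(label.float(), size=(256, 256), mode="nearest")
    sparse_embeddings, dense_embeddings = self.model.prompt_encoder(
        points=None, boxes=None, masks=mask_input)  # dense: [B, 256, 64, 64]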

rmokady commented 1 year ago

> I'm trying to fine-tune on the semantic segmentation task without bounding boxes, using only label masks. [...] In this case, my sparse_embeddings has shape [B, 0, 256] and dense_embeddings has shape [B, 256, 256, 256]. Does anyone have ideas?

I opened a new issue #277

WusterHappy commented 1 year ago

> I'm trying to fine-tune on the semantic segmentation task without bounding boxes, using only label masks. [...] In this case, my sparse_embeddings has shape [B, 0, 256] and dense_embeddings has shape [B, 256, 256, 256]. Does anyone have ideas?

I fine-tune SAM the same way, using the label as the mask prompt input, then take the predicted output and compute loss(ground_truth, output), but the loss does not decline at all; it remains constant. Can you give me some advice? I would appreciate it!

luca-medeiros commented 1 year ago

Made a repo for fine-tuning SAM. A PoC to see if fine-tuning SAM using bounding boxes as prompts would increase the IoU or improve the quality of the masks in general. One can use a COCO format dataset to fine-tune SAM for a specific task where SAM does not perform well (e.g., segmenting text on documents) and then use that model with interactive prompts just like SAM. https://github.com/luca-medeiros/lightning-sam
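If you want to quantify the improvement, a simple per-mask IoU against ground truth is enough for a before/after comparison (a sketch; SAM thresholds mask logits at 0 by default):

    import torch

    def mask_iou(pred_logits: torch.Tensor, gt: torch.Tensor) -> torch.Tensor:
        """IoU between thresholded predictions and binary ground truth (B x H x W)."""
        pred = pred_logits > 0       # sam.mask_threshold defaults to 0.0
        gt = gt.bool()
        inter = (pred & gt).flatten(1).sum(-1).float()
        union = (pred | gt).flatten(1).sum(-1).float()
        return inter / union.clamp(min=1)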

mydcxiao commented 1 year ago

following

tommiekerssies commented 1 year ago

following

Tobyzai commented 1 year ago

+1

chinmay5 commented 1 year ago

+1

MaxMatti commented 1 year ago

Guys, you can just hit the Subscribe button on the right to get notifications for this thread without annoying everybody who is already subscribed. :angry:

JunMa11 commented 1 year ago

Hi @kdcd @codybum @austinmw @penguingiraffe2 @jindameias @BenSpex @hu-po @openvino-book @dlod-openvino @shakesBeardZ @AMInnovationTeam @maskani-moh @ariannaravera @harry-s-grewal @javiermcebrian @francescodisalvo05 @travishsu @imandrealombardo @Kenneth-X @bhpfelix @alex-encord @750563720 @ZhuJD-China @RCfun @Tobyzai @MaxMatti

We provide a step-by-step tutorial on fine-tuning SAM on 2D and 3D medical image datasets. It requires less than 10GB of GPU memory. Hope it is useful.

https://github.com/bowang-lab/MedSAM#model-training-video-tutorial

dshlai commented 1 year ago

Also found this one, which uses adapters for efficient fine-tuning: https://github.com/tianrun-chen/SAM-Adapter-PyTorch

dorbodwolf commented 1 year ago

thank you all for sharing

EnchiridionHero commented 1 year ago

@aadilmehdis The model worked pretty well once I figured out the UI. It was not very intuitive to work through, though (it took a while to train the model, and I encountered some bugs). Is the fine-tuning code open source? I would be interested to know what losses/hyperparameters you used.

A few issues I faced:

mikiotada commented 1 year ago

Following up on this, I found the method section of @JunMa11's manuscript https://arxiv.org/pdf/2304.12306.pdf useful for getting an idea of how to fine-tune SAM.

> We provide a step-by-step tutorial on fine-tuning SAM on 2D and 3D medical image datasets. It requires less than 10GB of GPU memory. Hope it is useful.
>
> https://github.com/bowang-lab/MedSAM#model-training-video-tutorial

andyoung009 commented 1 year ago

+1, interested in fine-tuning it for some domain-specific datasets.

Jeff1933 commented 1 year ago

@EnchiridionHero Hi, I want to know how to upload images on https://app.instalabel.ai. Should I upload both the original and annotated images at the same time? I have read the operating instructions, but I still don't understand.