kdcd opened this issue 1 year ago
Information on fine tuning would be great.
+1, I'd love to be able to fine-tune to improve performance on extremely difficult tiny-object tasks, for example segmenting vehicles in geospatial imagery.
This thread is referenced as the answer to similar questions, but I don't think there is actually an answer here for transfer learning?
Look forward to finetuning
I would love to be able to fine tune the model for specific datasets as well.
Do we wait for Meta to provide a training/fine-tuning script? Or should the open source hivemind write it?
Has anyone tried what might be called "point prompt engineering"? I.e., training a separate model that learns where to place positive and negative prompt points, such that these points prompt SAM to select target objects from a custom dataset.
Or we can just summarize strategies and best practices in terms of placing positive and negative prompt points/prompt boxes, similar to how GPT/DALLE users summarize the best ways to write prompts.
Wonder if this could be one way to fine-tune the SAM model when only a limited amount of annotations are available. Happy to discuss more if anyone wants to work together and try it out.
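For anyone wanting to experiment with that, here's a minimal sketch of how positive/negative point prompts go into SAM at inference (the image and coordinates below are just placeholders); a learned "prompter" would only need to output `point_coords`/`point_labels`:

```python
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
predictor = SamPredictor(sam)

image = np.zeros((600, 800, 3), dtype=np.uint8)  # stand-in for a real RGB image
predictor.set_image(image)

# label 1 = positive point (include this region), label 0 = negative point (exclude it)
point_coords = np.array([[450, 300], [500, 320]])
point_labels = np.array([1, 0])

masks, scores, logits = predictor.predict(
    point_coords=point_coords,
    point_labels=point_labels,
    multimask_output=True,
)
```

Everything else stays frozen; the prompter model just has to learn which (x, y) pairs and labels to emit for the target class.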
+1, Looking forward to fine-tuning the SAM model on the custom dataset.:)
After some messing around I have gotten preliminary fine-tuning to work on my fork. The code is still super messy and early, but perhaps people can find use in it. The biggest thing I figured out is that you have to break up the `Sam` model into its components in order for there to be a gradient path for fine-tuning.
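To illustrate the idea (a rough sketch of the general pattern, not the exact code in the fork): run the three sub-modules yourself instead of going through `SamPredictor`, keep the image encoder under `no_grad`, and train the mask decoder. The tensors and MSE loss below are placeholders:

```python
import torch
from segment_anything import sam_model_registry

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
optimizer = torch.optim.Adam(sam.mask_decoder.parameters(), lr=1e-5)

# Dummy stand-ins: a preprocessed 1024x1024 batch, one box prompt, one GT mask
image = torch.randn(1, 3, 1024, 1024)
box_torch = torch.tensor([[100.0, 150.0, 400.0, 420.0]])
gt_mask = torch.zeros(1, 1, 1024, 1024)

with torch.no_grad():  # keep the heavy image encoder frozen
    image_embedding = sam.image_encoder(image)
    sparse_emb, dense_emb = sam.prompt_encoder(points=None, boxes=box_torch, masks=None)

low_res_masks, iou_pred = sam.mask_decoder(  # gradients flow from here on
    image_embeddings=image_embedding,
    image_pe=sam.prompt_encoder.get_dense_pe(),
    sparse_prompt_embeddings=sparse_emb,
    dense_prompt_embeddings=dense_emb,
    multimask_output=False,
)
upscaled = sam.postprocess_masks(low_res_masks, (1024, 1024), (1024, 1024))
loss = torch.nn.functional.mse_loss(torch.sigmoid(upscaled), gt_mask)  # placeholder loss
loss.backward()
optimizer.step()
```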
Could you please recommend the minimum hardware configuration for fine-tuning SAM? e.g. 4090 x 4?
I can get the smallest pre-trained model (`vit_b`) with a batch size of 1 in <5GB of GPU memory, but I think fine-tuning with those settings would take forever.
I have access to a 4 x A100 /w 80G if you want me to test something.
hi @hu-po,
Thanks very much for sharing the fine-tuning code. Would it be possible for you to give some guidance on how to prepare a customized dataset (e.g., data format and folder structure)?
Thank me when I get it to work 😭 this is more complicated than anticipated.
+1, interested in fine-tuning it for coral reef images.
+1 interested in fine-tuning it for crack detection on roads.
+1 🙌
+1 interested in fine-tuning!
+1, I'd like to do some vehicle detection on low quality images!
+1 interested in fine-tuning the prompt encoder or mask decoder!
+1! I would be interested in fine-tuning the model for medical image analysis
I'm curious whether it's possible to point out an unknown object that hasn't been learned (like anomaly detection) via a text prompt if I fine-tune with custom data.
+1!
CC: @ericmintun @nikhilaravi
@hu-po hi, nice work sharing the fine-tuning script. Is `FragmentDataset` the dataset released officially at https://segment-anything.com/dataset/index.html?
No, it's a custom dataset for x-ray data of scroll fragments: https://www.kaggle.com/competitions/vesuvius-challenge-ink-detection/data
I have fine-tuning starter code for COCO instance-segmentation-format data, with some basic functionality, at this repo. Hope it helps!
Hey, we wrote a blog post outlining some of the key steps to fine-tune SAM using the mask decoder, in particular describing which functions from SAM to use to pre/post-process the data so that it's in good shape for fine-tuning.
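For reference, that preprocessing largely comes down to SAM's own transform utilities. A minimal sketch (the image and box values are stand-ins):

```python
import numpy as np
import torch
from segment_anything import sam_model_registry
from segment_anything.utils.transforms import ResizeLongestSide

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
transform = ResizeLongestSide(sam.image_encoder.img_size)  # 1024 for the released models

image = np.zeros((480, 640, 3), dtype=np.uint8)  # stand-in for a real RGB image
input_image = transform.apply_image(image)       # resize so the long side is 1024
input_tensor = torch.as_tensor(input_image).permute(2, 0, 1)[None].float()
input_tensor = sam.preprocess(input_tensor)      # normalize pixels and pad to 1024x1024

# Prompts must be mapped into the resized coordinate frame as well
boxes = transform.apply_boxes(np.array([[50, 50, 200, 200]]), image.shape[:2])
```

On the way back out, `sam.postprocess_masks` takes the decoder's low-res logits to the original image size.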
> No, it's a custom dataset for x-ray data of scroll fragments: https://www.kaggle.com/competitions/vesuvius-challenge-ink-detection/data

I know those guys!
I am a student also looking forward to the release of the fine-tuning code so I can complete my academic paper; I would be very grateful if it were released.
+1, thanks for the fine-tuning work
@codybum @750563720 @ZhuJD-China I tried this tutorial @alex-encord posted, seems to be working well. Would be easier with a Colab Notebook though. Dunno if that's in the works?
Edit: Looks like Colab was added - thank you!
It would be interesting to see SAM fine-tuned on an autonomous driving dataset.
I'm trying to fine-tune on a semantic segmentation task without bounding boxes, only labels. I resized the images and labels to 1024x1024 and use the following code:
```python
# Assign data
img, label, o_img_size, n_img_size = batch['image'], batch['label'], batch['original_image_size'], batch['image_size']
# Map to variables
img = Variable(img)
label = Variable(label)
# Get embeddings
with torch.no_grad():
    image_embedding = self.model.image_encoder(img)
    sparse_embeddings, dense_embeddings = self.model.prompt_encoder(
        points=None, boxes=None, masks=label.float())
# Get predictions
low_res_masks, iou_predictions = self.model.mask_decoder(
    image_embeddings=image_embedding,
    image_pe=self.model.prompt_encoder.get_dense_pe(),
    sparse_prompt_embeddings=sparse_embeddings,
    dense_prompt_embeddings=dense_embeddings,
    multimask_output=True,
)
upscaled_masks = self.model.postprocess_masks(low_res_masks, n_img_size, o_img_size)
```
I modify the decoder head like this:

```python
net = sam_model_registry[args.sam_model_type](checkpoint=args.ckpt)
d = net.mask_decoder
net.mask_decoder = MaskDecoder(transformer_dim=d.transformer_dim, transformer=d.transformer, num_multimask_outputs=args.num_classes)
return net
```
However, I get an error from the decoder embedding fusion:

```
  File "segment_anything/modeling/mask_decoder.py", line 127, in predict_masks
    src = src + dense_prompt_embeddings
RuntimeError: The size of tensor a (64) must match the size of tensor b (256) at non-singleton dimension 3
```
In this case, my `sparse_embeddings` has shape [B, 0, 256] and `dense_embeddings` has shape [B, 256, 256, 256]. Does anyone have ideas?
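In case it helps anyone hitting the same error: the prompt encoder's `mask_downscaling` shrinks mask prompts by 4x, and the decoder adds the result to the 64x64 image embedding, so mask prompts have to come in at 256x256 (4x the embedding size), not as the full 1024x1024 label. A sketch of the fix, reusing the names from the snippet above and assuming `label` is [B, 1, 1024, 1024] as the reported shapes suggest:

```python
import torch.nn.functional as F

# The prompt encoder's mask_downscaling halves the spatial size twice (4x total),
# and the decoder adds the result to the 64x64 image embedding. So the mask
# prompt has to come in at 256x256, not 1024x1024.
low_res_label = F.interpolate(label.float(), size=(256, 256), mode="nearest")
sparse_embeddings, dense_embeddings = self.model.prompt_encoder(
    points=None, boxes=None, masks=low_res_label,
)
```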
I opened a new issue #277
I fine-tune SAM the same way, passing the label as the mask prompt input and then getting the predicted output. I compute loss(ground_truth, output), but the loss doesn't decline; it stays constant. Can you give me some advice? I would appreciate it!
Made a repo for fine-tuning SAM. A PoC to see if fine-tuning SAM using bounding boxes as prompts would increase the IoU or improve the quality of the masks in general. One can use a COCO format dataset to fine-tune SAM for a specific task where SAM does not perform well (e.g., segmenting text on documents) and then use that model with interactive prompts just like SAM. https://github.com/luca-medeiros/lightning-sam
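As a usage note, a fine-tuned checkpoint saved via `state_dict` can be dropped straight back into the interactive predictor (a sketch; the checkpoint filename is hypothetical):

```python
import torch
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_b"](checkpoint=None)    # build without the original weights
sam.load_state_dict(torch.load("sam_finetuned.pth"))  # hypothetical fine-tuned checkpoint
predictor = SamPredictor(sam)                         # interactive prompting as usual
```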
following
following
+1
+1
Guys you can just hit the subscribe button on the right to get notifications for just this thread without annoying everybody who already subscribed :angry:
Hi @kdcd @codybum @austinmw @penguingiraffe2 @jindameias @BenSpex @hu-po @openvino-book @dlod-openvino @shakesBeardZ @AMInnovationTeam @maskani-moh @ariannaravera @harry-s-grewal @javiermcebrian @francescodisalvo05 @travishsu @imandrealombardo @Kenneth-X @bhpfelix @alex-encord @750563720 @ZhuJD-China @RCfun @Tobyzai @MaxMatti
We provide a step-by-step tutorial on fine-tuning SAM on 2D and 3D medical image datasets. It requires less than 10G GPU memory. Hope that it could be useful.
https://github.com/bowang-lab/MedSAM#model-training-video-tutorial
Also found this one, which uses adapters for efficient fine-tuning: https://github.com/tianrun-chen/SAM-Adapter-PyTorch. Thank you all for sharing!
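For context, the adapter approach inserts small residual bottleneck modules into the frozen backbone and trains only those. A generic sketch of such an adapter (not necessarily the exact module that repo uses):

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: down-project, nonlinearity, up-project, residual add."""
    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, dim)
        nn.init.zeros_(self.up.weight)  # start as a near-identity so early training is stable
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))

# Typical usage: freeze the ViT backbone, insert adapters into each block,
# and train only the adapters (plus, usually, the mask decoder).
tokens = torch.randn(1, 64 * 64, 768)  # hypothetical ViT-B token sequence
out = Adapter(768)(tokens)             # same shape as the input
```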
@aadilmehdis The model worked pretty well once I was able to figure out the UI. It was not very intuitive to work through, though (it took a while to train the model, and I encountered some bugs). Is the fine-tuning code open source? I would be interested to know what losses/hyperparameters you used.
A few issues I faced:
Following up on this, I found the methods section of @JunMa11's manuscript https://arxiv.org/pdf/2304.12306.pdf useful for getting an idea of how to fine-tune SAM.
+1, interested in fine-tuning it for some domain-specific datasets.
@EnchiridionHero Hi, I want to know how to upload images on https://app.instalabel.ai. Should I upload both the original and annotated images at the same time? I have read the operating instructions, but I still don't understand.
Are there any plans to release scripts for fine-tuning the model?
Also, you did such great work! Thank you very much!