Ashutosh1995 opened this issue 3 months ago
Also, in the training code you are using point coordinates; what if I want to use boxes as prompts during custom dataset training?
Hi @Ashutosh1995 ,
Thanks a lot @czg1225 for your reply! Let me have a look at this issue and see if I can replicate it on my dataset.
@czg1225 thanks for your help, I was able to train the model using box prompts. My custom dataset has 1306 training images and 293 validation images. During inference I also supplied box prompts, and my output looks like this:
I wanted to segment only the bottom card, but in my output I am getting the card on top as well.
Can you please suggest what I should do to improve this?
Hi @Ashutosh1995 , I don't know which model you used for the inference phase. If you're using your own trained SlimSAM, I recommend training for more iterations or reducing the pruning ratio. On the other hand, if you're employing our pre-trained SlimSAM, it would be advantageous to use a more precise bounding box (one that completely surrounds the bottom card). The current box prompt may make it hard to achieve the desired outcome, a difficulty that might persist even with the original SAM-H model.
Another possible solution is to use the prompt which combines the point and box. You can find more details about the combination prompt in SAM's notebook: https://github.com/facebookresearch/segment-anything/blob/main/notebooks/predictor_example.ipynb
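A minimal sketch of such a combined box + point prompt, following the API shown in the linked segment-anything predictor notebook. The coordinates below are hypothetical placeholders, and the `predictor` object (a `SamPredictor` with `set_image` already called) is assumed:

```python
import numpy as np

# Hypothetical box tightly around the bottom card (x0, y0, x1, y1).
box = np.array([120, 340, 520, 610])

# A foreground point inside the bottom card (label 1) plus a
# background point on the top card (label 0) to explicitly exclude it.
point_coords = np.array([[320, 480], [320, 260]])
point_labels = np.array([1, 0])

# With a segment-anything SamPredictor already set up, the combined
# prompt goes into a single predict() call:
#
#   masks, scores, _ = predictor.predict(
#       point_coords=point_coords,
#       point_labels=point_labels,
#       box=box,
#       multimask_output=False,
#   )

# SAM expects point_coords as (N, 2), point_labels as (N,), box as (4,).
assert point_coords.shape == (2, 2)
assert point_labels.shape == (2,)
assert box.shape == (4,)
```

The background point is the key addition here: a box alone can still capture the overlapping top card, while a label-0 point tells the model to push that region out of the mask.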
Thanks for your reply @czg1225. With default settings, I trained SlimSAM on my custom dataset with 0.5 as the pruning ratio. I was also thinking of reducing the pruning ratio and training for more iterations. How much would you recommend, given that I have an image containing multiple stacks of cards placed on top of each other?
I was also thinking that a point prompt would make more sense in this case. Let me try it along with the box prompt.
Before you train SlimSAM, I recommend running the original SAM with the same prompt to see if this bad case still exists. If the original SAM performs well under the same prompt, then reducing the pruning ratio will help. Conversely, if the original SAM also underperforms, reducing the pruning ratio may not lead to improvements.
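A sketch of that sanity check, assuming the `segment_anything` package, a downloaded `sam_vit_h_4b8939.pth` checkpoint, and a hypothetical `image` array with the same box used for SlimSAM:

```python
import numpy as np

# Hypothetical box around the bottom card (x0, y0, x1, y1); use the
# exact same box you pass to your trained SlimSAM.
box = np.array([120, 340, 520, 610])

# With segment-anything installed and the ViT-H checkpoint on disk,
# the baseline run looks like:
#
#   from segment_anything import sam_model_registry, SamPredictor
#   sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
#   predictor = SamPredictor(sam)
#   predictor.set_image(image)  # HxWx3 uint8 RGB array
#   masks, scores, _ = predictor.predict(box=box, multimask_output=False)
#
# If this baseline mask already includes the top card, the prompt is
# the problem and a lower pruning ratio will not fix it; refine the
# box or add point prompts instead.
assert box.shape == (4,)
```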
This is the output from the SAM model using only the box prompt using ViT-H model.
With SlimSAM, can we use both box and point prompts to train the model, as you suggested earlier?
I also tried https://github.com/luca-medeiros/lightning-sam, but I think the same issue will persist there as well, right?
Thanks a lot @czg1225 for your valuable feedback! Let me adjust the data accordingly and get back to you. Please keep this thread open!
@czg1225 is it possible to train with both boxes and points and infer the model with just boxes and still attain good masks?
@Ashutosh1995 , the training process of SlimSAM performs knowledge distillation on the image embeddings from the image encoder, so training needs no prompts; only the validation phase needs them. As a result, you won't have problems inferring with either box or point prompts.
That means, @czg1225, that whether I give only the box prompt or the box + point prompt combined, the performance would remain the same?
@Ashutosh1995 In a nutshell, different prompts only affect the inference phase, not the training phase. So if you use a more accurate prompt, you may get a better result at inference. But you do not need to think about the prompt when training SlimSAM.
@czg1225 thanks a lot for answering all my doubts. Let me also try both SlimSAM and Lightning-SAM to see if combining point and box prompts helps in this case!
Hi @czg1225
Thanks for this repo!
Can you please tell how can I do batch prompting by giving multiple boxes as prompts to the model?
Right now, the predict function takes a single numpy array of length 4.
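For what it's worth, the upstream segment-anything predictor supports batched box prompts through `predict_torch`, as shown in its predictor notebook. A sketch with hypothetical box coordinates, assuming a `SamPredictor` named `predictor` with `set_image(image)` already called:

```python
import numpy as np

# Several boxes at once, one row per object (x0, y0, x1, y1).
# These coordinates are placeholders; supply your own.
input_boxes = np.array([
    [ 75, 275, 1725,  850],
    [425, 600,  700,  875],
    [1375, 550, 1650, 800],
], dtype=np.float32)

# With torch available and a SamPredictor set up, the batched call is:
#
#   import torch
#   boxes_t = torch.as_tensor(input_boxes, device=predictor.device)
#   transformed = predictor.transform.apply_boxes_torch(
#       boxes_t, image.shape[:2])
#   masks, _, _ = predictor.predict_torch(
#       point_coords=None,
#       point_labels=None,
#       boxes=transformed,
#       multimask_output=False,
#   )
#   # masks has shape (num_boxes, 1, H, W): one mask per input box.

assert input_boxes.shape == (3, 4)
```

I haven't checked whether SlimSAM's predictor keeps `predict_torch` unchanged, but since it reuses SAM's mask decoder interface, the same pattern should apply.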
Also, can you please clarify whether, for custom dataset training, each image file should have a corresponding JSON, or a single JSON for the whole folder is enough, since I am getting this error. I have set gradsize to 100 and 1000. My training set has 1306 images and my validation set has 293.