facebookresearch / segment-anything

The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
Apache License 2.0
47.9k stars 5.67k forks

Details about the 11 iterations during training #689

Open Nastu-Ho opened 9 months ago

Nastu-Ho commented 9 months ago

During training, each batch of image embeddings is fed to the mask decoder iteratively 11 times, so the mask decoder outputs mask logits 11 times. Should the loss be computed on the mask logits from every iteration, or only on the mask logits from the last prediction?
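To make the two options concrete, here is a minimal sketch of the loop in plain Python. The 1 + 8 + 2 breakdown (one initial prompt, 8 iterations with a newly sampled point, and two iterations with no new prompt, one inserted at random and one at the end) follows my reading of the training algorithm described in the SAM paper's appendix. Everything else is an assumption for illustration: the function names, the toy decoder and loss, the exact range of the random insertion, and the `mode` switch are hypothetical and not taken from this repository, which provides inference code only. The sketch does not settle which option the authors used; it only shows how the two choices differ.

```python
import random


def iteration_schedule(num_point_iters=8, rng=None):
    """Build a hypothetical 11-step schedule: 1 initial + 8 point + 2 no-new-prompt."""
    rng = rng or random.Random()
    steps = ["point"] * num_point_iters
    # Assumption: one no-new-prompt step lands at a random position among the point steps.
    steps.insert(rng.randint(0, num_point_iters - 1), "no_new_prompt")
    # The other no-new-prompt step is always the final iteration.
    return ["initial"] + steps + ["no_new_prompt"]


def training_step(decode, loss_fn, schedule, mode="all"):
    """Run the decoder once per schedule step and aggregate the per-step losses.

    mode="all":  average the loss over every iteration's mask logits.
    mode="last": use only the loss from the final iteration's mask logits.
    """
    prev_logits = None
    losses = []
    for step in schedule:
        # The previous iteration's logits are passed back in as an extra prompt.
        logits = decode(step, prev_logits)
        losses.append(loss_fn(logits))
        prev_logits = logits
    if mode == "all":
        return sum(losses) / len(losses)
    if mode == "last":
        return losses[-1]
    raise ValueError(f"unknown mode: {mode!r}")


# Toy stand-ins (hypothetical): the "logits" are a single float that improves each step.
def toy_decode(step, prev_logits):
    return 1.0 if prev_logits is None else prev_logits + 1.0


def toy_loss(logits):
    return 1.0 / logits


if __name__ == "__main__":
    schedule = iteration_schedule(rng=random.Random(0))
    print(schedule)
    print("loss over all iterations:", training_step(toy_decode, toy_loss, schedule, "all"))
    print("loss on last iteration:  ", training_step(toy_decode, toy_loss, schedule, "last"))
```

In a real training loop, `decode` would wrap the prompt encoder and mask decoder, and `loss_fn` would compare the logits against the ground-truth mask. With `mode="all"`, gradients flow from every iteration's prediction; with `mode="last"`, only from the final one.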

Vishawjeet-rmsl commented 1 month ago

I'm also wondering what the authors mean by this. Is it for mask generation for SA-1B, or does it happen during the training of SAM itself? That is, if someone is trying to train a SAM model (with a different architecture), do they have to implement this as well?