facebookresearch / segment-anything-2

The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
Apache License 2.0

occlusion score #239

Open TzahiShimkinn opened 4 weeks ago

TzahiShimkinn commented 4 weeks ago

The paper shows an occlusion score output for each object. However, I can't find it in the output of either the image or the video predictors. Am I missing something?

heyoeyo commented 4 weeks ago

I believe what the paper calls the 'occlusion score' is called the 'object score' in the code. It's calculated inside the mask decoder. In the video predictor it gets computed inside the forward_sam_heads function, but it isn't stored as part of the video outputs (it is used to modify other outputs, though).
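Since the score is computed but then discarded, one workaround is to wrap the method that computes it and record the value as it's produced. Below is a minimal, self-contained sketch of that pattern, demonstrated on a stub class standing in for the predictor; the method name and the assumption that the object score logits are the last element of the returned tuple are guesses from reading the code and may need adjusting for your version.

```python
# Sketch: capture an intermediate output by wrapping the method that
# computes it. StubPredictor is a stand-in, not the real SAM 2 model.

recorded_scores = []

def record_last_output(method):
    """Wrap `method` so the last element of its returned tuple is logged."""
    def wrapper(*args, **kwargs):
        outputs = method(*args, **kwargs)
        recorded_scores.append(outputs[-1])  # assumed: object score is last
        return outputs
    return wrapper

class StubPredictor:
    """Stand-in for the SAM 2 video predictor."""
    def forward_sam_heads(self):
        # pretend the decoder produced masks, ious, and an object score of +6.5
        return ("masks", "ious", 6.5)

predictor = StubPredictor()
predictor.forward_sam_heads = record_last_output(predictor.forward_sam_heads)
predictor.forward_sam_heads()
print(recorded_scores)  # → [6.5]
```

The same wrapping would be applied to the real predictor instance after loading the checkpoint, with `recorded_scores` then holding one score per processed frame.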

The value seems to be around +6 to +8 when an object is 'well' tracked in a video, and it goes negative when tracking loses the object (so it's more a measure of whether the object is present than of whether it's occluded).
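Given that behavior, the raw logit can be read as a presence score: a sigmoid turns it into a rough probability that the object is in frame, and its sign gives a simple present/absent decision. A small illustrative helper (this is not SAM 2 API, and the threshold of 0 is my assumption, not something from the paper or code):

```python
import math

def object_presence(object_score_logit: float, threshold: float = 0.0):
    """Interpret an object score logit (illustrative helper, not SAM 2 API).

    Returns (probability, is_present): a sigmoid of the logit, and a
    present/absent flag from an assumed threshold of 0.
    """
    probability = 1.0 / (1.0 + math.exp(-object_score_logit))
    return probability, object_score_logit > threshold

print(object_presence(7.0))   # well-tracked frame: probability near 1.0
print(object_presence(-3.0))  # lost/occluded frame: probability near 0.05
```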

Necolizer commented 2 weeks ago

Same question here. Could the authors share more details about the occlusion score and how the occlusion loss is computed with cross-entropy?