NVIDIA-AI-IOT / nanosam

A distilled Segment Anything (SAM) model capable of running real-time with NVIDIA TensorRT
Apache License 2.0

question about mask dimension #14

Open spacewalk01 opened 8 months ago

spacewalk01 commented 8 months ago

Thank you for your work! The output mask has shape 4x256x256, where 4 is (I guess) the number of labels and 256x256 is the mask's height x width. How can I get a single 1x256x256 mask from the output?

spacewalk01 commented 8 months ago

https://github.com/NVIDIA-AI-IOT/nanosam/blob/653633614b2eb93b06ba3be9adb2aeffb117bd72/nanosam/utils/predictor.py#L134C7-L134C7

spacewalk01 commented 8 months ago

You are performing (mask[0, 0] > 0).detach().cpu().numpy() on a 1 x 4 x 256 x 256 tensor, which means that only the first 256 x 256 slice is returned and the other three are ignored. Is that first slice the background mask?
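
For reference, here is a minimal sketch of what that indexing does, using a random NumPy array as a stand-in for the decoder output (the linked code operates on a torch tensor). In the original SAM design the second axis holds candidate masks for the same prompt, each with a predicted IoU score, rather than class labels. Whether nanosam keeps exactly that layout is an assumption here, and the iou_preds values below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the decoder output: (batch, candidates, height, width) logits.
mask = rng.standard_normal((1, 4, 256, 256)).astype(np.float32)

# Hypothetical per-candidate quality scores, shape (batch, candidates).
iou_preds = np.array([[0.71, 0.93, 0.88, 0.64]], dtype=np.float32)

# mask[0, 0] selects batch 0, candidate 0. Thresholding the logits at 0
# turns them into a boolean mask of shape (256, 256).
first = mask[0, 0] > 0

# Slicing instead of integer indexing keeps the channel axis,
# giving shape (1, 256, 256).
first_keepdim = mask[0, 0:1] > 0

# Alternative (assumption, not confirmed by the thread): pick the
# candidate with the highest predicted score instead of always index 0.
best_idx = int(np.argmax(iou_preds[0]))
best = mask[0, best_idx:best_idx + 1] > 0
```

Either slice yields a single 1x256x256 mask; which candidate is the right one to keep depends on how the decoder was exported, so it is worth comparing all four against the expected result.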

spacewalk01 commented 8 months ago

I am reimplementing it, but my implementation produces incorrect masks when the prompt is a bounding box.
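
One thing worth checking in a reimplementation is how the box is encoded. SAM-style prompt encoders take a box as two corner points with labels 2 (top-left) and 3 (bottom-right), expressed in the coordinate frame of the resized image that the encoder sees. The sketch below assumes that convention applies here and assumes the longest image side is resized to 1024 pixels; box_to_prompt is a hypothetical helper name, not part of the library.

```python
import numpy as np


def box_to_prompt(box, image_hw, target_size=1024):
    """Hypothetical helper: convert (x0, y0, x1, y1) to SAM-style point prompts."""
    x0, y0, x1, y1 = box
    height, width = image_hw

    # Assumption: the encoder input is the image resized so that its
    # longest side equals target_size, so coordinates scale by this factor.
    scale = target_size / max(height, width)

    # Two corner points: top-left first, bottom-right second.
    points = np.array([[x0, y0], [x1, y1]], dtype=np.float32) * scale

    # Labels 2 and 3 mark the points as box corners rather than
    # foreground (1) or background (0) clicks.
    labels = np.array([2, 3], dtype=np.float32)
    return points, labels


points, labels = box_to_prompt((100, 50, 400, 300), image_hw=(480, 640))
```

If the corners are passed with foreground or background labels, or in original-image coordinates, the decoder can still return plausible-looking but wrong masks, which would match the symptom described.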