CASIA-IVA-Lab / FastSAM

Fast Segment Anything
GNU Affero General Public License v3.0

What does the size of onnx model stand for? #145

Open LLsmile opened 1 year ago

LLsmile commented 1 year ago

I exported FastSAM to ONNX, and the brief description in Netron shows this: (screenshot: Screenshot from 2023-08-22 16-25-24)

What does each dimension of the outputs stand for? In output1, 256 × 256 should be the size of the mask, so does the 32 stand for categories or something else? The 21504 in output0 should be the number of anchors, so what does the 37 stand for? 32 categories + 4 xywh + 1 detection confidence?
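For reference, here is a minimal NumPy sketch of how the two tensors split under the layout hypothesized above (4 box coordinates + 1 confidence + 32 mask coefficients per anchor = 37 channels; this interpretation is an assumption, and the tensors below are random placeholders for the real ONNX Runtime outputs):

```python
import numpy as np

# Placeholder tensors with the shapes shown in Netron for the FastSAM export.
output0 = np.random.rand(1, 37, 21504).astype(np.float32)   # detections
output1 = np.random.rand(1, 32, 256, 256).astype(np.float32)  # mask prototypes

preds = output0[0].T            # (21504, 37): one row per candidate box
boxes_xywh = preds[:, 0:4]      # center-x, center-y, width, height
confidence = preds[:, 4]        # single-class ("object") score
mask_coeffs = preds[:, 5:37]    # 32 coefficients, one per prototype mask

protos = output1[0]             # (32, 256, 256) prototype masks

print(boxes_xywh.shape, confidence.shape, mask_coeffs.shape, protos.shape)
```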

LLsmile commented 1 year ago

I want to segment everything without any prompt. How can I combine the masks in output1 with the bounding boxes in output0?

ntsmoura commented 1 year ago

This article can help you: https://dev.to/andreygermanov/how-to-implement-instance-segmentation-using-yolov8-neural-network-3if9. For segment-anything you need to change boxes = output0[:,0:84] to boxes = output0[:,0:5] and masks = output0[:,84:] to masks = output0[:,5:], and replace every 160 with 256 (the size of the mask in FastSAM). To explain: we change to 5 because the article uses a YOLO COCO model, which has 80 classes plus 4 box parameters per row of the numpy output, while FastSAM has only one class plus the same 4 parameters. In both cases the additional 32 parameters are mask coefficients, so a YOLOv8 row has 116 parameters and a FastSAM row has 37. Post-processing then just needs to slice the correct parameters.