LLsmile opened 1 year ago
I want to segment everything without any prompt. How can I combine the masks in output1 with the bounding boxes in output0?
This article can help you: https://dev.to/andreygermanov/how-to-implement-instance-segmentation-using-yolov8-neural-network-3if9 . For segment anything you need to change `boxes = output0[:,0:84]` to `boxes = output0[:,0:5]` and `masks = output0[:,84:]` to `masks = output0[:,5:]`, and you need to replace every occurrence of 160 with 256 (the size of the mask prototypes in FastSAM). To explain: we change 84 to 5 because the article uses a YOLO COCO model, which has 80 class scores plus 4 box parameters per detection, while FastSAM has only one class plus the same 4 box parameters. The additional 32 parameters in both cases carry the mask information, so a YOLOv8-seg detection has 116 parameters and a FastSAM detection has 37; to post-process, we just need to slice out the correct ranges, as in the sketch below.
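A minimal sketch of that slicing, with assumptions: the export takes a 1x3x1024x1024 float input (21504 = 128² + 64² + 32², the anchor counts at strides 8/16/32 for a 1024x1024 image), the input name is "images", and the model path is illustrative — check your own export in Netron:

```python
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("FastSAM.onnx")  # path is illustrative
# Placeholder input; substitute your preprocessed, normalized image here.
input_tensor = np.zeros((1, 3, 1024, 1024), dtype=np.float32)
output0, output1 = session.run(None, {"images": input_tensor})

# (1, 37, 21504) -> (21504, 37): one row per candidate detection
output0 = output0[0].transpose()

# With a single class, each row is [cx, cy, w, h, confidence, 32 mask coefficients]
boxes = output0[:, 0:5]
masks = output0[:, 5:]
```

From here the flow is the same as the article: filter `boxes` by confidence, run NMS, then turn each surviving row of `masks` into a pixel mask.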
I converted FastSAM to ONNX, and the brief description in Netron looks like this:
What does each dimension of the outputs stand for? In output1, 256 × 256 should be the size of the mask, so does 32 stand for categories or something else? 21504 in output0 should be the number of anchors, so what does 37 stand for? 32 categories + 4 xywh + 1 detection confidence?
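As described above, 37 should be 4 box coordinates + 1 confidence (one class) + 32 mask coefficients, and the 32 in output1 are prototype masks rather than categories. A hedged sketch of combining them, continuing from the snippet above (picking the highest-confidence box and the 0.5 threshold are illustrative choices, not from the exported model):

```python
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

best = int(np.argmax(boxes[:, 4]))           # most confident detection, for illustration
protos = output1[0].reshape(32, 256 * 256)   # 32 prototype masks, flattened

# A box's mask is a linear combination of the prototypes, weighted by its
# 32 coefficients, passed through a sigmoid and thresholded.
mask = sigmoid(masks[best] @ protos).reshape(256, 256)
binary_mask = (mask > 0.5).astype(np.uint8)
# To finish, crop binary_mask to the box (scaled to 256x256) and resize the
# crop back to the original image size, as in the linked article.
```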