Is it possible to run ONNX decoder with multiple boxes?

facebookresearch / segment-anything

The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.

Apache License 2.0


    point = index * 2;
    point_coords[0, point, 0] = rect.Left;
    point_coords[0, point, 1] = rect.Top;
    point_coords[0, point + 1, 0] = rect.Right;
    point_coords[0, point + 1, 1] = rect.Bottom;
    point_labels[0, point] = 2;
    point_labels[0, point + 1] = 3;
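
For readers working in Python, the same packing can be sketched with NumPy. This is only an illustration of the layout used in the snippet above: the function name `pack_boxes_single_batch` and the sample boxes are hypothetical, and boxes are assumed to be given as (left, top, right, bottom).

```python
import numpy as np

def pack_boxes_single_batch(boxes):
    """Pack N boxes (left, top, right, bottom) into one prompt of 2N points."""
    boxes = np.asarray(boxes, dtype=np.float32).reshape(-1, 4)
    n = boxes.shape[0]
    # Each box contributes two points: (left, top) followed by (right, bottom).
    point_coords = boxes.reshape(1, 2 * n, 2)
    # Label 2 marks a top-left corner, label 3 a bottom-right corner.
    corner_labels = np.array([2.0, 3.0], dtype=np.float32)
    point_labels = np.tile(corner_labels, n).reshape(1, 2 * n)
    return point_coords, point_labels

# Hypothetical sample boxes, for illustration only.
coords, labels = pack_boxes_single_batch([(10, 20, 110, 220), (50, 60, 150, 160)])
print(coords.shape, labels.shape)
```

The sketch only builds the two prompt arrays; any coordinate rescaling and the remaining decoder inputs are left out.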

Yes, you can pass multiple input bboxes to the ONNX mask decoder. There are two ways to do it:

1. Pass all the bboxes in a single batch entry by giving the point tensor the shape [1, 2N, 2], where N is the number of bboxes (each bbox contributes two corner points). The label tensor then has the shape [1, 2N]; for a single bbox it is [2., 3.], where 2 is the label for the top-left corner of the bbox and 3 is the label for the bottom-right corner. Inference runs with this approach, but the results are unexpected, so I don't think it is the right way.
2. Send the bboxes as a batch. For this, point_coords and point_labels have to be batchable, which you get by using the following dictionary when exporting the model: dynamic_axes = { "point_coords": {0: "batch_size", 1: "num_points"}, "point_labels": {0: "batch_size", 1: "num_points"} }. Then you can pass each bbox and its labels as a separate item in the batch. This works like a charm.
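
A minimal sketch of the batched layout, assuming NumPy and boxes given as (left, top, right, bottom). The helper name `pack_boxes_as_batch` and the sample boxes are hypothetical; the `dynamic_axes` dictionary is the one quoted in the answer, which would be passed as the `dynamic_axes` argument of `torch.onnx.export` when the decoder is exported.

```python
import numpy as np

# Dictionary from the answer: marks the batch and point axes as dynamic
# for both prompt inputs at export time.
dynamic_axes = {
    "point_coords": {0: "batch_size", 1: "num_points"},
    "point_labels": {0: "batch_size", 1: "num_points"},
}

def pack_boxes_as_batch(boxes):
    """Give every box its own batch item holding its two corner points."""
    boxes = np.asarray(boxes, dtype=np.float32).reshape(-1, 4)
    n = boxes.shape[0]
    # Shape [N, 2, 2]: item i holds (left, top) and (right, bottom) of box i.
    point_coords = boxes.reshape(n, 2, 2)
    # Shape [N, 2]: label 2 for the top-left corner, 3 for the bottom-right.
    point_labels = np.tile(np.array([[2.0, 3.0]], dtype=np.float32), (n, 1))
    return point_coords, point_labels

# Hypothetical sample boxes, for illustration only.
batch_coords, batch_labels = pack_boxes_as_batch(
    [(10, 20, 110, 220), (50, 60, 150, 160)]
)
print(batch_coords.shape, batch_labels.shape)
```

Compared with the single-entry layout of shape [1, 2N, 2], this keeps the corners of different boxes in separate batch items. How the decoder's other inputs are fed alongside these arrays is not covered by this sketch.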


Is it possible to run ONNX decoder with multiple boxes? #308