Deci-AI / super-gradients

Easily train or fine-tune SOTA computer vision models with one open source training library. The home of Yolo-NAS.
https://www.supergradients.com
Apache License 2.0

NMS in CoreML model #1333

Closed ggbifulco closed 1 year ago

ggbifulco commented 1 year ago

💡 Your Question

Hi, I need to include NMS (non-max suppression) in the model when converting it to CoreML. After converting the model, I noticed in Netron that the layer performing NMS is not present. Is this feature implemented?

Thanks a lot!

Versions

No response

BloodAxe commented 1 year ago

As of now, it is not, but it is currently being worked on. Regarding the CoreML part: we will attach an NMS layer from the ONNX opset. I don't know whether CoreML will be able to digest it or not.

ggbifulco commented 1 year ago

Is it possible to have an ONNX model with the NMS layer integrated after the export?

BloodAxe commented 1 year ago

Yes

ggbifulco commented 1 year ago

> Yes

How can I export it to ONNX including NMS?

BloodAxe commented 1 year ago

This feature is in development now and will probably be available in the next release of SuperGradients. Technically, the implementation involves manually attaching an ONNX NMS layer to the model's graph.

BloodAxe commented 1 year ago

Export to ONNX with NMS is planned to be released in 3.2.0

https://github.com/Deci-AI/super-gradients/blob/master/documentation/source/models_export.md

samxu29 commented 3 months ago

Is there any further information on including an NMS layer when exporting to CoreML format?

BloodAxe commented 3 months ago

No updates on this matter. But you can now export the model with NMS as part of the ONNX graph, so as long as CoreML supports NMS from the ONNX opset, this should work end to end. The one step left on the user's side is the ONNX-to-CoreML conversion.

ggbifulco commented 3 months ago

@samxu29 I wrote the code to export the .pth model to .mlmodel and add the NMS layer in CoreML; you can find it below. I hope it will be helpful for future integrations @BloodAxe. For my own needs I wrote a torch wrapper around the model to get the coordinates in cxcywh format instead of xyxy:

import coremltools as ct
from super_gradients.common.object_names import Models
from super_gradients.training import models
import torch
import torchvision

class iOSDetectModel(torch.nn.Module):

    def __init__(self, model, im):
        """Initialize the iOSDetectModel class with a YOLO model and example image."""
        super().__init__()
        b, c, h, w = im.shape  # batch, channel, height, width
        self.model = model
        if w == h:
            self.normalize = 1.0 / w  # scalar
        else:
            self.normalize = torch.tensor([1.0 / w, 1.0 / h, 1.0 / w, 1.0 / h])  # broadcast (slower, smaller)

    def forward(self, x):
        """Normalize predictions of object detection model with input size-dependent factors."""
        pred = self.model(x)

        xyxy, cls = pred[0], pred[1]
        if len(xyxy) == 2:  # predictions may be nested one level deeper
            xyxy, cls = pred[0][0], pred[0][1]

        new_boxes = torchvision.ops.box_convert(xyxy, "xyxy", "cxcywh")
        return cls, new_boxes * self.normalize
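For a quick sanity check of what the wrapper's forward does, here is the same xyxy → cxcywh conversion and normalization in plain Python (a hypothetical helper, not part of the model code above):

```python
# Hypothetical check of the wrapper's post-processing, without torch:
# convert (x1, y1, x2, y2) to (cx, cy, w, h), then scale into [0, 1].

def xyxy_to_cxcywh_normalized(box, img_size):
    """Convert one (x1, y1, x2, y2) box to normalized (cx, cy, w, h)."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    w, h = x2 - x1, y2 - y1
    return tuple(v / img_size for v in (cx, cy, w, h))

print(xyxy_to_cxcywh_normalized((0, 0, 640, 320), 640))  # (0.5, 0.25, 1.0, 0.5)
```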

Load model, convert in mlmodel format and save without NMS:

NUM_CLASSES = 8
IMGSZ = 1280
model = models.get(Models.YOLO_NAS_M, num_classes=NUM_CLASSES, checkpoint_path="PATH_TO_PTH_MODEL.pth")

im = torch.zeros(1, 3, IMGSZ, IMGSZ)
modelIOS = iOSDetectModel(model, im)

ts = torch.jit.trace(modelIOS.eval(), im)  # TorchScript model
ct_model = ct.convert(ts, inputs=[ct.ImageType(shape=im.shape)])
ct_model.save("MODEL_WITHOUT_NMS.mlmodel")
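As a side note on the 8400 / 33600 output sizes used further below: assuming YOLO-NAS predicts one box per grid cell at strides 8, 16 and 32 (an assumption about the head layout, not something stated in this thread), the anchor count follows directly from the input size:

```python
# Assumed YOLO-NAS head layout: one prediction per cell at strides 8/16/32,
# so the number of output rows is the sum of the squared grid sizes.

def num_anchors(imgsz, strides=(8, 16, 32)):
    return sum((imgsz // s) ** 2 for s in strides)

print(num_anchors(640))   # 8400
print(num_anchors(1280))  # 33600
```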

Load pre-saved mlmodel file, add NMS and re-save the model:

model = ct.models.MLModel("./MODEL_WITHOUT_NMS.mlmodel")
names = {0: 'CLS_0', 1: 'CLS_1', 2: 'CLS_2', 3: 'CLS_3', 4: 'CLS_4', 5: 'CLS_5', 6: 'CLS_6', 7: 'CLS_7'}  # class names

spec = model.get_spec()
out0, out1 = iter(spec.description.output)
#out0_shape, out1_shape = (8400, 8), (8400, 4)  # IF IMGSZ==640x640
out0_shape, out1_shape = (33600, 8), (33600, 4)   # IF IMGSZ==1280x1280

# Checks
nx, ny = spec.description.input[0].type.imageType.width, spec.description.input[0].type.imageType.height
na, nc = out0_shape

# Define output shapes (missing in the converted model)
out0.type.multiArrayType.shape[:] = out0_shape
out1.type.multiArrayType.shape[:] = out1_shape

# Print
#print(spec.description)

# Model from spec
model = ct.models.MLModel(spec)

# 3. Create NMS protobuf
nms_spec = ct.proto.Model_pb2.Model()
nms_spec.specificationVersion = 5
for i in range(2):
    decoder_output = model._spec.description.output[i].SerializeToString()
    nms_spec.description.input.add()
    nms_spec.description.input[i].ParseFromString(decoder_output)
    nms_spec.description.output.add()
    nms_spec.description.output[i].ParseFromString(decoder_output)

nms_spec.description.output[0].name = 'confidence'
nms_spec.description.output[1].name = 'coordinates'

output_sizes = [nc, 4]
for i in range(2):
    ma_type = nms_spec.description.output[i].type.multiArrayType
    ma_type.shapeRange.sizeRanges.add()
    ma_type.shapeRange.sizeRanges[0].lowerBound = 0
    ma_type.shapeRange.sizeRanges[0].upperBound = -1
    ma_type.shapeRange.sizeRanges.add()
    ma_type.shapeRange.sizeRanges[1].lowerBound = output_sizes[i]
    ma_type.shapeRange.sizeRanges[1].upperBound = output_sizes[i]
    del ma_type.shape[:]

nms = nms_spec.nonMaximumSuppression
nms.confidenceInputFeatureName = out0.name  
nms.coordinatesInputFeatureName = out1.name 
nms.confidenceOutputFeatureName = 'confidence'
nms.coordinatesOutputFeatureName = 'coordinates'
nms.iouThresholdInputFeatureName = 'iouThreshold'
nms.confidenceThresholdInputFeatureName = 'confidenceThreshold'
nms.iouThreshold = 0.45
nms.confidenceThreshold = 0.25
nms.pickTop.perClass = True
nms.stringClassLabels.vector.extend(names.values())
nms_model = ct.models.MLModel(nms_spec)

#print(nms_spec.description)

# 4. Pipeline models together
pipeline = ct.models.pipeline.Pipeline(input_features=[('x_1', ct.models.datatypes.Array(3, ny, nx)),
                                                       ('iouThreshold', ct.models.datatypes.Double()),
                                                       ('confidenceThreshold', ct.models.datatypes.Double())],
                                       output_features=['confidence', 'coordinates'])
pipeline.add_model(model)
pipeline.add_model(nms_model)

# Correct datatypes
pipeline.spec.description.input[0].ParseFromString(model._spec.description.input[0].SerializeToString())
pipeline.spec.description.output[0].ParseFromString(nms_model._spec.description.output[0].SerializeToString())
pipeline.spec.description.output[1].ParseFromString(nms_model._spec.description.output[1].SerializeToString())

# Update metadata
pipeline.spec.specificationVersion = 5

# Save the model
model = ct.models.MLModel(pipeline.spec)
model.short_description = "YOLONAS_MODEL_DESCRIPTION"
model.input_description['x_1'] = 'Input image'
model.input_description['iouThreshold'] = f'(optional) IOU Threshold override (default: {nms.iouThreshold})'
model.input_description['confidenceThreshold'] = f'(optional) Confidence Threshold override (default: {nms.confidenceThreshold})'
model.output_description['confidence'] = 'Confidence'
model.output_description['coordinates'] = 'Boxes'
model.save("MODEL_WITH_NMS.mlmodel")
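For reference, here is a minimal pure-Python sketch of the greedy suppression the attached CoreML NMS layer performs with iouThreshold=0.45 (class-aware handling and the pickTop logic are omitted for brevity):

```python
# Minimal sketch of greedy NMS: keep the highest-scoring boxes, drop any box
# whose IoU with an already-kept box exceeds the threshold.

def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_threshold=0.45):
    """Return indices of kept boxes, in descending score order."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2] -- the second box overlaps the first too much
```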

I hope it will be useful :)

samxu29 commented 3 months ago

Thank you for sharing! I am mainly trying to export the pose model; I assume it's a similar procedure for NMS on the keypoints output, right?