Deci-AI / super-gradients

Easily train or fine-tune SOTA computer vision models with one open source training library. The home of Yolo-NAS.
https://www.supergradients.com
Apache License 2.0

custom dataset onnx error! #1108

Closed Yakiw closed 7 months ago

Yakiw commented 1 year ago

💡 Your Question

Weirdly, when I use the default COCO-pretrained weights and convert them to ONNX, inference gives the right detection results. But when I train on a custom dataset and convert the resulting YOLO-NAS .pth checkpoint to ONNX, inference gives wrong results: the confidences are very, very low.

When I use the official code to run my custom-dataset .pth file, I get correct results. My code:

from super_gradients.training import models
from super_gradients.common.object_names import Models
import cv2
import numpy as np

img = cv2.imread('/bus.jpg')
net = models.get(Models.YOLO_NAS_S, num_classes=6, checkpoint_path='//runs/train0/ckpt_best.pth')
net.predict(img).show()

models.convert_to_onnx(model=net, input_shape=(3, 320, 320), out_path='yolo-nas-s.onnx')


After this I convert the .pth to ONNX and run the ONNX model, and I get low-confidence boxes and wrong results.
But when I convert the pretrained COCO .pth to ONNX, I get correct results.

NatanBagrov commented 1 year ago

Hello, please reformat your description and rephrase the question. What do you mean by "I use COCO dataset and convert it to ONNX"?

mmax3 commented 1 year ago

I noticed that too. When the model is trained on a custom dataset, converted to ONNX, and then run with e.g. onnxruntime, the resulting confidences are tiny numbers (0.00xyz) and do not even approximately match the unconverted .pth model. I also noticed that some bounding-box coordinates from the ONNX model are negative. Is there some description of the output format of the converted ONNX model?
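One quick way to narrow this down is to feed the exact same tensor to the Torch model and to the exported ONNX model and compare the raw outputs. A minimal sketch, assuming `net` is the model loaded as in the snippet above and 'yolo-nas-s.onnx' is the exported file (both are placeholders; the Torch output structure varies by version):

import numpy as np
import onnxruntime as ort
import torch

x = torch.rand(1, 3, 320, 320)  # same input shape that was used for the export

net.eval()
with torch.no_grad():
    torch_out = net(x)  # structure varies by model/version; find the tensor(s) matching the ONNX outputs

sess = ort.InferenceSession('yolo-nas-s.onnx')
onnx_out = sess.run(None, {sess.get_inputs()[0].name: x.numpy()})

print([tuple(o.shape) for o in onnx_out])
# then compare element-wise against the matching tensor in torch_out, e.g.:
# print(np.abs(torch_out[0].cpu().numpy() - onnx_out[0]).max())

If the raw outputs agree, the exported graph itself is fine and the discrepancy comes from pre/post-processing rather than from the conversion.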

Yakiw commented 1 year ago

It is indeed as you said. When I convert the pretrained coco .pth to ONNX and run my code, there is no problem at all, but when the custom-trained model is converted to ONNX, the confidences become extremely low. I experimented a lot but couldn't figure out what causes this.

NatanBagrov commented 1 year ago

To clarify: the Torch model gives correct predictions, but conversion to ONNX makes them garbage?

Yakiw commented 1 year ago

To clarify: the Torch model gives correct predictions, but conversion to ONNX makes them garbage?

Yes!! Using the custom-trained model to make predictions with the following code gives completely correct results, but the converted ONNX model does not.

from super_gradients.training import models
from super_gradients.common.object_names import Models
import cv2
import numpy as np

img = cv2.imread('/bus.jpg')
net = models.get(Models.YOLO_NAS_S, num_classes=6, checkpoint_path='//runs/train0/ckpt_best.pth')
net.predict(img).show()

NatanBagrov commented 1 year ago

Thanks for the info, we'll dig into it

@BloodAxe , @Louis-Dupont

Yakiw commented 1 year ago

Thanks for the info, we'll dig into it

@BloodAxe , @Louis-Dupont

Thank you very much for your attention to this issue; looking forward to hearing from you!

Hyuto commented 1 year ago

Hello, this is the issue from my repo Hyuto/yolo-nas-onnx for running inference with YOLO-NAS ONNX models. I found that the main problem is that the preprocessing steps of @Uaena-Wang's custom-trained model aren't the same as those of the YOLO-NAS pretrained COCO model.

[image: preprocessing metadata of the pretrained COCO model vs. the custom-trained model]

As you can see from the image, the preprocessing steps are different, and that affects the detection result. Referring to the source code at https://github.com/Deci-AI/super-gradients/blob/1c67ce369b53daf162a91faacb24f3501e7b84ad/src/super_gradients/training/processing/processing.py#L262, the custom model's preprocessing uses yolox_coco, not yolo_nas.

I have never trained a custom model before, so I don't know why this happens. Shouldn't the preprocessing for a custom-trained YOLO-NAS model be the same as for the original?
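To make the difference concrete, here is a rough sketch of what the two metadata variants imply for preprocessing (simplified; the pad value and exact resize logic are assumptions based on the usual YOLOX-style pipeline, not copied from the library):

import cv2
import numpy as np

def preprocess_yolo_nas_style(img, size=640):
    # pretrained COCO metadata: rescale and standardize to [0, 1]
    img = cv2.resize(img, (size, size))
    return (img / 255.0).astype(np.float32)

def preprocess_yolox_style(img, size=640):
    # yolox_coco metadata: rescale with padding, NO standardization (values stay 0..255)
    r = min(size / img.shape[0], size / img.shape[1])
    resized = cv2.resize(img, (int(img.shape[1] * r), int(img.shape[0] * r)))
    padded = np.full((size, size, 3), 114, dtype=np.uint8)
    padded[:resized.shape[0], :resized.shape[1]] = resized
    return padded.astype(np.float32)

Feeding an image preprocessed one way into a model whose training used the other pipeline would explain the near-zero confidences.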

rose-jinyang commented 1 year ago

Hi @Hyuto, you said there is an issue in the preprocessing for custom models. I also trained a custom model, and the Torch model works well with PyTorch, but the converted ONNX model does NOT work correctly: all the predicted scores are very, very low. Where is this problematic preprocessing used: in the training step, or in the inference step?

haritsahm commented 1 year ago

I had the same problem: different ONNX output results from YOLO-NAS pretrained on COCO vs. YOLO-NAS trained on a custom dataset. @NatanBagrov @BloodAxe

net = models.get(Models.YOLO_NAS_S, pretrained_weights="coco") -> tested with street picture
custom_net = models.get(Models.YOLO_NAS_S, checkpoint_path=best_ckpt) -> tested with picture from dataset

## net prediction
`predict()` function -> both models produced good results

## export the model to onnx

## inference on the same test image
detection_threshold -> 0.01
torchvision.batched_nms -> topk 100
print top scores

### coco net scores, idx
0.93625677 0.0
0.7960305 2.0
0.7478685 0.0
0.49807656 0.0

### custom net scores, idx
0.02726981 37.0
0.026824027 37.0
0.025673926 37.0
0.023643166 37.0
0.0234887 37.0

haritsahm commented 1 year ago

Quick update on this one: I finally got correct results by NOT normalizing the image to [0, 1] for my custom model. Conversely, not normalizing the image produces poor results with the pretrained COCO model. Isn't this odd?

I had the same problem: different ONNX output results from YOLO-NAS pretrained on COCO vs. YOLO-NAS trained on a custom dataset. […]
Yakiw commented 1 year ago

Quick update on this one: I finally got correct results by NOT normalizing the image to [0, 1] for my custom model. […]

Thank you for your method, I will try it out! If possible, could you share your ONNX inference code? 😃

haritsahm commented 1 year ago

Thank you for your method, I will try it out! If possible, could you share your ONNX inference code? 😃

Just a simple process,

import cv2
import numpy as np
import onnxruntime as ort

onnx_model_path = 'model.onnx'
image_path = 'path_to_image.jpg'

img = cv2.imread(image_path)
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
img = cv2.resize(img, dsize=(640, 640))
# img = img / 255.0 -> poor result on custom dataset
img = np.transpose(img, (2, 0, 1))
img = np.expand_dims(img, axis=0).astype(dtype=np.float32)

sess = ort.InferenceSession(onnx_model_path)

# Get the input and output names of the model
input_name = sess.get_inputs()[0].name
output_names = [out.name for out in sess.get_outputs()]
# Run inference
output = sess.run(output_names, {input_name: img})

# Postprocess

haritsahm commented 1 year ago

Hello, this is the issue from my repo Hyuto/yolo-nas-onnx […] the custom model's preprocessing uses yolox_coco, not yolo_nas. Shouldn't the preprocessing for a custom-trained YOLO-NAS model be the same as for the original?

I think this is related to the model result. If you check recipes/dataset_params/, I trained the model using coco_detection_yolo_format_base_dataset_params.yaml, which does not have DetectionStandardize in its augmentations. If you check roboflow_detection_dataset_params.yaml or coco_detection_yolo_nas_dataset_params.yaml, they do have DetectionStandardize in the augmentations. Can you confirm whether this is on purpose or another copy/paste error? @NatanBagrov @BloodAxe
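If that is indeed the cause, one possible workaround is to add the missing transform before training. A sketch, assuming `train_data.dataset.transforms` is a plain list (as the print in a later comment suggests) and that DetectionStandardize is importable in your super-gradients version:

from super_gradients.training.transforms.transforms import DetectionStandardize

# Append standardization so custom training matches the pretrained yolo_nas
# preprocessing (pixel values divided by 255). The position in the pipeline
# matters, so verify the order against the yolo_nas recipe before training.
train_data.dataset.transforms.append(DetectionStandardize(max_value=255.0))
print(train_data.dataset.transforms)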

Yakiw commented 1 year ago

Just a simple process, […]

Thanks for your code, I can get the correct confidences now! But the box coordinates I decode in my original code are still not right. If possible, could you share your postprocessing code? Thank you again.

haritsahm commented 1 year ago

@Uaena-Wang Sorry for the late response, here is the full code:


import cv2
import onnxruntime as ort
import numpy as np
import torchvision
import torch

import matplotlib.pyplot as plt
%matplotlib inline

onnx_model_path = 'model.onnx'
image_path = 'path_to_image.jpg'

img = cv2.imread(image_path)
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
img = cv2.resize(img, dsize=(640, 640))
image = img.copy()  # keep an HWC copy for drawing the boxes later
# img = img / 255.0 -> poor result on custom dataset due to missing image normalization
img = np.transpose(img, (2, 0, 1))
img = np.expand_dims(img, axis=0).astype(dtype=np.float32)

sess = ort.InferenceSession(onnx_model_path)

# Get the input and output names of the model
input_name = sess.get_inputs()[0].name
output_names = [out.name for out in sess.get_outputs()]
# Run inference
output = sess.run(output_names, {input_name: img})

# Postprocess: confidence filter, then class-aware NMS, keep the top 100
output_data = output[0]
indices_max = output_data[:, :, 4] > 0.25
output_data = output_data[:, indices_max.ravel(), :]
boxes, scores, class_idx = output_data[0, :, :4], output_data[0, :, 4], output_data[0, :, 5]

indices = torchvision.ops.batched_nms(torch.from_numpy(boxes), torch.from_numpy(scores), torch.from_numpy(class_idx), 0.7).numpy()
boxes = boxes[indices][:100]
scores = scores[indices][:100]
class_idx = class_idx[indices][:100]
for box, score, idx in zip(boxes, scores, class_idx):
    x1, y1, x2, y2 = box.astype(np.int32)
    cv2.rectangle(image, (x1, y1), (x2, y2), (0, 255, 0), 2)

plt.imshow(image)

ukoehler commented 1 year ago

Same problem here. @haritsahm's post did help a lot; however, the boxes are slightly shifted. At least it proves that the basic information is available.

Now the question is: is it better to adapt the post-processing, or to try to get the training right (with scaled images)? How would I achieve the latter?

Btw, I am using OpenCV DNN for inference; if anybody is interested, I can provide the code for that.

haritsahm commented 1 year ago

I suggest you adapt the post-processing first to validate your model/application. Then you can retrain the model with the correct augmentation pipeline. Note that the results without image scaling might differ from the original model.

ukoehler commented 1 year ago

Hmmm, I followed a tutorial to train starting from the pretrained weights. The question is: when the input images are scaled in a way that is unsuitable for the pretrained model (YOLO-NAS), is there any point in starting from a pretrained model rather than from completely random weights? The results I get now are really good for a small number of images and only 50 epochs of training.

I will have to look into manipulating the augmentations for future trainings. It is just very counter-intuitive for the user when using

train_data = coco_detection_yolo_format_train(
    dataset_params={
        'data_dir': dataset_params['data_dir'],
        'images_dir': dataset_params['train_images_dir'],
        'labels_dir': dataset_params['train_labels_dir'],
        'classes': dataset_params['classes']
    },
    dataloader_params={
        'batch_size': BATCH_SIZE,
        'num_workers': WORKERS
    }
)
print(train_data.dataset.transforms)

to create the default augmentations for COCO-format data and then get transformations that differ (in a very significant way) from the ones used for the pretrained models.

Is there any documentation or an example of how to adapt the transformations?

iAlokMishra commented 1 year ago

@Uaena-Wang Sorry for the late response, here is the full code: […]

@haritsahm @NatanBagrov I am also facing a loss of performance after ONNX export of a custom-trained model.

I trained a YOLO-NAS-S model on a custom dataset following this tutorial (https://colab.research.google.com/drive/1yHrHkUR1X2u2FjjvNMfUbSXTkUul6o1P?usp=sharing#scrollTo=fBZenZegj9hm). The PyTorch model predictions are good, but when I convert the PyTorch checkpoint to ONNX, detection performance drops. There are two issues:

  1. Fewer objects are detected.
  2. The confidence scores of the predictions are comparatively low.

I tried to use the above inference script with my exported ONNX, but it does not work with my model.

In that script all outputs come from a single output[0] node, but my model has two outputs: output[0] with shape 1x8400x4, and output[1] with shape 1x8400x[number of classes].

Any idea why I get outputs at separate nodes while the script above has a single output node? Also, what could be the reason for the reduced performance with ONNX?

If possible, please share ONNX conversion / inference code which worked for you.

ukoehler commented 1 year ago

Hi @iAlokMishra,

the poor performance is due to the image normalization mismatch. See @haritsahm's solution/work-around.

iAlokMishra commented 1 year ago

https://github.com/Hyuto/yolo-nas-onnx helped me get the same performance with ONNX as with the PyTorch model. My model required images in BGR format, so I needed to change that.

Phyrokar commented 1 year ago

@haritsahm @NatanBagrov I am also facing a loss of performance after ONNX export of a custom-trained model. […] In that script all outputs come from a single output[0] node, but my model has two outputs: output[0] with shape 1x8400x4, and output[1] with shape 1x8400x[number of classes]. […]

Hello, I have the same issue. Any ideas?

PrajwalCogniac commented 1 year ago

Same problem here. @haritsahm's post did help a lot; however, the boxes are slightly shifted. […] Btw, I am using OpenCV DNN for inference; if anybody is interested, I can provide the code for that.

Hey @ukoehler, can you please share your ONNX conversion and inference code? @haritsahm, could you share yours as well? As mentioned by @iAlokMishra and @Phyrokar, I am facing the same issue: in the script above all outputs come from a single output[0] node, but my model has two outputs, output[0] with shape 1x8400x4 and output[1] with shape 1x8400x[number of classes]. Any idea or help in this regard, please?

ukoehler commented 1 year ago

The first output has the bounding box coordinates and the second the probabilities per class.
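A minimal post-processing sketch for that two-output layout, assuming output[0] holds 1x8400x4 boxes in (x1, y1, x2, y2) order and output[1] holds 1x8400xC per-class scores (the box format may differ depending on export settings, so verify against your model):

import numpy as np
import torch
import torchvision

boxes_raw, cls_raw = output  # the two outputs from sess.run(...) as in the snippet above
boxes = boxes_raw[0]                      # (8400, 4)
class_scores = cls_raw[0]                 # (8400, C)
scores = class_scores.max(axis=1)         # best score per anchor
class_idx = class_scores.argmax(axis=1)   # best class per anchor

keep = scores > 0.25                      # confidence filter
boxes, scores, class_idx = boxes[keep], scores[keep], class_idx[keep]

# class-aware NMS, then keep the top 100 detections
indices = torchvision.ops.batched_nms(
    torch.from_numpy(boxes), torch.from_numpy(scores),
    torch.from_numpy(class_idx), 0.7
).numpy()[:100]
boxes, scores, class_idx = boxes[indices], scores[indices], class_idx[indices]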

PrajwalCogniac commented 1 year ago

Thanks a lot for the reply @ukoehler. I am fairly new to this. Can you please share the inference code you have? It would be really helpful!

kamleshiitb commented 1 year ago

@haritsahm @NatanBagrov I am also facing a loss of performance after ONNX export of a custom-trained model. […] In that script all outputs come from a single output[0] node, but my model has two outputs: output[0] with shape 1x8400x4, and output[1] with shape 1x8400x[number of classes]. […]

Hello, I have the same issue. Any ideas?

I think you can try a PIL image for inference (it worked for me):

from PIL import Image
import numpy as np
import torchvision.transforms as transforms

test_transform = transforms.Compose([
    transforms.Resize((640, 640)),
    transforms.ToTensor()
])

image = test_transform(Image.open(image_path)).unsqueeze(0)
input_data = np.array(image, dtype=np.float32)

BloodAxe commented 7 months ago

For those who find this issue via a Google search: please use this link to the model export example, which shows inference of the ONNX model:

https://github.com/Deci-AI/super-gradients/blob/master/src/super_gradients/examples/model_export/models_export.ipynb

For inference using TensorRT there is a second notebook demo: https://github.com/Deci-AI/super-gradients/blob/master/notebooks/YoloNAS_Inference_using_TensorRT.ipynb
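For reference, the export flow in that notebook looks roughly like this (a sketch; export() is available in newer super-gradients versions and its arguments may differ from the older models.convert_to_onnx used earlier in this thread, and the paths here are placeholders):

from super_gradients.common.object_names import Models
from super_gradients.training import models

model = models.get(Models.YOLO_NAS_S, num_classes=6, checkpoint_path='ckpt_best.pth')

# export() bakes the preprocessing (including normalization) and NMS into the
# ONNX graph, which avoids the pre/post-processing mismatches discussed above
export_result = model.export('yolo_nas_s_custom.onnx')
print(export_result)  # describes the expected input format and output layout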