Deci-AI / super-gradients

Easily train or fine-tune SOTA computer vision models with one open source training library. The home of Yolo-NAS.
https://www.supergradients.com
Apache License 2.0

TensorRT doesn't show predictions YOLO-NAS-POSE #1859

Open Daanfb opened 8 months ago

Daanfb commented 8 months ago

💡 Your Question

I'm trying to run inference with a TensorRT (.engine) YOLO-NAS-POSE model. I exported the model to ONNX as shown on the website:

export_result = yolo_nas_pose_s.export("yolo_nas_pose_s.onnx")
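
For completeness, this is roughly how I load the model before exporting (a sketch assuming the pretrained COCO pose weights from the docs); printing the export result should also describe the expected input shape/dtype and usage, if I understand the docs correctly:

# Sketch, assuming the model is loaded as in the docs (pretrained "coco_pose" weights).
from super_gradients.common.object_names import Models
from super_gradients.training import models

yolo_nas_pose_s = models.get(Models.YOLO_NAS_POSE_S, pretrained_weights="coco_pose")
export_result = yolo_nas_pose_s.export("yolo_nas_pose_s.onnx")
print(export_result)  # should print the expected input shape/dtype and example usage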

Then I built the .engine file with this command:

trtexec --explicitBatch --onnx=yolo_nas_pose_s.onnx --saveEngine=yolo_nas_pose_s_batch.engine

I tried the ONNX model and it works fine, but with the .engine model I just get the image back without any predictions (the ONNX check I used is sketched below).
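
For reference, this is roughly how I verify the ONNX model with onnxruntime (a minimal sketch; I'm assuming the exported model takes a uint8 NCHW batch of shape (1, 3, 640, 640) and returns the same batch-format outputs as the engine):

# Minimal ONNX Runtime check (sketch). Assumes the default export: uint8 NCHW input
# of shape (1, 3, 640, 640) and batch-format outputs (num_detections, boxes, scores, joints).
import cv2
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("yolo_nas_pose_s.onnx", providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name

image = cv2.cvtColor(cv2.imread("image.jpg"), cv2.COLOR_BGR2RGB)
image = cv2.resize(image, (640, 640))
batch = np.expand_dims(np.transpose(image, (2, 0, 1)), axis=0).astype(np.uint8)

num_detections, boxes, scores, joints = session.run(None, {input_name: batch})
print("detections:", int(num_detections[0, 0]), "max score:", scores.max())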

This is my python code:

import tensorrt as trt
import numpy as np
import torch
from PIL import Image
import cv2
import time
from super_gradients.training.utils.visualization.pose_estimation import PoseVisualization
from collections import namedtuple

def iterate_over_batch_predictions(predictions, batch_size):
    num_detections, batch_boxes, batch_scores, batch_joints = predictions
    print(num_detections.shape, batch_boxes.shape, batch_scores.shape, batch_joints.shape)
    for image_index in range(batch_size):
        num_detection_in_image = num_detections[image_index, 0]
        pred_scores = batch_scores[image_index, :num_detection_in_image]
        pred_boxes = batch_boxes[image_index, :num_detection_in_image]
        pred_joints = batch_joints[image_index, :num_detection_in_image].reshape((len(pred_scores), -1, 3))

        yield image_index, pred_boxes, pred_scores, pred_joints

def get_predictions_from_batch_format(image, predictions):
    # In this tutorial we use a batch size of 1, therefore we take only the first element of the predictions
    image_index, pred_boxes, pred_scores, pred_joints = next(iter(iterate_over_batch_predictions(predictions, 1)))

    image = PoseVisualization.draw_poses(
        image=image, poses=pred_joints, scores=pred_scores, boxes=pred_boxes,
        edge_links=None, edge_colors=None, keypoint_colors=None, is_crowd=None
    )

    return image

def load_engine(engine_file):
    with open(engine_file, "rb") as f, trt.Runtime(trt.Logger(trt.Logger.WARNING)) as runtime:
        return runtime.deserialize_cuda_engine(f.read())

def inference(engine, image, device):

    image = cv2.resize(image, (640, 640))
    image = np.transpose(image, (2, 0, 1)).astype(np.uint8)
    image = np.expand_dims(image, axis=0)
    image = torch.from_numpy(image).to(device)

    Binding = namedtuple("Binding", ["data", "ptr"])
    bindings = {}

    start = time.perf_counter()
    with engine.create_execution_context() as context:

        ptrs = []
        for binding in engine:
            dtype = trt.nptype(engine.get_tensor_dtype(binding))
            shape = engine.get_binding_shape(binding)

            if engine.binding_is_input(binding):
                data = image
            else:
                data = torch.from_numpy(np.empty(shape, dtype=dtype)).to(device)

            # Memory address
            ptr = data.data_ptr()
            ptrs.append(ptr)

            bindings[binding] = Binding(data, ptr)
            ptrs = [binding.ptr for binding in bindings.values()]

        context.execute_v2(ptrs)

    predictions = []
    for binding in engine:
        if not engine.binding_is_input(binding):
            predictions.append(bindings[binding].data.cpu().numpy())

    exec_cost = time.perf_counter() - start
    print(f"Execution time: {exec_cost:.2f} s")

    return predictions

engine_file = "yolo_nas_pose_s_batch.engine"
device = "cuda"

engine = load_engine(engine_file)

image_path = "image.jpg"
image = cv2.imread(image_path)
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

predictions = inference(engine, image, device)

img = get_predictions_from_batch_format(image, predictions)
img = Image.fromarray(img, mode='RGB')
img.save('myimage.png')

Versions

Collecting environment information...
PyTorch version: 2.2.0+cu118
Is debug build: False
CUDA used to build PyTorch: 11.8
ROCM used to build PyTorch: N/A

OS: Microsoft Windows 10 Enterprise LTSC
GCC version: Could not collect
Clang version: Could not collect
CMake version: Could not collect
Libc version: N/A

Python version: 3.8.0 (tags/v3.8.0:fa919fd, Oct 14 2019, 19:37:50) [MSC v.1916 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.17763-SP0
Is CUDA available: True
CUDA runtime version: 11.8.89
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA GeForce GTX 1650
Nvidia driver version: 551.52
cuDNN version: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\bin\cudnn_ops_train64_8.dll
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture=9
CurrentClockSpeed=1992
DeviceID=CPU0
Family=198
L2CacheSize=2048
L2CacheSpeed=
Manufacturer=GenuineIntel
MaxClockSpeed=1992
Name=Intel(R) Core(TM) i7-10700TE CPU @ 2.00GHz
ProcessorType=3
Revision=

Versions of relevant libraries:
[pip3] numpy==1.23.0
[pip3] onnx==1.13.0
[pip3] onnx-graphsurgeon==0.3.12
[pip3] onnxruntime==1.13.1
[pip3] onnxsim==0.4.35
[pip3] super-gradients==3.6.0
[pip3] torch==2.2.0+cu118
[pip3] torchaudio==2.2.0+cu118
[pip3] torchmetrics==0.8.0
[pip3] torchvision==0.17.0
[conda] Could not collect

BloodAxe commented 8 months ago

What TRT engine are you using?

Daanfb commented 8 months ago

What TRT engine are you using?

My TensorRT version is 8.6.1. I have updated the post with the correct .engine and .onnx file names, which I had written incorrectly.

Daanfb commented 8 months ago

@BloodAxe When I exported the model, confidence_threshold was 0.05. I have just printed the batch scores and they are all 0, so maybe that is the answer to my question. If I'm right, what should I do to get correct predictions? (I sketch a possible re-export after the snippet below.)

def iterate_over_batch_predictions(predictions, batch_size):
    num_detections, batch_boxes, batch_scores, batch_joints = predictions

    print("Batch_scores with confidence greater than 0.05: ", batch_scores[batch_scores > 0.05])
    for image_index in range(batch_size):
        num_detection_in_image = num_detections[image_index, 0]
        pred_scores = batch_scores[image_index, :num_detection_in_image]
        pred_boxes = batch_boxes[image_index, :num_detection_in_image]
        pred_joints = batch_joints[image_index, :num_detection_in_image].reshape((len(pred_scores), -1, 3))

        yield image_index, pred_boxes, pred_scores, pred_joints
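
If the threshold were the problem, I assume the fix would be to re-export with an explicit, lower confidence_threshold and rebuild the engine with trtexec (a sketch; the keyword name is taken from the export docs and may differ):

# Sketch: re-export with a lower confidence threshold, then rebuild the .engine with trtexec.
# The confidence_threshold keyword is the one described in the export docs (assumption).
export_result = yolo_nas_pose_s.export(
    "yolo_nas_pose_s.onnx",
    confidence_threshold=0.01,
)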

I was wrong. That's not the problem: I exported with the --fp16 flag and I do get predictions with scores greater than 0.05, but the image I get still doesn't show the predictions.

One thing that concerns me is that when I run the code I get this error:

[TRT] [E] 1: [softMaxV2Runner.cpp::nvinfer1::rt::task::CaskSoftMaxV2Runner::execute::226] Error Code 1: Cask (shader run failed)

I don't know what that means because I'm new to this, but I think it might be causing some errors.

By the way, sometimes the code fails because of this error:

AttributeError: 'int' object has no attribute 'sqrt'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\E2K6\Desktop\DANIEL\YOLO-NAS-POSE\aa.py", line 94, in <module>
    img = get_predictions_from_batch_format(image, predictions)
  File "C:\Users\E2K6\Desktop\DANIEL\YOLO-NAS-POSE\aa.py", line 29, in get_predictions_from_batch_format
    image = PoseVisualization.draw_poses(
  File "c:\Users\E2K6\AppData\Local\Programs\Python\Python38\lib\site-packages\super_gradients\training\utils\visualization\pose_estimation.py", line 181, in draw_poses
    current_box_thickness = box_thickness or get_recommended_box_thickness(x1, y1, x2, y2)
  File "c:\Users\E2K6\AppData\Local\Programs\Python\Python38\lib\site-packages\super_gradients\training\utils\visualization\detection.py", line 52, in get_recommended_box_thickness
    diag_length = np.sqrt(bbox_width**2 + bbox_height**2)
TypeError: loop of ufunc does not support argument 0 of type int which has no callable sqrt method
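
As a workaround for this second error, inside get_predictions_from_batch_format I am trying to cast/clip the boxes before drawing, and to pass an explicit box_thickness so that get_recommended_box_thickness is never called (judging by the traceback, draw_poses seems to accept such an argument). A sketch, not sure it is the right fix:

# Workaround sketch (assumptions: draw_poses accepts box_thickness; boxes are in the
# 640x640 network-input range). Casting to float32 avoids the object-dtype sqrt error,
# clipping drops garbage coordinates.
pred_boxes = np.clip(pred_boxes.astype(np.float32), 0, 640)
image = PoseVisualization.draw_poses(
    image=image, poses=pred_joints, scores=pred_scores, boxes=pred_boxes,
    edge_links=None, edge_colors=None, keypoint_colors=None, is_crowd=None,
    box_thickness=2,
)
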
BloodAxe commented 8 months ago

Thanks for this detailed analysis of the issue and the outputs. We will certainly look into it.