SthPhoenix / InsightFace-REST

InsightFace REST API for easy deployment of face recognition services with TensorRT in Docker.
Apache License 2.0

convert scrfd onnx model to tensorrt #83

Open saeedkhanehgir opened 2 years ago

saeedkhanehgir commented 2 years ago

Hi, thanks for sharing this project.

I downloaded the SCRFD model from the link and tried to convert it to a TensorRT model with the /src/converters/modules/converters/onnx_to_trt.py script.

I used a custom convert.py script for this:

convert.py.txt

I get the error below.

[04/28/2022-13:10:23] [TRT] [E] 4: [network.cpp::validate::3011] Error Code 4: Internal Error (Network has dynamic or shape inputs, but no optimization profile has been defined.)
Traceback (most recent call last):
  File "convert.py", line 9, in <module>
    convert_onnx(onnx_path,trt_path)
  File "/home/saeed.khanehgir/InsightFace-REST/src/converters/modules/converters/onnx_to_trt.py", line 83, in convert_onnx
    assert not isinstance(engine, type(None))
AssertionError

Thanks
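
For context: this error means the exported ONNX graph has dynamic input shapes, so TensorRT needs either a graph reshaped to static dimensions or an optimization profile at build time. A rough sketch of the generic optimization-profile route with the plain TensorRT Python API (the input tensor name and the shapes are assumptions, not taken from this repo):

import tensorrt as trt

# Sketch only: build an engine for an ONNX model with dynamic inputs by
# registering an optimization profile. The input tensor name and shapes
# below are assumptions for a 640x640 SCRFD export; check them in Netron first.
logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open('scrfd_10g_gnkps.onnx', 'rb') as f:
    assert parser.parse(f.read()), parser.get_error(0)

config = builder.create_builder_config()
profile = builder.create_optimization_profile()
# name, min shape, opt shape, max shape (all static here)
profile.set_shape('input.1', (1, 3, 640, 640), (1, 3, 640, 640), (1, 3, 640, 640))
config.add_optimization_profile(profile)

serialized_engine = builder.build_serialized_network(network, config)
with open('scrfd_10g_gnkps.plan', 'wb') as f:
    f.write(serialized_engine)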

SthPhoenix commented 2 years ago

Hi! src/converters are outdated. You can try checking src/api_trt/modules/model_zoo/getter.py lines 146-163 for your desired use case.

Something like this should work:

import onnx
from ..converters.onnx_to_trt import convert_onnx
from ..converters.reshape_onnx import reshape, reshape_onnx_input

onnx_path = 'scrfd_10g_gnkps.onnx'
trt_path = 'scrfd_10g_gnkps.plan'

model = onnx.load(onnx_path)
onnx_batch_size = 1
max_batch_size = 1
height, width = [640, 640]
force_fp16 = True

# Pin the input shape to a static NxCxHxW before building the TensorRT engine.
reshaped = reshape(model, n=onnx_batch_size, h=height, w=width)
temp_onnx_model = reshaped.SerializeToString()

convert_onnx(temp_onnx_model,
             engine_file_path=trt_path,
             max_batch_size=max_batch_size,
             force_fp16=force_fp16)

(imports are relative to src/api_trt/modules/model_zoo)
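
The reshape call is what avoids the optimization-profile error: it pins the graph's dynamic N/H/W input dimensions to static values before the engine is built. A rough stand-alone equivalent using only the onnx package (not the project's helper, and assuming a single NCHW image input) might look like this:

import onnx

# Sketch only: overwrite the dynamic dimensions of the first graph input
# with static values so TensorRT no longer needs an optimization profile.
model = onnx.load('scrfd_10g_gnkps.onnx')
dims = model.graph.input[0].type.tensor_type.shape.dim
for dim, value in zip(dims, (1, 3, 640, 640)):  # assumed NCHW layout
    dim.dim_value = value
onnx.save(model, 'scrfd_10g_gnkps_static.onnx')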

saeedkhanehgir commented 2 years ago

Thanks @SthPhoenix. For the face embedding model, I use the code below to convert it to an FP16 .plan model:

import numpy as np
import onnx
from modules.converters.onnx_to_trt import convert_onnx
from modules.converters.reshape_onnx import reshape, reshape_onnx_input
# from ..converters.onnx_to_trt import convert_onnx
# from ..converters.reshape_onnx import reshape, reshape_onnx_input

onnx_path = 'w600k_r50.onnx'
trt_path = 'w600k_r50.plan'

model = onnx.load(onnx_path)
onnx_batch_size = 1
reshaped = reshape(model, n=onnx_batch_size, h=112, w=112)
temp_onnx_model = reshaped.SerializeToString()

convert_onnx(temp_onnx_model,
             engine_file_path=trt_path,
             max_batch_size=1,
             force_fp16=True)

and the following code for inference:

import engine as eng
import tensorrt as trt 
import inference as inf
import cv2 
from PIL import Image
import numpy as np
import skimage.transform
import os
import pycuda.driver as cuda
import pycuda.autoinit

onnx_path='w600k_r50.onnx'
serialized_plan_fp16='w600k_r50.plan'
input_file_path ='2.jpg'
HEIGHT=112
WIDTH=112

def rescale_image(image, output_shape, order=1):
   image = skimage.transform.resize(image, output_shape,
               order=order, preserve_range=True, mode='reflect')
   return image

def l2_normalize(x):
    return x / np.sqrt(np.sum(np.multiply(x, x)))

def load_engine(trt_runtime, plan_path):
   with open(plan_path, 'rb') as f:
       engine_data = f.read()
   engine = trt_runtime.deserialize_cuda_engine(engine_data)
   return engine

def load_images_to_buffer(pics, pagelocked_buffer):

   preprocessed = np.asarray(pics).ravel()
   np.copyto(pagelocked_buffer, preprocessed)

def do_inference(engine, pics_1, h_input_1, d_input_1, h_output, d_output, stream, batch_size):

   """
   This is the function to run the inference
   Args:
      engine : Path to the TensorRT engine. 
      pics_1 : Input images to the model.  
      h_input_1: Input in the host. 
      d_input_1: Input in the device. 
      h_output_1: Output in the host. 
      d_output_1: Output in the device. 
      stream: CUDA stream.
      batch_size : Batch size for execution time.
      height: Height of the output image.
      width: Width of the output image.

   Output:
      The list of output images.

   """

   load_images_to_buffer(pics_1, h_input_1)

   with engine.create_execution_context() as context:
       # Transfer input data to the GPU.
       cuda.memcpy_htod_async(d_input_1, h_input_1, stream)

       # Run inference.

       context.profiler = trt.Profiler()
       context.execute(batch_size=1, bindings=[int(d_input_1), int(d_output)])

       # Transfer predictions back from the GPU.
       cuda.memcpy_dtoh_async(h_output, d_output, stream)
       # Synchronize the stream.
       stream.synchronize()
       # Return the host output.
       # out = h_output
       out = h_output.reshape((batch_size,-1))
       return out 

input_file_path='face.jpg'
image = np.asarray(Image.open(input_file_path))
image = rescale_image(image, (112, 112),order=1)
im = np.array(image, dtype=np.float32, order='C')

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
trt_runtime = trt.Runtime(TRT_LOGGER)

engine = load_engine(trt_runtime, serialized_plan_fp16)
h_input, d_input, h_output, d_output, stream = inf.allocate_buffers(engine, 1, trt.float32)
out = do_inference(engine, im, h_input, d_input, h_output, d_output, stream, 1)
print('embedding',out[0])

I used this code for face verification but got a bad result. Do you see something wrong?

SthPhoenix commented 2 years ago

Hi! First of all, the face image must be a properly aligned 112x112 crop from the detection step; you can't just take an arbitrary image containing a face and resize it to 112x112. Secondly, you're missing the image preprocessing required for inference; for the w600k model it should be something like this:

img  = cv2.imread("face.jpg", cv2.IMREAD_COLOR)
imgs = [img]

input_size =  (112,112)
input_std = 127.5
input_mean = 127.5
blob = cv2.dnn.blobFromImages(imgs, 1.0 /input_std, input_size,
                                      (input_mean, input_mean, input_mean), swapRB=True)

...

out = do_inference(engine, blob, h_input, d_input, h_output, d_output, stream, 1)
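
For the verification step itself, embeddings from models like w600k_r50 are usually L2-normalized and compared with cosine similarity (the l2_normalize helper defined in the inference script above is never actually called). A minimal sketch:

import numpy as np

def cosine_similarity(emb1, emb2):
    # L2-normalize both embeddings, then take their dot product.
    emb1 = emb1 / np.linalg.norm(emb1)
    emb2 = emb2 / np.linalg.norm(emb2)
    return float(np.dot(emb1, emb2))

# e.g. sim = cosine_similarity(out1[0], out2[0])
# Higher values mean more similar faces; the decision threshold is model-dependent.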

saeedkhanehgir commented 2 years ago

Thanks @SthPhoenix, solved.

Aniket-rohara commented 2 months ago

Hey @saeedkhanehgir @SthPhoenix, I'm also working on a similar project where I need to convert my ONNX model to TRT and then run inference, so I was looking at the inference code posted above. You import inference as inf and engine as eng there; can you tell me where those files are? I think the buffer allocation and stream synchronization happen in one of them, and that's the part I'm also stuck on. Hope to hear from you soon! :)
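
The engine/inference helpers imported above aren't part of this repository; the script appears to follow NVIDIA's TensorRT + pycuda Python samples, where inference.py provides the buffer allocation. A minimal sketch of what such an allocate_buffers helper typically looks like (assuming a single input and a single output binding and the pre-TensorRT-10 binding API; this is not the exact file used above):

import pycuda.driver as cuda
import pycuda.autoinit
import tensorrt as trt

def allocate_buffers(engine, batch_size, data_type):
    # Pagelocked host buffers plus matching device buffers for a
    # single-input, single-output engine, and a CUDA stream for the copies.
    h_input = cuda.pagelocked_empty(
        batch_size * trt.volume(engine.get_binding_shape(0)), dtype=trt.nptype(data_type))
    h_output = cuda.pagelocked_empty(
        batch_size * trt.volume(engine.get_binding_shape(1)), dtype=trt.nptype(data_type))
    d_input = cuda.mem_alloc(h_input.nbytes)
    d_output = cuda.mem_alloc(h_output.nbytes)
    stream = cuda.Stream()
    return h_input, d_input, h_output, d_output, stream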