Tianxiaomo / pytorch-YOLOv4

PyTorch, ONNX and TensorRT implementation of YOLOv4
Apache License 2.0

int8 TensorRT #503

Open Timorleiderman opened 2 years ago

Timorleiderman commented 2 years ago

I tried to convert the YOLOv4 model to int8 with TensorRT 7.2.3.4 and I'm getting really bad results. The int8 engine is indeed much faster, but the accuracy is really bad.

Has anybody succeeded in converting the model to int8 with good results?

romil611 commented 2 years ago

@Timorleiderman Hi, can you please share how you converted it to an int8 engine? When I tried, I got roughly the same inference time as the PyTorch code, and the accuracy went down as well.

Timorleiderman commented 2 years ago

This is my calibrator.py with an image batch stream:

import os
import cv2
import numpy as np
import tensorrt as trt
import pycuda.autoinit  # initializes the CUDA context
import pycuda.driver as cuda

class PythonEntropyCalibrator(trt.IInt8EntropyCalibrator2):
    def __init__(self, input_layers, stream, cache_file):
        trt.IInt8EntropyCalibrator2.__init__(self)
        self.input_layers = input_layers
        self.stream = stream
        # One reusable device buffer, sized for a full calibration batch.
        self.d_input = cuda.mem_alloc(self.stream.calibration_data.nbytes)
        stream.reset()
        self.cache_file = cache_file
        self.current_index = 0
        self.batch_size = self.stream.batch_size

    def get_batch_size(self):
        return self.stream.batch_size

    def get_batch(self, names):
        batch = self.stream.next_batch()
        if not batch.size:
            # An empty batch signals TensorRT that no calibration data remains.
            print("out of calibration batches")
            return None

        current_batch = self.current_index // self.batch_size
        print("Calibrating batch {:}, containing {:} images".format(current_batch, self.batch_size))

        # Sanity check: TensorRT asks for the binding we declared as input.
        assert names[0] == self.input_layers[0]

        cuda.memcpy_htod(self.d_input, batch)
        self.current_index += self.batch_size

        # TensorRT expects a list of device pointers, one per input binding.
        return [int(self.d_input)]

    def read_calibration_cache(self):
        # If there is a cache, use it instead of calibrating again. Otherwise, implicitly return None.
        if os.path.exists(self.cache_file):
            with open(self.cache_file, "rb") as f:
                print("using cache file ...", self.cache_file)
                return f.read()

    def write_calibration_cache(self, cache):
        print("write cache file ...", self.cache_file)
        with open(self.cache_file, 'wb') as f:
            f.write(cache)

class ImageBatchStream(object):
    def __init__(self, batch_size, calibration_files, channels=3, width=608, height=608):
        self.batch_size = batch_size
        self.max_batches = (len(calibration_files) // batch_size) + (1 if (len(calibration_files) % batch_size) else 0)
        self.files = calibration_files
        self.c = channels
        self.w = width
        self.h = height
        self.calibration_data = np.zeros((batch_size, self.c, self.h, self.w), dtype=np.float32)
        self.batch = 0
        print("[ImageBatchStream] init", len(self.files), "images with batch size", self.batch_size)

    @staticmethod
    def read_image_chw(path, w, h):
        # Preprocess exactly like inference: resize, BGR->RGB, CHW, scale to [0, 1].
        print("reading ..", path)
        img = cv2.imread(path)
        img = cv2.resize(img, (w, h))
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        img = np.transpose(img, (2, 0, 1)).astype(np.float32)
        return img / 255.0

    def reset(self):
        self.batch = 0

    def next_batch(self):
        if self.batch < self.max_batches:
            files_for_batch = self.files[self.batch_size * self.batch: self.batch_size * (self.batch + 1)]
            # Note: a partial final batch keeps leftover images from the
            # previous batch in the unfilled slots of calibration_data.
            for i, f in enumerate(files_for_batch):
                print("[ImageBatchStream] Processing", f)
                self.calibration_data[i] = ImageBatchStream.read_image_chw(f, self.w, self.h)
            self.batch += 1
            return np.ascontiguousarray(self.calibration_data, dtype=np.float32)
        else:
            return np.array([])

Building the engine with this script:


def GiB(val):
    # Helper from the TensorRT samples' common.py: gibibytes to bytes.
    return val * (1 << 30)

def build_int8_engine(model_onnx_file, calib, logger, max_batch_size=1, workspace_size=1):
    with trt.Builder(logger) as builder,\
            builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)) as network,\
            trt.OnnxParser(network, logger) as parser:

        builder.max_batch_size = max_batch_size
        builder.max_workspace_size = GiB(workspace_size)
        builder.int8_mode = True
        # With int8_mode also set, fp16_mode lets the builder fall back to
        # fp16 kernels for layers where int8 is unsupported or slower.
        builder.fp16_mode = True
        builder.int8_calibrator = calib

        # Parsing ONNX file...
        with open(model_onnx_file, 'rb') as model:
            if not parser.parse(model.read()):
                for error in range(parser.num_errors):
                    print(parser.get_error(error))
                return None

        # Build engine and run int8 calibration.
        return builder.build_cuda_engine(network)

And this is my script for running it:

- create_calibration_dataset2 is a function that loads a list of paths to the input images
- onnx_file is the YOLOv4 ONNX file generated with torch.onnx.export
- trt_logger is trt.Logger(trt.Logger.INFO)
- calibration_batch_size is an integer; I tried different values and got the same result

    batchstream = calibrator.ImageBatchStream(calibration_batch_size,
                                              create_calibration_dataset2(annotation_list=data_dir,
                                                                          samples=img_samples))
    int8_calibrator = calibrator.PythonEntropyCalibrator(["input"], batchstream, "cache_file.cache")
    int8_engine = build_int8_engine(onnx_file, int8_calibrator, trt_logger,
                                    max_batch_size=calibration_batch_size, workspace_size=workspace_size)
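
If the build succeeds, the engine can be serialized so calibration (which is slow) doesn't have to re-run every time. A minimal sketch, assuming int8_engine and trt_logger from above (the file name is just an example):

if int8_engine is not None:
    with open("yolov4_int8.engine", "wb") as f:
        f.write(int8_engine.serialize())

# Later, reload without rebuilding:
with trt.Runtime(trt_logger) as runtime, open("yolov4_int8.engine", "rb") as f:
    engine = runtime.deserialize_cuda_engine(f.read())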

The problem is the accuracy: I get very bad results compared to the fp16 engine.

romil611 commented 2 years ago

thanks for sharing!

zqx1609 commented 2 years ago

Hi, I'm trying to convert the model to int8 too, but I don't know how to load the data. Can you share the create_calibration_dataset2 function?

Timorleiderman commented 2 years ago

create_calibration_dataset2 - this function returns a list of full paths to image files (the calibration set).

for example:

import glob
import os
from random import shuffle

def create_calibration_dataset(data_dir, max_len=100):
    """Create a list of calibration image file paths.

    Args:
        data_dir: list of directories or a single directory path
        max_len: random sample size from all images (-1 for all)
    """
    types = ('*.png', '*.jpg', '*.tiff', '*.bmp', '*.jpeg')
    calibration_files = list()
    if isinstance(data_dir, str):
        data_dir = [data_dir]
    for dir_d in data_dir:
        for pattern in types:
            # os.path.join handles directories without a trailing slash
            calibration_files.extend(glob.glob(os.path.join(dir_d, pattern)))

    shuffle(calibration_files)
    if max_len < 0:
        return calibration_files
    return calibration_files[:max_len]
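
Hypothetical usage (the directory path is a placeholder); typically a few hundred representative images are enough for entropy calibration:

calib_files = create_calibration_dataset(["/path/to/calibration/images/"], max_len=300)
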
zqx1609 commented 2 years ago

Thank you for your help! I used your code and applied it to TensorRT 8, then deployed the engine on DeepStream. It worked well and can clearly distinguish cars and people in the H.264 video.
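
For anyone else on TensorRT 8: the per-builder flags moved onto the builder config, so the build function above needs roughly this shape (a sketch, not tested against every 8.x minor version):

def build_int8_engine_trt8(model_onnx_file, calib, logger, workspace_size=1):
    builder = trt.Builder(logger)
    network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, logger)

    config = builder.create_builder_config()
    config.max_workspace_size = workspace_size * (1 << 30)
    config.set_flag(trt.BuilderFlag.INT8)
    config.set_flag(trt.BuilderFlag.FP16)  # allow per-layer fp16 fallback
    config.int8_calibrator = calib

    with open(model_onnx_file, 'rb') as model:
        if not parser.parse(model.read()):
            for error in range(parser.num_errors):
                print(parser.get_error(error))
            return None

    # build_engine is deprecated in later 8.x releases;
    # build_serialized_network is the longer-term replacement.
    return builder.build_engine(network, config)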

Lenan22 commented 1 year ago

Please refer to our open-source quantization tool ppq; its quantization results are better than those of the quantization tool that comes with TensorRT, almost the same as the float32 model. https://github.com/openppl-public/ppq/blob/master/md_doc/deploy_trt_by_OnnxParser.md

Sayyam-Jain commented 1 year ago

> Please refer to our open-source quantization tool ppq; its quantization results are better than those of the quantization tool that comes with TensorRT, almost the same as the float32 model. https://github.com/openppl-public/ppq/blob/master/md_doc/deploy_trt_by_OnnxParser.md

Hi, I have a YOLOv4 model that I want to run on TensorRT INT8. I read the documentation but I'm having a hard time following it as an English speaker. Can you please guide me on how to convert the model and prepare the dataset for the ProgramEntrance.py script? My dataset is in YOLO format.

Thanks