Open Timorleiderman opened 2 years ago
@Timorleiderman Hi, can you please share how you converted it to an INT8 engine? When I tried to do so, I got roughly the same inference time as the PyTorch code, but the accuracy went down.
This is my calibrator.py with the image stream:
import os
import cv2
import numpy as np
import tensorrt as trt
import pycuda.autoinit
import pycuda.driver as cuda


class PythonEntropyCalibrator(trt.IInt8EntropyCalibrator2):
    def __init__(self, input_layers, stream, cache_file):
        trt.IInt8EntropyCalibrator2.__init__(self)
        self.input_layers = input_layers
        self.stream = stream
        self.d_input = cuda.mem_alloc(self.stream.calibration_data.nbytes)
        stream.reset()
        self.cache_file = cache_file
        self.current_index = 0
        self.batch_size = self.stream.batch_size

    def get_batch_size(self):
        return self.stream.batch_size

    def get_batch(self, names):
        try:
            batch = self.stream.next_batch()
            if not batch.size:
                return None
            current_batch = int(self.current_index / self.stream.batch_size)
            # if current_batch % 10 == 0:
            print("Calibrating batch {:}, containing {:} images".format(current_batch, self.batch_size))
            cuda.memcpy_htod(self.d_input, batch)
            for i in self.input_layers[0]:
                assert names[0] != i
            # bindings = int(self.d_input)
            self.current_index += self.batch_size
            return [self.d_input]
        except StopIteration:
            # When we're out of batches, we return either [] or None.
            # This signals to TensorRT that there is no calibration data remaining.
            print("Stop Iteration out of batches")
            return None

    def read_calibration_cache(self):
        # If there is a cache, use it instead of calibrating again. Otherwise, implicitly return None.
        if os.path.exists(self.cache_file):
            with open(self.cache_file, "rb") as f:
                print("using cache file ...", self.cache_file)
                return f.read()

    def write_calibration_cache(self, cache):
        print("write cache file ...", self.cache_file)
        with open(self.cache_file, 'wb') as f:
            f.write(cache)


class ImageBatchStream(object):
    def __init__(self, batch_size, calibration_files, channels=3, width=608, height=608):
        self.batch_size = batch_size
        self.max_batches = (len(calibration_files) // batch_size) + (1 if (len(calibration_files) % batch_size) else 0)
        self.files = calibration_files
        self.c = channels
        self.w = width
        self.h = height
        self.calibration_data = np.zeros((batch_size, self.c, self.h, self.w), dtype=np.float32)
        self.batch = 0
        print("[ImageBatchStream] init ", len(self.files), " images.. with batch size ", self.batch_size)

    @staticmethod
    def read_image_chw(path, w, h):
        print("reading .. ", path)
        img_left_cv = cv2.imread(path)
        img_left_cv_res = cv2.resize(img_left_cv, (w, h))
        img_left_cv_res = cv2.cvtColor(img_left_cv_res, cv2.COLOR_BGR2RGB)
        img_left_cv_res_tr = np.transpose(img_left_cv_res, (2, 0, 1)).astype(np.float32)
        img_left_cv_res_tr = np.expand_dims(img_left_cv_res_tr, axis=0)
        img_left_cv_res_tr_f32 = img_left_cv_res_tr / 255.0
        return img_left_cv_res_tr_f32

    def reset(self):
        self.batch = 0

    def next_batch(self):
        if self.batch < self.max_batches:
            imgs = []
            files_for_batch = self.files[self.batch_size * self.batch: self.batch_size * (self.batch + 1)]
            for f in files_for_batch:
                print("[ImageBatchStream] Processing ", f)
                img = ImageBatchStream.read_image_chw(f, self.w, self.h)
                imgs.append(img)
            for i in range(len(imgs)):
                self.calibration_data[i] = imgs[i]
            self.batch += 1
            return np.ascontiguousarray(self.calibration_data, dtype=np.float32)
        else:
            return np.array([])
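A side note on next_batch above: the calibration_data buffer is allocated once and reused, so when the number of files is not a multiple of batch_size, the last batch only overwrites its first few rows. The remaining rows keep whatever was there before (images from the previous batch, or zeros if it is the only batch), and they are still copied to the GPU. This is probably not what causes a large accuracy drop, but it is easy to check. A minimal sketch of the index arithmetic in plain Python; batch_slices and stale_rows are hypothetical helper names, not part of the script above:

```python
def batch_slices(num_files, batch_size):
    """Return (start, stop) index pairs, one per calibration batch."""
    slices = []
    for start in range(0, num_files, batch_size):
        # The final slice is shorter when num_files % batch_size != 0.
        stop = min(start + batch_size, num_files)
        slices.append((start, stop))
    return slices


def stale_rows(num_files, batch_size):
    """Rows of a reused batch buffer that the last batch does not overwrite."""
    slices = batch_slices(num_files, batch_size)
    if not slices:
        return 0
    start, stop = slices[-1]
    return batch_size - (stop - start)


if __name__ == "__main__":
    print(batch_slices(10, 4))
    print(stale_rows(10, 4))
```

If stale_rows is non-zero for your file list, one option is to pick a calibration set whose size is a multiple of the batch size.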
I build the engine with this script:
def build_int8_engine(model_onnx_file, calib, logger, max_batch_size=1, worspace_size=1):
    with trt.Builder(logger) as builder,\
            builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)) as network,\
            trt.OnnxParser(network, logger) as parser:
        builder.max_batch_size = max_batch_size
        builder.max_workspace_size = GiB(worspace_size)
        builder.int8_mode = True
        builder.fp16_mode = True
        builder.int8_calibrator = calib
        # builder.min_find_iterations = 100
        # Parsing ONNX file...
        with open(model_onnx_file, 'rb') as model:
            model_t = parser.parse(model.read())
            if not model_t:
                for error in range(parser.num_errors):
                    print(parser.get_error(error))
        # Build engine and do int8 calibration.
        return builder.build_cuda_engine(network)
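Two notes on the build script above. First, GiB is not defined in the post; it is presumably the small helper from the TensorRT samples' common.py, which is an assumption on my side. Second, as far as I know the builder attributes used here (int8_mode, fp16_mode, max_workspace_size, int8_calibrator, build_cuda_engine) were deprecated in TensorRT 7 and removed in TensorRT 8, where the same settings live on a builder config instead. A hedged sketch of that variant follows; the function name build_int8_engine_config is hypothetical, the calls are the ones I know from TensorRT 7.x/8.x, and later releases renamed some of them again:

```python
def GiB(val):
    """Convert a size in GiB to bytes (assumed equivalent of the samples' helper)."""
    return val * (1 << 30)


def build_int8_engine_config(model_onnx_file, calib, logger, workspace_gib=1):
    """Sketch: build an INT8 engine through IBuilderConfig (TensorRT 7.x/8.x)."""
    # Imported lazily so the helper above can be used without TensorRT installed.
    import tensorrt as trt

    builder = trt.Builder(logger)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, logger)

    with open(model_onnx_file, "rb") as model:
        if not parser.parse(model.read()):
            for error in range(parser.num_errors):
                print(parser.get_error(error))
            return None  # do not try to build from a half-parsed network

    config = builder.create_builder_config()
    config.max_workspace_size = GiB(workspace_gib)
    config.set_flag(trt.BuilderFlag.INT8)
    config.set_flag(trt.BuilderFlag.FP16)  # allow FP16 fallback, as in the original
    config.int8_calibrator = calib
    return builder.build_engine(network, config)
```

It is called the same way as build_int8_engine, minus max_batch_size, which has no effect on an explicit-batch network anyway.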
And this is my script for running it, where:
create_calibration_dataset2 - is a function for loading a list of paths to input images
onnx_file - is the yolov4 ONNX file generated with torch.onnx.export
trt_logger - is trt.Logger(trt.Logger.INFO)
calibration_bach_size - is an integer; I tried different numbers and got the same result
batchstream = calibrator.ImageBatchStream(calibration_bach_size,
                                          create_calibration_dataset2(annotation_list=data_dir,
                                                                      samples=img_samples))
int8_calibrator = calibrator.PythonEntropyCalibrator(["input"], batchstream, "cache_file.cache")
int8_engine = build_int8_engine(onnx_file, int8_calibrator, trt_logger,
                                max_batch_size=calibration_bach_size, worspace_size=workspace_size)
The problem is the accuracy: I get very bad results compared to the FP16 engine.
thanks for sharing!
Hi, I'm trying to convert a model to INT8 too, and I don't know how to load the data. Can you share the create_calibration_dataset2 function?
create_calibration_dataset2 - this function returns a list of paths (full paths to image files - the calibration set)
for example:
import glob
import os
from random import shuffle


def create_calibration_dataset(data_dir, max_len=100):
    """Create a list of calibration images (file paths).

    Args:
        data_dir: a directory path or a list of directories
        max_len: size of the random sample taken from all images (-1 for all)
    """
    types = ('*.png', '*.jpg', '*.tiff', '*.bmp', '*.jpeg')
    calibration_files = list()
    if isinstance(data_dir, str):
        data_dir = [data_dir]
    for dir_d in data_dir:
        for pattern in types:
            # os.path.join works whether or not dir_d ends with a separator
            calibration_files.extend(glob.glob(os.path.join(dir_d, pattern)))
    shuffle(calibration_files)
    if max_len < 0:
        # a plain [:-1] slice would silently drop the last file
        return calibration_files
    return calibration_files[:max_len]
Thank you for your help! I used your code and adapted it to TensorRT 8. Then I deployed the engine on DeepStream; it worked well and could clearly distinguish cars and people in the H.264 video.
Please refer to our open-source quantization tool, ppq. Its quantization results are better than those of the quantization tool that comes with TensorRT, and almost the same as the float32 model. https://github.com/openppl-public/ppq/blob/master/md_doc/deploy_trt_by_OnnxParser.md
Hi, I have a yolov4 model that I want to run on TensorRT in INT8. I read the documentation, but I'm having a hard time following it as an English speaker. Can you please guide me on how to convert the model and prepare the dataset for the ProgramEntrance.py script? I have a dataset in YOLO format.
Thanks
I tried to convert the YoloV4 model to INT8 with TensorRT 7.2.3.4 and I'm getting really bad results. The INT8 engine is indeed really fast, but the accuracy is really bad.
Has anybody succeeded in converting the model to INT8 and gotten good results?