Open congphase opened 5 years ago
You say you're passing your bounding boxes from another detector, right? Is that detector using the GPU? Can you share the lines showing how you pass the bounding boxes?
@Reddyforcode Thank you for your attention.
I used a detector called LFFD with the sample code at link, and it uses the GPU. I modified it to run inference on my video. Detection runs properly, but the error occurs when the first detection is passed on to get the encoding.
Here's how I passed each bbox:
bbox = bboxes[0]
bb_conf = bbox[4]
bb_left = bbox[0]
bb_top = bbox[1]
bb_right = bbox[2]
bb_bottom = bbox[3] # those are all floats
bb_left = int(bb_left)
bb_top = int(bb_top)
bb_right = int(bb_right)
bb_bottom = int(bb_bottom) # converted to int to be used in face_recognition
css_type_face_location = [(bb_top, bb_right, bb_bottom, bb_left)]
face_encoding = get_face_encodings(frame, css_type_face_location, 0)[0] # the error occurs at this line
I first suspected it was because I run inference on a pre-set region that is smaller than the original frame size (1280, 720). This is how I did it:
# the pre-set ROI region
SPREAD_ROI = (358, 277, 784, 568)
# the image cropped with the region
frame_spread_roi = frame[SPREAD_ROI[1]:SPREAD_ROI[3], SPREAD_ROI[0]:SPREAD_ROI[2]]
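For context, mapping a detection from that cropped region back to full-frame coordinates is just a matter of adding the ROI offsets back; a hypothetical equivalent of the transform I use later (freq_cv.transform_coordinates) would look roughly like this (rough sketch, not the actual freq_cv implementation):
# shift ROI-relative (left, top, right, bottom) back into full-frame coordinates
def roi_to_frame(roi, box):
    x_off, y_off = roi[0], roi[1]   # ROI top-left corner in the full frame
    left, top, right, bottom = box
    return (left + x_off, top + y_off, right + x_off, bottom + y_off)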
But when I reset that region to match the full frame and ran inference again, the error was still there. One notable difference: the error output is everything I posted above, except that it is missing only these lines:
cudaFree() failed. Reason: invalid device pointer
cudaFreeHost() failed. Reason: invalid argument
cudaStreamDestroy() failed. Reason: unknown error
cudaFree() failed. Reason: invalid device pointer
cudaFreeHost() failed. Reason: invalid argument
cudaFree() failed. Reason: invalid device pointer
(The line "Segmentation fault (core dumped)" is still there). I don't know why.
It says "core dumped", which usually means you are out of GPU memory. Try executing this in the command line:
nvidia-smi
It will give you the usage of your GPU memory.
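If you want to watch it while your script is running, a quick sketch like this in a second terminal should do (assuming nvidia-smi is available on your device; on a Jetson you may need tegrastats instead):
# log GPU memory usage once per second while the app runs
import subprocess
import time
while True:
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total", "--format=csv,noheader"])
    print(out.decode().strip())
    time.sleep(1)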
My other guess is that you joined face_recognition and your external face-detection code together the wrong way; that's a possible error, but hard to pin down without seeing your complete project.
@Reddyforcode I monitored the memory with jtop from when the program started running until it ended. Memory never reached the maximum; it only consumed around 2.5GB/4GB. GPU utilization periodically reached its maximum of 99% and dropped back to 0%.
By the way, I'm using a Jetson Nano, for which it is advised to change one line of dlib's source code before compiling to avoid a bug, as mentioned by Adam Geitgey in link (search for 'line 854' for quick reference). The file that was changed is the very file the Python interpreter points to in this weird error. I'm confused: if the problem were in my dlib API files, I wouldn't have been able to run face_recognition.face_encodings successfully before (and I have), so it must be caused by the way I pass the bbox to face_encodings(), but I've checked that several times and it doesn't seem to be the issue either.
I have re-checked my code against your two suggestions, but nothing seems to work :( Here's my code:
#################################################################
######################### RELEASE NOTES #########################
#################################################################
'''
- Process order:
RFID scanned --> this script runs --> verify the scanned RFID number against the current face
Compared to face-first_RFID-later or face-only, this order provides a more balanced approach:
a compromise between the demands of actual use and the complexity of generalizing the algorithm
- How to run:
python DO_FACE_VERIFICATION.py --video_in <> --video_out <> --candidate_id <>
- SPREAD_ROI is used; detection time drops to around 90ms/frame, at the cost of some detection coverage
-
'''
import argparse
import sys
#import dlib
import time
import cv2
import os
import numpy
import face_recognition
import api_dirs
import freq_cv
import rtsp_cam
import tnt_info
# from time import gmtime, strftime
from datetime import datetime
from datetime import date
import logging
import pycuda.driver as cuda
import pycuda.autoinit
import tensorrt as trt
####################################
## WHERE THE DEFINITIONS ARE LAID ##
####################################
# ROI FOR REDUCING COMPUTATION
SPREAD_ROI = rtsp_cam.SPREAD_ROI
# Messages
MSG_NHIN_TRUC_TIEP_VAO_CAM = "MSG_HAY_NHIN_TRUC_TIEP_VAO_CAMERA"
MSG_XAC_THUC_KHUON_MAT_THANH_CONG = "MSG_XAC_THUC_KHUON_MAT_THANH_CONG"
MSG_XAC_THUC_KHUON_MAT_KHONG_THANH_CONG = "MSG_XAC_THUC_KHUON_MAT_KHONG_THANH_CONG"
MSG_KHONG_TIM_THAY_KHUON_MAT_NAO = "MSG_KHONG_TIM_THAY_KHUON_MAT_NAO"
MSG_HAY_QUET_LAI_THE_VA_NHIN_THANG_VAO_CAMERA = "MSG_HAY_QUET_LAI_THE_VA_NHIN_THANG_VAO_CAMERA"
# VARIABLES
# The cnn version of flib face detection
#cnn_detector = dlib.cnn_face_detection_model_v1(api_dirs.face_detection_model_cnn)
# Window names
CAP_WINDOW_NAME = 'CameraDemo'
VERIF_WINDOW_NAME = 'Verification Window'
def parse_args():
# Parse input arguments
desc = 'Do face verification on video'
parser = argparse.ArgumentParser(description=desc)
parser.add_argument("--video_in", dest='video_in', required=True,
help='Video input')
parser.add_argument("--video_out", dest='video_out', required=True,
help='Video output')
parser.add_argument("--candidate_id", dest='candidate_id', required=True,
type=int,
help="Candidate index: [1, 2, 3, ...]")
args = parser.parse_args()
return args
import cProfile, pstats, io
def profile(fnc):
"""A decorator that uses cProfile to profile a function"""
def inner(*args, **kwargs):
pr = cProfile.Profile()
pr.enable()
retval = fnc(*args, **kwargs)
pr.disable()
s = io.StringIO()
sortby = 'cumulative'
ps = pstats.Stats(pr, stream=s).sort_stats(sortby)
ps.print_stats()
print(s.getvalue())
return retval
return inner
logging.getLogger().setLevel(logging.DEBUG)
def NMS(boxes, overlap_threshold):
'''
:param boxes: numpy nx5, n is the number of boxes, 0:4->x1, y1, x2, y2, 4->score
:param overlap_threshold:
:return:
'''
if boxes.shape[0] == 0:
return boxes
# if the bounding boxes are integers, convert them to floats --
# this is important since we'll be doing a bunch of divisions
if boxes.dtype != numpy.float32:
boxes = boxes.astype(numpy.float32)
# initialize the list of picked indexes
pick = []
# grab the coordinates of the bounding boxes
x1 = boxes[:, 0]
y1 = boxes[:, 1]
x2 = boxes[:, 2]
y2 = boxes[:, 3]
sc = boxes[:, 4]
widths = x2 - x1
heights = y2 - y1
# compute the area of the bounding boxes and sort the bounding
# boxes by the bottom-right y-coordinate of the bounding box
area = heights * widths
idxs = numpy.argsort(sc)
# keep looping while some indexes still remain in the indexes list
while len(idxs) > 0:
# grab the last index in the indexes list and add the
# index value to the list of picked indexes
last = len(idxs) - 1
i = idxs[last]
pick.append(i)
# compare the picked box against the remaining lower-score boxes
xx1 = numpy.maximum(x1[i], x1[idxs[:last]])
yy1 = numpy.maximum(y1[i], y1[idxs[:last]])
xx2 = numpy.minimum(x2[i], x2[idxs[:last]])
yy2 = numpy.minimum(y2[i], y2[idxs[:last]])
# compute the width and height of the box
w = numpy.maximum(0, xx2 - xx1 + 1)
h = numpy.maximum(0, yy2 - yy1 + 1)
# compute the ratio of overlap
overlap = (w * h) / area[idxs[:last]]
# delete all indexes from the index list that have overlap greater than the threshold
idxs = numpy.delete(idxs, numpy.concatenate(([last], numpy.where(overlap > overlap_threshold)[0])))
# return only the bounding boxes that were picked using the
# integer data type
return boxes[pick]
# Simple helper data class that's a little nicer to use than a 2-tuple.
class HostDeviceMem(object):
def __init__(self, host_mem, device_mem):
self.host = host_mem
self.device = device_mem
def __str__(self):
return "Host:\n" + str(self.host) + "\nDevice:\n" + str(self.device)
def __repr__(self):
return self.__str__()
class Inference_TensorRT:
def __init__(self, onnx_file_path,
receptive_field_list,
receptive_field_stride,
bbox_small_list,
bbox_large_list,
receptive_field_center_start,
num_output_scales):
temp_trt_file = os.path.join('trt_file_cache/', os.path.basename(onnx_file_path).replace('.onnx', '.trt'))
load_trt_flag = False
if not os.path.exists(temp_trt_file):
if not os.path.exists(onnx_file_path):
logging.error('ONNX file does not exist!')
sys.exit(1)
logging.info('Init engine from ONNX file.')
else:
load_trt_flag = True
logging.info('Init engine from serialized engine.')
self.receptive_field_list = receptive_field_list
self.receptive_field_stride = receptive_field_stride
self.bbox_small_list = bbox_small_list
self.bbox_large_list = bbox_large_list
self.receptive_field_center_start = receptive_field_center_start
self.num_output_scales = num_output_scales
self.constant = [i / 2.0 for i in self.receptive_field_list]
# init log
TRT_LOGGER = trt.Logger(trt.Logger.VERBOSE)
self.engine = None
if load_trt_flag:
with open(temp_trt_file, 'rb') as fin, trt.Runtime(TRT_LOGGER) as runtime:
self.engine = runtime.deserialize_cuda_engine(fin.read())
else:
# declare builder object
logging.info('Create TensorRT builder.')
builder = trt.Builder(TRT_LOGGER)
# get network object via builder
logging.info('Create TensorRT network.')
network = builder.create_network()
# create ONNX parser object
logging.info('Create TensorRT ONNX parser.')
parser = trt.OnnxParser(network, TRT_LOGGER)
with open(onnx_file_path, 'rb') as onnx_fin:
parser.parse(onnx_fin.read())
# print possible errors
num_error = parser.num_errors
if num_error != 0:
logging.error('Errors occur while parsing the ONNX file!')
for i in range(num_error):
temp_error = parser.get_error(i)
print(temp_error.desc())
sys.exit(1)
# create engine via builder
builder.max_batch_size = 1
builder.average_find_iterations = 2
logging.info('Create TensorRT engine...')
engine = builder.build_cuda_engine(network)
# serialize engine
if not os.path.exists('trt_file_cache/'):
os.makedirs('trt_file_cache/')
logging.info('Serialize the engine for fast init.')
with open(os.path.join('trt_file_cache/', os.path.basename(onnx_file_path).replace('.onnx', '.trt')), 'wb') as fout:
fout.write(engine.serialize())
self.engine = engine
self.output_shapes = []
self.input_shapes = []
for binding in self.engine:
if self.engine.binding_is_input(binding):
self.input_shapes.append(tuple([self.engine.max_batch_size] + list(self.engine.get_binding_shape(binding))))
else:
self.output_shapes.append(tuple([self.engine.max_batch_size] + list(self.engine.get_binding_shape(binding))))
if len(self.input_shapes) != 1:
logging.error('Only one input data is supported.')
sys.exit(1)
self.input_shape = self.input_shapes[0]
logging.info('The required input size: %d, %d, %d' % (self.input_shape[2], self.input_shape[3], self.input_shape[1]))
# create executor
self.executor = self.engine.create_execution_context()
self.inputs, self.outputs, self.bindings = self.__allocate_buffers(self.engine)
def __allocate_buffers(self, engine):
inputs = []
outputs = []
bindings = []
for binding in engine:
size = trt.volume(engine.get_binding_shape(binding)) * engine.max_batch_size
dtype = trt.nptype(engine.get_binding_dtype(binding))
# Allocate host and device buffers
host_mem = cuda.pagelocked_empty(size, dtype)
device_mem = cuda.mem_alloc(host_mem.nbytes)
# Append the device buffer to device bindings.
bindings.append(int(device_mem))
# Append to the appropriate list.
if engine.binding_is_input(binding):
inputs.append(HostDeviceMem(host_mem, device_mem))
else:
outputs.append(HostDeviceMem(host_mem, device_mem))
return inputs, outputs, bindings
def do_inference(self, image, score_threshold=0.4, top_k=10000, NMS_threshold=0.4, NMS_flag=True, skip_scale_branch_list=[]):
if image.ndim != 3 or image.shape[2] != 3:
print('Only RGB images are supported.')
return None
input_height = self.input_shape[2]
input_width = self.input_shape[3]
if image.shape[0] != input_height or image.shape[1] != input_width:
logging.info('The size of input image is not %dx%d.\nThe input image will be resized keeping the aspect ratio.' % (input_height, input_width))
input_batch = numpy.zeros((1, input_height, input_width, self.input_shape[1]), dtype=numpy.float32)
left_pad = 0
top_pad = 0
if image.shape[0] / image.shape[1] > input_height / input_width:
resize_scale = input_height / image.shape[0]
input_image = cv2.resize(image, (0, 0), fx=resize_scale, fy=resize_scale)
left_pad = int((input_width - input_image.shape[1]) / 2)
input_batch[0, :, left_pad:left_pad + input_image.shape[1], :] = input_image
else:
resize_scale = input_width / image.shape[1]
input_image = cv2.resize(image, (0, 0), fx=resize_scale, fy=resize_scale)
top_pad = int((input_height - input_image.shape[0]) / 2)
input_batch[0, top_pad:top_pad + input_image.shape[0], :, :] = input_image
input_batch = input_batch.transpose([0, 3, 1, 2])
input_batch = numpy.array(input_batch, dtype=numpy.float32, order='C')
self.inputs[0].host = input_batch
[cuda.memcpy_htod(inp.device, inp.host) for inp in self.inputs]
self.executor.execute(batch_size=self.engine.max_batch_size, bindings=self.bindings)
[cuda.memcpy_dtoh(output.host, output.device) for output in self.outputs]
outputs = [out.host for out in self.outputs]
outputs = [numpy.squeeze(output.reshape(shape)) for output, shape in zip(outputs, self.output_shapes)]
bbox_collection = []
for i in range(self.num_output_scales):
if i in skip_scale_branch_list:
continue
score_map = numpy.squeeze(outputs[i * 2])
# show feature maps-------------------------------
# score_map_show = score_map * 255
# score_map_show[score_map_show < 0] = 0
# score_map_show[score_map_show > 255] = 255
# cv2.imshow('score_map' + str(i), cv2.resize(score_map_show.astype(dtype=numpy.uint8), (0, 0), fx=2, fy=2))
# cv2.waitKey()
bbox_map = numpy.squeeze(outputs[i * 2 + 1])
RF_center_Xs = numpy.array([self.receptive_field_center_start[i] + self.receptive_field_stride[i] * x for x in range(score_map.shape[1])])
RF_center_Xs_mat = numpy.tile(RF_center_Xs, [score_map.shape[0], 1])
RF_center_Ys = numpy.array([self.receptive_field_center_start[i] + self.receptive_field_stride[i] * y for y in range(score_map.shape[0])])
RF_center_Ys_mat = numpy.tile(RF_center_Ys, [score_map.shape[1], 1]).T
x_lt_mat = RF_center_Xs_mat - bbox_map[0, :, :] * self.constant[i]
y_lt_mat = RF_center_Ys_mat - bbox_map[1, :, :] * self.constant[i]
x_rb_mat = RF_center_Xs_mat - bbox_map[2, :, :] * self.constant[i]
y_rb_mat = RF_center_Ys_mat - bbox_map[3, :, :] * self.constant[i]
x_lt_mat = x_lt_mat
x_lt_mat[x_lt_mat < 0] = 0
y_lt_mat = y_lt_mat
y_lt_mat[y_lt_mat < 0] = 0
x_rb_mat = x_rb_mat
x_rb_mat[x_rb_mat > input_width] = input_width
y_rb_mat = y_rb_mat
y_rb_mat[y_rb_mat > input_height] = input_height
select_index = numpy.where(score_map > score_threshold)
for idx in range(select_index[0].size):
bbox_collection.append((x_lt_mat[select_index[0][idx], select_index[1][idx]] - left_pad,
y_lt_mat[select_index[0][idx], select_index[1][idx]] - top_pad,
x_rb_mat[select_index[0][idx], select_index[1][idx]] - left_pad,
y_rb_mat[select_index[0][idx], select_index[1][idx]] - top_pad,
score_map[select_index[0][idx], select_index[1][idx]]))
# NMS
bbox_collection = sorted(bbox_collection, key=lambda item: item[-1], reverse=True)
if len(bbox_collection) > top_k:
bbox_collection = bbox_collection[0:top_k]
bbox_collection_numpy = numpy.array(bbox_collection, dtype=numpy.float32)
bbox_collection_numpy = bbox_collection_numpy / resize_scale
if NMS_flag:
final_bboxes = NMS(bbox_collection_numpy, NMS_threshold)
final_bboxes_ = []
for i in range(final_bboxes.shape[0]):
final_bboxes_.append((final_bboxes[i, 0], final_bboxes[i, 1], final_bboxes[i, 2], final_bboxes[i, 3], final_bboxes[i, 4]))
return final_bboxes_
else:
return bbox_collection_numpy
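# draw a rectangle with rounded corners: straight line segments along the edges plus quarter-ellipse arcs at the four corners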
def draw_border(img, pt1, pt2, color, thickness, r, d):
x1, y1 = pt1
x2, y2 = pt2
line = cv2.line
ellipse = cv2.ellipse
# Top left
line(img, (x1 + r, y1), (x1 + r + d, y1), color, thickness)
line(img, (x1, y1 + r), (x1, y1 + r + d), color, thickness)
ellipse(img, (x1 + r, y1 + r), (r, r), 180, 0, 90, color, thickness)
# Top right
line(img, (x2 - r, y1), (x2 - r - d, y1), color, thickness)
line(img, (x2, y1 + r), (x2, y1 + r + d), color, thickness)
ellipse(img, (x2 - r, y1 + r), (r, r), 270, 0, 90, color, thickness)
# Bottom left
line(img, (x1 + r, y2), (x1 + r + d, y2), color, thickness)
line(img, (x1, y2 - r), (x1, y2 - r - d), color, thickness)
ellipse(img, (x1 + r, y2 - r), (r, r), 90, 0, 90, color, thickness)
# Bottom right
line(img, (x2 - r, y2), (x2 - r - d, y2), color, thickness)
line(img, (x2, y2 - r), (x2, y2 - r - d), color, thickness)
ellipse(img, (x2 - r, y2 - r), (r, r), 0, 0, 90, color, thickness)
def run_inference(video_in, video_out, candidate_id, current_time):
"""
:param video_in: input video needed to be processed, or, rtsp video stream feed
:param video_out: output video of the process, used for re-checking
:param candidate_id: id passed by RF
:param current_time: 'd' or 'n', which is day or night -- the time this script is run
:return: not yet known
"""
# Initialize some frequently called methods to reduce time
get_face_encodings = face_recognition.face_encodings
compare_faces = face_recognition.compare_faces
imshow = cv2.imshow
rectangle = cv2.rectangle
cvtColor = cv2.cvtColor
COLOR_BGR2RGB = cv2.COLOR_BGR2RGB
get_time = time.time
GREEN = freq_cv.GREEN
transform_coordinates = freq_cv.transform_coordinates
get_smpl_encs = tnt_info.get_smpl_encs
log_info = logging.info
log_debug = logging.debug
log_warning = logging.warning
log_error = logging.error
'''
********************************************************************************************************************
VIDEO FILE/VIDEO STREAM HANDLING
********************************************************************************************************************
'''
# Initialize video stuff
fourcc = cv2.VideoWriter_fourcc(*'XVID')
input_movie = cv2.VideoCapture(video_in)
output_movie = cv2.VideoWriter(video_out, fourcc, 24, (rtsp_cam.WIDTH, rtsp_cam.HEIGHT))
movie_read = input_movie.read
movie_isOpened = input_movie.isOpened
movie_write = output_movie.write
'''
********************************************************************************************************************
DETECTION MODEL STUFF INITIALIZATION
********************************************************************************************************************
'''
import sys
sys.path.append('..')
from config_farm import configuration_10_320_20L_5scales_v2 as cfg
#from config_farm import configuration_10_560_25L_8scales_v1 as cfg
onnx_file_path = './onnx_files/v2.onnx'
myInference = Inference_TensorRT(
onnx_file_path=onnx_file_path,
receptive_field_list=cfg.param_receptive_field_list,
receptive_field_stride=cfg.param_receptive_field_stride,
bbox_small_list=cfg.param_bbox_small_list,
bbox_large_list=cfg.param_bbox_large_list,
receptive_field_center_start=cfg.param_receptive_field_center_start,
num_output_scales=cfg.param_num_output_scales)
do_inference = myInference.do_inference
process_this_frame = True
log_debug(f'spread roi used: {SPREAD_ROI}')
# Handling the video
while movie_isOpened():
ret, frame = movie_read()
# Bail out when the video file ends
if not ret:
log_info('video file ends')
break
#frame = cvtColor(frame, COLOR_BGR2RGB)
frame_spread_roi = frame
#frame_spread_roi = frame[SPREAD_ROI[1]:SPREAD_ROI[3], SPREAD_ROI[0]:SPREAD_ROI[2]]
# Only process every other frame of video to save time
if process_this_frame:
start = get_time()
bboxes = do_inference(frame_spread_roi, score_threshold=0.6, top_k=1000, NMS_threshold=0.2, NMS_flag=True)
end = get_time()
log_debug(f'detection takes: {(end-start):.3f}s')
log_debug(f'len(bboxes) = {len(bboxes)}')
if len(bboxes) == 0:
log_info('detected: 0 face')
imshow('Video', frame)
movie_write(frame)
continue
elif len(bboxes) == 1:
log_info('detected: 1 face')
# get the (only) face location of this frame
bbox = bboxes[0]
bb_conf = bbox[4]
bb_left = int(bbox[0])
bb_top = int(bbox[1])
bb_right = int(bbox[2])
bb_bottom = int(bbox[3])
log_debug(f'(conf | left, top, right, bottom) = ({bb_conf:.2f}, '
f'{bb_left}, {bb_top}, {bb_right}, {bb_bottom})')
spread_roi_bb_coords = (bb_left, bb_top, bb_right, bb_bottom)
# Convert the coordinates and update the bb values
bb_left, bb_top, bb_right, bb_bottom = transform_coordinates(SPREAD_ROI,
spread_roi_bb_coords)
# if the scale used is spread-roi-scaled, anchor confidence is lower
anchor_conf = 0.6
if max(SPREAD_ROI) == 1280:
anchor_conf = 1.6
# if detected bb has confidence lower than anchor confidence, don't do recognition
if bb_conf < anchor_conf:
draw_border(frame, (bb_left, bb_top), (bb_right, bb_bottom), GREEN, 2, 5, 10)
imshow('Video', frame)
movie_write(frame)
log_info('confidence is low. Skipping ... ')
continue
# confidence meets the requirement for doing recognition
# css_type is needed for face_recognition.face_encodings
css_type_face_location = [(bb_top, bb_right, bb_bottom, bb_left)]
log_debug(f'css_type_face_location: {css_type_face_location}')
# conversion to RGB is needed for face_recognition
frame = cvtColor(frame, COLOR_BGR2RGB)
'''
import dlib
shape_predictor = dlib.shape_predictor(api_dirs.shape_predictor_small)
face_rec = dlib.face_recognition_model_v1(api_dirs.face_recognition)
dlib_rectangle_face_location = dlib.rectangle(left=bb_left, top=bb_top,
right=bb_right, bottom=bb_bottom)
shape = shape_predictor(frame, dlib_rectangle_face_location)
face_chip = dlib.get_face_chip(frame, shape)
start = get_time()
face_encoding = face_rec.compute_face_descriptor(face_chip)
end = get_time()
log_info(f'calculating encoding takes: {(end - start):.3f}s')
log_debug(f'face_encoding: {face_encoding}')
log_debug(f'face_encoding type: {type(face_encoding)}')
'''
# get encoding for the face detected
start = get_time()
face_encoding = get_face_encodings(frame, css_type_face_location, 0)[0]
end = get_time()
log_debug(f'calculating encoding takes: {(end-start):.4f}s')
# See if the face is a match for the known face(s) according to candidate id
candidate_known_face_encodings = get_smpl_encs(candidate_id, current_time)
start = get_time()
matches = compare_faces(candidate_known_face_encodings, face_encoding, 0.5)
end = get_time()
log_debug(f'comparing encodings takes: {(end-start):.4f}s')
log_debug(f'matches: {matches}')
name = "Unknown"
# If the number of True matches is over 50%, accept it as a match
log_debug(f'True/total: {matches.count(True)}/{len(matches)}')
if matches.count(True) > (len(matches) / 2):
name = tnt_info.tnt_name_tup[candidate_id].split()[-1]
# Or instead, use the known face with the smallest distance to the new face
# face_distances = face_recognition.face_distance(candidate_known_face_encodings, face_encoding)
# best_match_index = np.argmin(face_distances)
# if matches[best_match_index]:
# name = known_face_names[best_match_index]
# Draw a box around the face
draw_border(frame, (bb_left, bb_top), (bb_right, bb_bottom), GREEN, 2, 5, 10)
freq_cv.stick_name(frame, (bb_left, bb_top, bb_right, bb_bottom), name)
# Display the resulting image
imshow('Video', frame)
movie_write(frame)
# Hit 'q' on the keyboard to quit!
if cv2.waitKey(20) & 0xFF == ord('q'):
break
elif len(bboxes) > 1:
log_info('detected: > 1 face')
# get the left-most bbox
log_debug(f'bboxes looks like: {bboxes}')
log_debug('this is to know at what index x1 stays. Take a look at it')
for i, bbox in enumerate(bboxes):
log_debug(f'bboxes[{i}] = {bbox}')
log_debug(f'spread_roi_coords: ({bbox[0]}, {bbox[1]}, {bbox[2]}, {bbox[3]})')
spread_roi_bb_coords = (bbox[0], bbox[1], bbox[2], bbox[3])
#bb_left, bb_top, bb_right, bb_bottom = transform_coordinates(SPREAD_ROI, spread_roi_bb_coords)
rectangle(frame, (bb_left, bb_top),
(bb_right, bb_bottom), GREEN, 2)
log_debug(f'min of x1 values = {min((bboxes[0:])[0])}')
else:
log_error('num of faces detected < 0')
exit()
log_debug(f'max(frame.shape[:2]) = {max(frame.shape[:2])}')
if max(frame.shape[:2]) > 1440:
scale = 1440 / max(frame.shape[:2])
frame = cv2.resize(frame, (0, 0), fx=scale, fy=scale)
cv2.imshow('Video', frame)
cv2.waitKey(10)
process_this_frame = not process_this_frame
"""
def verify_face(to_be_verified, smpl, candidate_id):
# check to determine which one is larger to be the anchor img size
"""
@profile
def main():
args = parse_args()
print('Called with args:')
print(args)
# assign the variables
candidate_id = args.candidate_id
# get current time when this script is called
now = datetime.now()
today_night_time = now.replace(hour=18, minute=0, second=0, microsecond=0)
print("[log ] now: {}".format(now))
if now < today_night_time:
current_time = 'd'
else:
current_time = 'n'
# freq_cv.open_window(CAP_WINDOW_NAME, rtsp_cam.WIDTH, rtsp_cam.HEIGHT, "Captured")
run_inference(args.video_in, args.video_out, candidate_id, current_time)
cv2.destroyAllWindows()
if __name__ == '__main__':
main()
Can you take a look at it or come up with any more ideas?
@Reddyforcode
I've recently narrowed this issue down. The problem is this line:
import pycuda.autoinit
What I did: I wrote a script that takes an image containing an obvious face and runs this workflow: perform detection, then get the face encoding and print it out. The script has two parts: the first performs the workflow with dlib used for detection and face_recognition (which is also dlib to some extent) used for getting the face encoding; the second performs the same workflow, the only difference being that LFFD is used for detection. On the first run, I commented out the second part, and that ugly error still showed up when the face_encodings function executed. Next, I uncommented the second part and commented out the first, and the error still showed up at the same function. I then tried commenting out the import statements one by one, and it turned out that import pycuda.autoinit
is the culprit: when I commented it out, part 1 ran beautifully with no errors. Having this line causes dlib's cudnnConvolutionForward() function to fail. My code (just look for the eye-catching "FIRST PART" and "WHERE SECOND PART ACTUALLY RUNS" sections):
import dlib
import cv2
import os
import logging
import numpy
import api_dirs
import freq_cv
import pycuda.driver as cuda
import pycuda.autoinit
import tensorrt as trt
import face_recognition
import my_api
test_img = cv2.imread("/home/gate/lffd-dir/A-Light-and-Fast-Face-Detector-for-Edge-Devices/face_detection/deploy_tensorrt/high_conf_446.jpg")
get_face_encodings = face_recognition.face_encodings
######################################################################
# FIRST PART #
######################################################################
# dlib
cnn_detector = dlib.cnn_face_detection_model_v1("/home/gate/Downloads/mmod_human_face_detector.dat")
test_img_dlib = cv2.cvtColor(test_img, cv2.COLOR_BGR2RGB)
# just for debugging
cv2.imwrite('test_img_dlib.jpg', test_img_dlib)
face_locations = cnn_detector(test_img_dlib, 0)
print(f'[dlib] face_locations = {face_locations}')
face_location = face_locations[0]
bb_conf = face_location.confidence
bb_left = face_location.rect.left()
bb_top = face_location.rect.top()
bb_right = face_location.rect.right()
bb_bottom = face_location.rect.bottom()
print(f'[dlib] (l, t, r, b, c) = ({bb_left}, {bb_top}, {bb_right}, {bb_bottom}, {bb_conf})')
test_img_dlib = cv2.rectangle(test_img_dlib, (bb_left, bb_top), (bb_right, bb_bottom), (0, 255, 0), 2)
cv2.imwrite(f'[dlib] detected_dlib.jpg', test_img_dlib)
print(f'[dlib] detected_dlib.jpg written!')
print(f'[dlib] doing face recognition ... ')
css_type_face_location = [(bb_top, bb_right, bb_bottom, bb_left)]
print(f'[dlib] css = {css_type_face_location}')
face_encoding = get_face_encodings(test_img_dlib, css_type_face_location, 0)[0] # error showed up at this line
print(f'[dlib] face_encoding:\n{face_encoding}')
######################################################################
# SECOND PART #
######################################################################
# lffd
logging.getLogger().setLevel(logging.DEBUG)
def NMS(boxes, overlap_threshold):
'''
:param boxes: numpy nx5, n is the number of boxes, 0:4->x1, y1, x2, y2, 4->score
:param overlap_threshold:
:return:
'''
if boxes.shape[0] == 0:
return boxes
# if the bounding boxes are integers, convert them to floats --
# this is important since we'll be doing a bunch of divisions
if boxes.dtype != numpy.float32:
boxes = boxes.astype(numpy.float32)
# initialize the list of picked indexes
pick = []
# grab the coordinates of the bounding boxes
x1 = boxes[:, 0]
y1 = boxes[:, 1]
x2 = boxes[:, 2]
y2 = boxes[:, 3]
sc = boxes[:, 4]
widths = x2 - x1
heights = y2 - y1
# compute the area of the bounding boxes and sort the bounding
# boxes by the bottom-right y-coordinate of the bounding box
area = heights * widths
idxs = numpy.argsort(sc)
# keep looping while some indexes still remain in the indexes list
while len(idxs) > 0:
# grab the last index in the indexes list and add the
# index value to the list of picked indexes
last = len(idxs) - 1
i = idxs[last]
pick.append(i)
# compare the picked box against the remaining lower-score boxes
xx1 = numpy.maximum(x1[i], x1[idxs[:last]])
yy1 = numpy.maximum(y1[i], y1[idxs[:last]])
xx2 = numpy.minimum(x2[i], x2[idxs[:last]])
yy2 = numpy.minimum(y2[i], y2[idxs[:last]])
# compute the width and height of the box
w = numpy.maximum(0, xx2 - xx1 + 1)
h = numpy.maximum(0, yy2 - yy1 + 1)
# compute the ratio of overlap
overlap = (w * h) / area[idxs[:last]]
# delete all indexes from the index list that have overlap greater than the threshold
idxs = numpy.delete(idxs, numpy.concatenate(([last], numpy.where(overlap > overlap_threshold)[0])))
# return only the bounding boxes that were picked using the
# integer data type
return boxes[pick]
# Simple helper data class that's a little nicer to use than a 2-tuple.
class HostDeviceMem(object):
def __init__(self, host_mem, device_mem):
self.host = host_mem
self.device = device_mem
def __str__(self):
return "Host:\n" + str(self.host) + "\nDevice:\n" + str(self.device)
def __repr__(self):
return self.__str__()
class Inference_TensorRT:
def __init__(self, onnx_file_path,
receptive_field_list,
receptive_field_stride,
bbox_small_list,
bbox_large_list,
receptive_field_center_start,
num_output_scales):
temp_trt_file = os.path.join('trt_file_cache/', os.path.basename(onnx_file_path).replace('.onnx', '.trt'))
load_trt_flag = False
if not os.path.exists(temp_trt_file):
if not os.path.exists(onnx_file_path):
logging.error('ONNX file does not exist!')
sys.exit(1)
logging.info('Init engine from ONNX file.')
else:
load_trt_flag = True
logging.info('Init engine from serialized engine.')
self.receptive_field_list = receptive_field_list
self.receptive_field_stride = receptive_field_stride
self.bbox_small_list = bbox_small_list
self.bbox_large_list = bbox_large_list
self.receptive_field_center_start = receptive_field_center_start
self.num_output_scales = num_output_scales
self.constant = [i / 2.0 for i in self.receptive_field_list]
# init log
TRT_LOGGER = trt.Logger(trt.Logger.VERBOSE)
self.engine = None
if load_trt_flag:
with open(temp_trt_file, 'rb') as fin, trt.Runtime(TRT_LOGGER) as runtime:
self.engine = runtime.deserialize_cuda_engine(fin.read())
else:
# declare builder object
logging.info('Create TensorRT builder.')
builder = trt.Builder(TRT_LOGGER)
# get network object via builder
logging.info('Create TensorRT network.')
network = builder.create_network()
# create ONNX parser object
logging.info('Create TensorRT ONNX parser.')
parser = trt.OnnxParser(network, TRT_LOGGER)
with open(onnx_file_path, 'rb') as onnx_fin:
parser.parse(onnx_fin.read())
# print possible errors
num_error = parser.num_errors
if num_error != 0:
logging.error('Errors occur while parsing the ONNX file!')
for i in range(num_error):
temp_error = parser.get_error(i)
print(temp_error.desc())
sys.exit(1)
# create engine via builder
builder.max_batch_size = 1
builder.average_find_iterations = 2
logging.info('Create TensorRT engine...')
engine = builder.build_cuda_engine(network)
# serialize engine
if not os.path.exists('trt_file_cache/'):
os.makedirs('trt_file_cache/')
logging.info('Serialize the engine for fast init.')
with open(os.path.join('trt_file_cache/', os.path.basename(onnx_file_path).replace('.onnx', '.trt')), 'wb') as fout:
fout.write(engine.serialize())
self.engine = engine
self.output_shapes = []
self.input_shapes = []
for binding in self.engine:
if self.engine.binding_is_input(binding):
self.input_shapes.append(tuple([self.engine.max_batch_size] + list(self.engine.get_binding_shape(binding))))
else:
self.output_shapes.append(tuple([self.engine.max_batch_size] + list(self.engine.get_binding_shape(binding))))
if len(self.input_shapes) != 1:
logging.error('Only one input data is supported.')
sys.exit(1)
self.input_shape = self.input_shapes[0]
logging.info('The required input size: %d, %d, %d' % (self.input_shape[2], self.input_shape[3], self.input_shape[1]))
# create executor
self.executor = self.engine.create_execution_context()
self.inputs, self.outputs, self.bindings = self.__allocate_buffers(self.engine)
def __allocate_buffers(self, engine):
inputs = []
outputs = []
bindings = []
for binding in engine:
size = trt.volume(engine.get_binding_shape(binding)) * engine.max_batch_size
dtype = trt.nptype(engine.get_binding_dtype(binding))
# Allocate host and device buffers
host_mem = cuda.pagelocked_empty(size, dtype)
device_mem = cuda.mem_alloc(host_mem.nbytes)
# Append the device buffer to device bindings.
bindings.append(int(device_mem))
# Append to the appropriate list.
if engine.binding_is_input(binding):
inputs.append(HostDeviceMem(host_mem, device_mem))
else:
outputs.append(HostDeviceMem(host_mem, device_mem))
return inputs, outputs, bindings
def do_inference(self, image, score_threshold=0.4, top_k=10000, NMS_threshold=0.4, NMS_flag=True, skip_scale_branch_list=[]):
if image.ndim != 3 or image.shape[2] != 3:
print('Only RGB images are supported.')
return None
input_height = self.input_shape[2]
input_width = self.input_shape[3]
if image.shape[0] != input_height or image.shape[1] != input_width:
logging.info('The size of input image is not %dx%d.\nThe input image will be resized keeping the aspect ratio.' % (input_height, input_width))
input_batch = numpy.zeros((1, input_height, input_width, self.input_shape[1]), dtype=numpy.float32)
left_pad = 0
top_pad = 0
if image.shape[0] / image.shape[1] > input_height / input_width:
resize_scale = input_height / image.shape[0]
input_image = cv2.resize(image, (0, 0), fx=resize_scale, fy=resize_scale)
left_pad = int((input_width - input_image.shape[1]) / 2)
input_batch[0, :, left_pad:left_pad + input_image.shape[1], :] = input_image
else:
resize_scale = input_width / image.shape[1]
input_image = cv2.resize(image, (0, 0), fx=resize_scale, fy=resize_scale)
top_pad = int((input_height - input_image.shape[0]) / 2)
input_batch[0, top_pad:top_pad + input_image.shape[0], :, :] = input_image
input_batch = input_batch.transpose([0, 3, 1, 2])
input_batch = numpy.array(input_batch, dtype=numpy.float32, order='C')
self.inputs[0].host = input_batch
[cuda.memcpy_htod(inp.device, inp.host) for inp in self.inputs]
self.executor.execute(batch_size=self.engine.max_batch_size, bindings=self.bindings)
[cuda.memcpy_dtoh(output.host, output.device) for output in self.outputs]
outputs = [out.host for out in self.outputs]
outputs = [numpy.squeeze(output.reshape(shape)) for output, shape in zip(outputs, self.output_shapes)]
bbox_collection = []
for i in range(self.num_output_scales):
if i in skip_scale_branch_list:
continue
score_map = numpy.squeeze(outputs[i * 2])
# show feature maps-------------------------------
# score_map_show = score_map * 255
# score_map_show[score_map_show < 0] = 0
# score_map_show[score_map_show > 255] = 255
# cv2.imshow('score_map' + str(i), cv2.resize(score_map_show.astype(dtype=numpy.uint8), (0, 0), fx=2, fy=2))
# cv2.waitKey()
bbox_map = numpy.squeeze(outputs[i * 2 + 1])
RF_center_Xs = numpy.array([self.receptive_field_center_start[i] + self.receptive_field_stride[i] * x for x in range(score_map.shape[1])])
RF_center_Xs_mat = numpy.tile(RF_center_Xs, [score_map.shape[0], 1])
RF_center_Ys = numpy.array([self.receptive_field_center_start[i] + self.receptive_field_stride[i] * y for y in range(score_map.shape[0])])
RF_center_Ys_mat = numpy.tile(RF_center_Ys, [score_map.shape[1], 1]).T
x_lt_mat = RF_center_Xs_mat - bbox_map[0, :, :] * self.constant[i]
y_lt_mat = RF_center_Ys_mat - bbox_map[1, :, :] * self.constant[i]
x_rb_mat = RF_center_Xs_mat - bbox_map[2, :, :] * self.constant[i]
y_rb_mat = RF_center_Ys_mat - bbox_map[3, :, :] * self.constant[i]
x_lt_mat = x_lt_mat
x_lt_mat[x_lt_mat < 0] = 0
y_lt_mat = y_lt_mat
y_lt_mat[y_lt_mat < 0] = 0
x_rb_mat = x_rb_mat
x_rb_mat[x_rb_mat > input_width] = input_width
y_rb_mat = y_rb_mat
y_rb_mat[y_rb_mat > input_height] = input_height
select_index = numpy.where(score_map > score_threshold)
for idx in range(select_index[0].size):
bbox_collection.append((x_lt_mat[select_index[0][idx], select_index[1][idx]] - left_pad,
y_lt_mat[select_index[0][idx], select_index[1][idx]] - top_pad,
x_rb_mat[select_index[0][idx], select_index[1][idx]] - left_pad,
y_rb_mat[select_index[0][idx], select_index[1][idx]] - top_pad,
score_map[select_index[0][idx], select_index[1][idx]]))
# NMS
bbox_collection = sorted(bbox_collection, key=lambda item: item[-1], reverse=True)
if len(bbox_collection) > top_k:
bbox_collection = bbox_collection[0:top_k]
bbox_collection_numpy = numpy.array(bbox_collection, dtype=numpy.float32)
bbox_collection_numpy = bbox_collection_numpy / resize_scale
if NMS_flag:
final_bboxes = NMS(bbox_collection_numpy, NMS_threshold)
final_bboxes_ = []
for i in range(final_bboxes.shape[0]):
final_bboxes_.append((final_bboxes[i, 0], final_bboxes[i, 1], final_bboxes[i, 2], final_bboxes[i, 3], final_bboxes[i, 4]))
return final_bboxes_
else:
return bbox_collection_numpy
######################################################################
# WHERE SECOND PART ACTUALLY RUNS #
######################################################################
import sys
sys.path.append('..')
from config_farm import configuration_10_320_20L_5scales_v2 as cfg
onnx_file_path = './onnx_files/v2.onnx'
myInference = Inference_TensorRT(
onnx_file_path=onnx_file_path,
receptive_field_list=cfg.param_receptive_field_list,
receptive_field_stride=cfg.param_receptive_field_stride,
bbox_small_list=cfg.param_bbox_small_list,
bbox_large_list=cfg.param_bbox_large_list,
receptive_field_center_start=cfg.param_receptive_field_center_start,
num_output_scales=cfg.param_num_output_scales)
do_inference = myInference.do_inference
test_img_lffd = test_img
# just for debugging
cv2.imwrite('test_img_lffd.jpg', test_img_lffd)
bboxes = do_inference(test_img_lffd, score_threshold=0.6, top_k=1000, NMS_threshold=0.2, NMS_flag=True)
bbox = bboxes[0]
bb_conf = bbox[4]
bb_left = bbox[0]
bb_top = bbox[1]
bb_right = bbox[2]
bb_bottom = bbox[3]
print(f'[lffd](l, t, r, b, c) = ({bb_left}, {bb_top}, {bb_right}, {bb_bottom}, {bb_conf})')
test_img_lffd = cv2.rectangle(test_img_lffd, (bb_left, bb_top), (bb_right, bb_bottom), freq_cv.GREEN, 2)
cv2.imwrite(f'[lffd] detected_lffd.jpg', test_img_lffd)
print(f'[lffd] detected_lffd.jpg written!')
print(f'[lffd] doing face recognition ... ')
# convert to int because LFFD returns numpy.float32 coordinates
css_type_face_location = [(int(bb_top), int(bb_right), int(bb_bottom), int(bb_left))]
print(f'[lffd]css = {css_type_face_location}')
face_encoding = get_face_encodings(test_img_lffd, css_type_face_location, 0)[0] # error showed up at this line
print(f'[lffd]face_encoding:\n{face_encoding}')
"The module pycuda.autoinit, when imported, automatically performs all the steps necessary to get CUDA ready for submission of compute kernels": link
I think dlib and pycuda.autoinit have memory-handling mechanisms that conflict with each other, or one of them has a silent bug. The easiest way out is to sacrifice one of them, but I want to use both: LFFD (for detection), which needs pycuda.autoinit, and dlib (for recognition), which hates pycuda.autoinit, so I have to find some way to "synchronize" them. I haven't figured out how yet, because my C++ is bad and looking at cudnn_dlibapi.cpp makes me nearly blind.
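One idea I'm toying with (just a rough sketch, not tested) is to drop pycuda.autoinit entirely, create the CUDA context myself with pycuda.driver, and only push it around the TensorRT work so that dlib never runs with my context current:
import pycuda.driver as cuda   # note: no pycuda.autoinit anywhere
cuda.init()
trt_ctx = cuda.Device(0).make_context()   # context used only for LFFD / TensorRT work
# ... build the engine and allocate the buffers here, while trt_ctx is current ...
trt_ctx.pop()                              # leave no context current for dlib
def detect_faces(image):
    # hypothetical wrapper around the do_inference() defined above
    trt_ctx.push()
    try:
        return do_inference(image, score_threshold=0.6, top_k=1000,
                            NMS_threshold=0.2, NMS_flag=True)
    finally:
        trt_ctx.pop()
# face_recognition / dlib calls stay outside the push/pop, so dlib can set up
# its own CUDA state without fighting over my context
I have no idea whether dlib would be happy with that, though.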
Can you help me, or suggest any alternative ideas? I would really really appreciate it :(
I solved this by changing the import order: keep import pycuda.autoinit before importing face_recognition across the whole program. I put import pycuda.autoinit in my startup file and the problem is solved. Here is the test program:
from PIL import Image;
import numpy as np;
import face_recognition as fr;
import pycuda.autoinit
image = Image.open("/home/nvidia/image5.jpg")
(width,height)=image.size;
image_np = np.array(image.getdata()).reshape((height,width,3)).astype(np.uint8)
results = fr.face_encodings(image_np)
print(results)
The program above gives this error:
RuntimeError: Error while calling cudnnConvolutionForward( context(), &alpha, descriptor(data), data.device(), (const cudnnFilterDescriptor_t)filter_handle, filters.device(), (const cudnnConvolutionDescriptor_t)conv_handle, (cudnnConvolutionFwdAlgo_t)forward_algo, forward_workspace, forward_workspace_size_in_bytes, &beta, descriptor(output), output.device()) in file /tmp/pip-install-rooy8wlc/dlib/dlib/cuda/cudnn_dlibapi.cpp:1004. code: 7, reason: A call to cuDNN failed
When I change the import order, the error disappears and the face encoding prints. Here is the code:
from PIL import Image;
import numpy as np;
import pycuda.autoinit
import face_recognition as fr;
image = Image.open("/home/nvidia/image5.jpg")
(width,height)=image.size;
image_np = np.array(image.getdata()).reshape((height,width,3)).astype(np.uint8)
results = fr.face_encodings(image_np)
print(results)
I think the key may be that pycuda.autoinit must be imported before face_recognition anywhere in the whole program, so you should check the import order in all your files.
This is my env: Python 3.6.9, CUDA 10.0, TensorRT 6.0.1.10
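Applied to the script you posted, that would just mean moving the pycuda imports above face_recognition at the top of DO_FACE_VERIFICATION.py, roughly like this (untested against your project):
import pycuda.driver as cuda
import pycuda.autoinit        # let pycuda initialize CUDA before dlib is touched
import tensorrt as trt
import face_recognition       # dlib-backed import now comes after pycuda.autoinit
# ... the rest of the imports stay unchanged ...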
I have got the same error message, but I got it after the apps had been running for a long time. I have created two apps running on the same PC doing different work. Please help.
Have you managed to solve it?
I tried another face detection method, which returns bounding-box values of float type. I then convert them with int() and feed them to face_recognition.face_encodings() (which I have assigned to get_face_encodings()). Then I get the error detailed below:
Traceback (most recent call last):
  File "predict_tensorrt_video.py", line 665, in <module>
    main()
  File "predict_tensorrt_video.py", line 88, in inner
    retval = fnc(*args, **kwargs)
  File "predict_tensorrt_video.py", line 659, in main
    run_inference(args.video_in, args.video_out, candidate_id, current_time)
  File "predict_tensorrt_video.py", line 549, in run_inference
    face_encoding = get_face_encodings(frame, css_type_face_location, 0)[0]
  File "/home/gate/.virtualenvs/lffd/lib/python3.6/site-packages/face_recognition/api.py", line 210, in face_encodings
    return [np.array(face_encoder.compute_face_descriptor(face_image, raw_landmark_set, num_jitters)) for raw_landmark_set in raw_landmarks]
  File "/home/gate/.virtualenvs/lffd/lib/python3.6/site-packages/face_recognition/api.py", line 210, in <listcomp>
    return [np.array(face_encoder.compute_face_descriptor(face_image, raw_landmark_set, num_jitters)) for raw_landmark_set in raw_landmarks]
RuntimeError: Error while calling cudnnConvolutionForward( context(), &alpha, descriptor(data), data.device(), (const cudnnFilterDescriptor_t)filter_handle, filters.device(), (const cudnnConvolutionDescriptor_t)conv_handle, (cudnnConvolutionFwdAlgo_t)forward_algo, forward_workspace, forward_workspace_size_in_bytes, &beta, descriptor(output), output.device()) in file /home/gate/dlib-19.17/dlib/cuda/cudnn_dlibapi.cpp:1007. code: 7, reason: A call to cuDNN failed
cudaStreamDestroy() failed. Reason: invalid device ordinal
cudaFree() failed. Reason: invalid device pointer
cudaFreeHost() failed. Reason: invalid argument
cudaStreamDestroy() failed. Reason: unknown error
cudaFree() failed. Reason: invalid device pointer
cudaFreeHost() failed. Reason: invalid argument
cudaFree() failed. Reason: invalid device pointer
Segmentation fault (core dumped)
Does anyone know how to solve it? I've struggled with it all day. Thank you a lot!!