WongKinYiu / yolov7

Implementation of paper - YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors
GNU General Public License v3.0

A way to convert YOLOv7-tiny to tflite int8 and run it successfully on PC and edge devices, but some questions remain #1862

Open t109598032 opened 1 year ago

t109598032 commented 1 year ago

Hi all, I'd like to share a way to convert yolov7-tiny to a tflite int8 (full integer quantization) model that runs successfully with correct bboxes, but it may still need improvement.

Using python export.py --simplify --grid --end2end generates a .onnx containing backbone + YOLO layer + post-processing. Converting it to tflite fp32 and int8 models with onnx2tf, the fp32 model works fine with correct bboxes, but the int8 model fails: its output values are almost all equal to 0. Inspecting the operator graphs on the Netron website, the tflite fp32 and int8 models differ in one thing:

[screenshot: Netron comparison of the fp32 and int8 graphs]

I don't know why a Quantize op is added in the int8 model (even the official onnx_tf tool gives the same problem); maybe that is what causes the zero values.
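(In case it helps others reproduce the int8 conversion: a minimal sketch of the full-integer quantization step with the TFLiteConverter Python API, assuming onnx2tf has already written a saved_model directory. One known cause of near-zero int8 outputs is a poor representative dataset, so calibrate with real, preprocessed images; the paths and input size below are placeholders.)

import glob
import cv2
import numpy as np
import tensorflow as tf

def rep_dataset():
    # calibration data: real frames preprocessed the same way as at inference
    # (random data often yields bad int8 ranges and saturated outputs)
    for path in glob.glob("calib_images/*.jpg")[:100]:
        img = cv2.imread(path)
        img = cv2.resize(img, (416, 416))[:, :, ::-1].astype(np.float32) / 255.0
        yield [np.expand_dims(img, 0)]

converter = tf.lite.TFLiteConverter.from_saved_model("saved_model")  # onnx2tf output dir
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = rep_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
with open("yolov7-tiny_int8.tflite", "wb") as f:
    f.write(converter.convert())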

So instead, convert yolov7-tiny.pt to .onnx with only --simplify (without --grid --end2end); as far as I know that exports only the backbone + YOLO layer, without the post-processing. Then convert that to tflite int8 and implement the post-processing in Python, like the following:
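(First, a sketch of the setup the snippet below assumes. The letterbox helper is my own minimal version modeled on the yolov7 ONNX example; the file paths, input size, and names/colors tables are placeholders, and onnx2tf-converted models usually take NHWC input.)

import cv2
import numpy as np
import tensorflow as tf

def letterbox(im, new_shape=(416, 416), color=(114, 114, 114)):
    # resize with unchanged aspect ratio, then pad to new_shape;
    # returns the padded image, the scalar resize gain and the (dw, dh) padding
    h, w = im.shape[:2]
    r = min(new_shape[0] / h, new_shape[1] / w)
    nh, nw = int(round(h * r)), int(round(w * r))
    dh, dw = (new_shape[0] - nh) / 2, (new_shape[1] - nw) / 2
    im = cv2.resize(im, (nw, nh), interpolation=cv2.INTER_LINEAR)
    top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
    left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
    im = cv2.copyMakeBorder(im, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color)
    return im, r, (dw, dh)

names = ['class%d' % i for i in range(80)]     # replace with your class names
colors = {n: (0, 255, 0) for n in names}       # simple color map for drawing

interpreter = tf.lite.Interpreter(model_path="yolov7-tiny_int8.tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

img_result = cv2.imread("test.jpg")
img, ratio, dwdh = letterbox(img_result)
img = np.expand_dims(img[:, :, ::-1], 0)       # BGR -> RGB, add batch dim (NHWC)

# quantize the input if the model expects int8/uint8
in_type = input_details[0]['dtype']
if in_type == np.int8 or in_type == np.uint8:
    scale, zero_point = input_details[0]['quantization']
    img = (img.astype(np.float32) / 255.0 / scale + zero_point).astype(in_type)
else:
    img = img.astype(np.float32) / 255.0
interpreter.set_tensor(input_details[0]['index'], img)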

# run inference and fetch the three detection heads
interpreter.invoke()
output_data_0 = interpreter.get_tensor(output_details[0]['index'])  # (1, 3, 13, 13, 85)
output_data_1 = interpreter.get_tensor(output_details[1]['index'])  # (1, 3, 52, 52, 85)
output_data_2 = interpreter.get_tensor(output_details[2]['index'])  # (1, 3, 26, 26, 85)

# int8 dequantization: map raw int8 values back to float with each output's scale/zero-point
output_type = output_details[0]['dtype']
if output_type == np.int8 or output_type == np.uint8:
    scale, zero_point = output_details[0]['quantization']
    output_data_0 = (output_data_0.astype(np.float32) - zero_point) * scale

    scale, zero_point = output_details[1]['quantization']
    output_data_1 = (output_data_1.astype(np.float32) - zero_point) * scale

    scale, zero_point = output_details[2]['quantization']
    output_data_2 = (output_data_2.astype(np.float32) - zero_point) * scale

# mind the tensor shape order: rotate so the heads run 52x52, 26x26, 13x13 (strides 8, 16, 32)
output_data_0, output_data_1, output_data_2 = output_data_1, output_data_2, output_data_0
print(output_data_0.shape)  # (1, 3, 52, 52, 85)
print(output_data_1.shape)  # (1, 3, 26, 26, 85)
print(output_data_2.shape)  # (1, 3, 13, 13, 85)

def sigmoid(x):
    s = 1 / (1 + np.exp(-x))
    return s

# stride-8 head (52x52 grid, 3 anchors -> 8112 boxes)
x_1 = sigmoid(output_data_0)
x_1_1 = x_1[:,:,:,:,0:2]*16 + np.load("tensor01.npy")  # xy decode, grid offsets baked into the .npy
x_1_2 = np.power(x_1[:,:,:,:,2:4], 2) * np.array([[[[[48,64]]],[[[76,144]]],[[[160,112]]]]])  # wh decode
x_1_3 = x_1[:,:,:,:,4:85]                              # objectness + class scores
x_1 = np.concatenate([x_1_1, x_1_2, x_1_3], axis=4).reshape((1,8112,85))

# stride-16 head (26x26 grid -> 2028 boxes)
x_2 = sigmoid(output_data_1)
x_2_1 = x_2[:,:,:,:,0:2]*32 + np.load("tensor02.npy")
x_2_2 = np.power(x_2[:,:,:,:,2:4], 2) * np.array([[[[[144,300]]],[[[304,220]]],[[[288,584]]]]])
x_2_3 = x_2[:,:,:,:,4:85]
x_2 = np.concatenate([x_2_1, x_2_2, x_2_3], axis=4).reshape((1,2028,85))

# stride-32 head (13x13 grid -> 507 boxes)
x_3 = sigmoid(output_data_2)
x_3_1 = x_3[:,:,:,:,0:2]*64 + np.load("tensor03.npy")
x_3_2 = np.power(x_3[:,:,:,:,2:4], 2) * np.array([[[[[568,440]]],[[[768,972]]],[[[1836,1604]]]]])
x_3_3 = x_3[:,:,:,:,4:85]
x_3 = np.concatenate([x_3_1, x_3_2, x_3_3], axis=4).reshape((1,507,85))

output_data = np.concatenate([x_1, x_2, x_3], axis=1)   #(1, 10647, 85)

pred = non_max_suppression(output_data, conf_thres=0.25, iou_thres=0.45)    # list of (n, 6) arrays: x0, y0, x1, y1, score, cls_id

# undo the letterbox padding/scaling, then draw the results on the original image
for i, (x0, y0, x1, y1, score, cls_id) in enumerate(pred[0]):
    box = np.array([x0, y0, x1, y1])
    box -= np.array(dwdh * 2)        # remove padding (dwdh duplicated for both corners)
    box /= ratio                     # undo the letterbox resize gain
    box = box.round().astype(np.int32).tolist()
    cls_id = int(cls_id)
    score = round(float(score), 3)
    name = names[cls_id]
    color = colors[name]
    name += ' ' + str(score)
    cv2.rectangle(img_result, (box[0], box[1]), (box[2], box[3]), color, 2)
    cv2.putText(img_result, name, (box[0], box[1] - 2), cv2.FONT_HERSHEY_SIMPLEX, 0.75, [225, 255, 255], thickness=2)
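(non_max_suppression above is the yolov7 utility, which expects torch tensors; if you want a dependency-free stand-in, here is a minimal NumPy sketch of my own that returns the same list-of-(n, 6) layout. It is class-agnostic for brevity.)

def nms_numpy(pred, conf_thres=0.25, iou_thres=0.45):
    # pred: (1, N, 85) decoded predictions in xywh ->
    # list with one (n, 6) array [x0, y0, x1, y1, score, cls_id]
    p = pred[0]
    scores = p[:, 4] * p[:, 5:].max(axis=1)            # obj_conf * best class prob
    cls = p[:, 5:].argmax(axis=1).astype(np.float32)
    keep = scores > conf_thres
    p, scores, cls = p[keep], scores[keep], cls[keep]
    boxes = np.empty_like(p[:, :4])                    # xywh -> xyxy
    boxes[:, 0] = p[:, 0] - p[:, 2] / 2
    boxes[:, 1] = p[:, 1] - p[:, 3] / 2
    boxes[:, 2] = p[:, 0] + p[:, 2] / 2
    boxes[:, 3] = p[:, 1] + p[:, 3] / 2
    order = scores.argsort()[::-1]                     # greedy NMS, highest score first
    kept = []
    while order.size > 0:
        i = order[0]
        kept.append(i)
        xx0 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy0 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx1 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy1 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.clip(xx1 - xx0, 0, None) * np.clip(yy1 - yy0, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[order[1:], 2] - boxes[order[1:], 0]) * (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + area_r - inter + 1e-7)
        order = order[1:][iou <= iou_thres]            # drop overlapping boxes
    kept = np.array(kept, dtype=np.int64)
    return [np.concatenate([boxes[kept], scores[kept, None], cls[kept, None]], axis=1)]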

With this you can see the object bboxes drawn correctly on your image. Using the method above I successfully ran my own trained model on a PC and on an i.MX with vx_delegate acceleration, at about 15 s/image and 0.021 s/image respectively with a 224x224 model (tflite models are good on arm64 but bad on x86, right?). tensor01.npy~tensor03.npy can be obtained from the 3 Add op nodes in the post-processing part of the .onnx exported with full post-processing (using the Netron website; just mind the tensor shape order).
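(Rather than eyeballing Netron, those constants can also be dumped programmatically from the end2end .onnx; a sketch with the onnx Python package. Node/initializer names are model-dependent, and depending on the exporter the constants may live in Constant nodes instead of initializers, so treat this only as a starting point.)

import onnx
import numpy as np
from onnx import numpy_helper

model = onnx.load("yolov7-tiny-end2end.onnx")   # the export that still contains the post-process
consts = {init.name: numpy_helper.to_array(init) for init in model.graph.initializer}

for node in model.graph.node:
    if node.op_type in ("Add", "Mul"):
        for inp in node.input:
            if inp in consts and consts[inp].ndim >= 4:   # skip scalars and small vectors
                print(node.name, node.op_type, inp, consts[inp].shape)
                np.save(inp.replace("/", "_") + ".npy", consts[inp])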


This looks like a way to solve the tflite int8 value=0 problem, but it is obviously not robust: if you change the model input shape or use your own trained model, tensor01~03.npy have to change (what do these values mean?), and arrays like [[[[[144,300]]],[[[304,220]]],[[[288,584]]]]] also need to change (are these numbers the anchors?).
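My guess is that these constants are the standard YOLOv5/v7 decode with everything pre-folded: xy = (2*sigmoid(xy) - 0.5 + grid) * stride, so the *16/*32/*64 factors would be 2*stride and each tensor0X.npy would be (grid - 0.5) * stride; and wh = (2*sigmoid(wh))**2 * anchors, so the nested arrays would be the anchor sizes times 4 (e.g. [48,64] = 4*[12,16], matching the yolov7 deploy anchors). If that reading is right, they can be computed on the fly for any input size and anchor set instead of being hard-coded; a sketch (my own helper, not from the repo):

def decode_head(raw, stride, anchors, num_classes=80):
    # raw: (1, na, ny, nx, 5 + num_classes) dequantized head output
    # anchors: (na, 2) anchor (w, h) pairs in pixels for this head
    _, na, ny, nx, _ = raw.shape
    x = 1.0 / (1.0 + np.exp(-raw))                                  # sigmoid
    yv, xv = np.meshgrid(np.arange(ny), np.arange(nx), indexing="ij")
    grid = np.stack((xv, yv), axis=2).reshape(1, 1, ny, nx, 2).astype(np.float32)
    anchors = np.asarray(anchors, dtype=np.float32).reshape(1, na, 1, 1, 2)
    xy = (x[..., 0:2] * 2.0 - 0.5 + grid) * stride
    wh = (x[..., 2:4] * 2.0) ** 2 * anchors
    return np.concatenate([xy, wh, x[..., 4:]], axis=-1).reshape(1, -1, 5 + num_classes)

# anchors below are from cfg/deploy/yolov7.yaml; use your own model's anchors
outs = [decode_head(output_data_0, 8,  [[12, 16], [19, 36], [40, 28]]),
        decode_head(output_data_1, 16, [[36, 75], [76, 55], [72, 146]]),
        decode_head(output_data_2, 32, [[142, 110], [192, 243], [459, 401]])]
output_data = np.concatenate(outs, axis=1)                          # (1, 10647, 85) at 416x416

Still, does anyone have a better or more robust solution?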

thnak commented 1 year ago

You can try this. I wrote it to work the same as YOLOv5, and the exported tflite model can be used the same way as a YOLOv5 one.

t109598032 commented 1 year ago

Hi, thanks for your reply. I have tried it: the YOLOv7 tflite int8 model works fine on a PC, and on the i.MX8 it also works fine on the CPU, but with the vx delegate (GPU acceleration) the output turns out to be wrong. I got the same result when I tried a YOLOv5 model previously, which is weird: same code and same model file, just different inference hardware, so maybe it is a problem with the machine itself. Same problem here