Hello @philippe-heitzmann
Can you show the command that you used to run the export.py script and produce a per-tensor quantized tflite? Could you maybe also share the tflite that you get from the conversion? That error is introduced to prevent users from using a model that has many nodes (>16) that fall back to the CPU in the middle of the graph. Since falling back to the CPU has a big time overhead, doing it more than ~16 times would make the model so inefficient that it would not be worth using the accelerator at all; it would be better to run it directly on the CPU. In your case, after the quantization, your graph has many of these nodes. It is not clear from the architecture description what the specific problem is; you would need to look at the tflite and see if there are any floating-point layers.
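If it helps, one quick way to spot leftover floating-point tensors in a .tflite is to scan the tensor dtypes with the TensorFlow Lite interpreter (a minimal sketch, assuming TensorFlow is installed; the filename is just a placeholder for your exported model):
# Sketch: list every tensor that is still float, i.e. not fully quantized.
import numpy as np
import tensorflow as tf
interpreter = tf.lite.Interpreter(model_path='yolov8m_full_integer_quant.tflite')
interpreter.allocate_tensors()
for detail in interpreter.get_tensor_details():
    if detail['dtype'] in (np.float32, np.float16):
        print(detail['index'], detail['name'], detail['dtype'])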
Hi @Corallo, yes, definitely. Please see below for the code used to export this YOLOv8m model, and a link to the model weights produced by it:
# Running inside the YOLOv8 Dockerfile (https://github.com/ultralytics/ultralytics/blob/main/docker/Dockerfile)
from ultralytics import YOLO
model = YOLO('/weights/yolov8m.pt')
# Export to int8 tflite; this uses per-tensor quantization via the onnx2tf module,
# automatically adding the `-oiqt -qt per-tensor` flags to the onnx2tf command
# (see the onnx2tf repo: https://github.com/PINTO0309/onnx2tf)
model.export(format='tflite', int8=True)
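For reference, the equivalent Ultralytics CLI invocation (our assumption of the one-liner form; we ran the Python API above) would be:
yolo export model=/weights/yolov8m.pt format=tflite int8=True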
As the error message seems to indicate there are ~68 graph partitions with unsupported operations in the model graph, we were wondering if the Axis team could advise whether they have any intuition on which of these layers may be problematic in this case, especially in the context of previous reports such as #112 of YOLOv5s being able to run (albeit slowly) on ARTPEC-8 DLPU chips, given that both the v5 and v8 models use mostly identical types of convolutions and operations. Any pointers on this would be much appreciated, thank you.
Hello @philippe-heitzmann
Running journalctl -u larod after trying to load the model (and failing) gives you more info about the problem with the model. Specifically, running it after loading your model, you'll see:
Apr 13 11:17:43 axis-b8a44f277efe sh[1156]: ERROR: hybrid data type is not supported in conv2d.
This means that the conv2d nodes are quantized only in the kernel parameters, but the convolution expects float inputs and produces float outputs. This is not supported.
Besides, looking at your model with netron, you can see that other layers are not quantized, like the Add and Mul nodes; this makes the execution fall back to the CPU after each convolution, and that is why you see the error saying the graph is divided into 60+ pieces.
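If you don't have netron at hand, TensorFlow can print a similar op-by-op view of the graph (a sketch, assuming TensorFlow >= 2.9; the filename is a placeholder):
import tensorflow as tf
# Prints every operator with its input/output tensors, so unquantized
# Add/Mul nodes and hybrid conv2d ops stand out.
tf.lite.experimental.Analyzer.analyze(model_path='yolov8m_full_integer_quant.tflite')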
This seems to be a problem with onnx2tf; I am not sure if it has a flag to request quantizing everything, not only the filters.
@philippe-heitzmann
I took another look at the onnx2tf tool, but I tried it on yolov5; maybe this will still apply to you:
Running
onnx2tf -i ./yolov5s.onnx -oiqt -qt per-tensor -ioqd uint8
produces several tflite models:
yolov5s_relu_dynamic_range_quant.tflite
yolov5s_relu_float16.tflite
yolov5s_relu_float32.tflite
yolov5s_relu_full_integer_quant.tflite
yolov5s_relu_full_integer_quant_with_int16_act.tflite
yolov5s_relu_integer_quant.tflite
yolov5s_relu_integer_quant_with_int16_act.tflite
The model with the correct quantization is yolov5s_relu_full_integer_quant.tflite
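A quick way to confirm that the picked file really is fully integer at the edges (a sketch, assuming TensorFlow is installed):
import tensorflow as tf
interpreter = tf.lite.Interpreter(model_path='yolov5s_relu_full_integer_quant.tflite')
# With -ioqd uint8, both of these should print uint8, not float32.
print(interpreter.get_input_details()[0]['dtype'])
print(interpreter.get_output_details()[0]['dtype'])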
Maybe you can try the same: export the model from yolov8 to onnx and quantize it yourself with onnx2tf, as sketched below.
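Something along these lines (a sketch; the paths are assumptions, not verified settings):
from ultralytics import YOLO
# Step 1: export the YOLOv8 weights to ONNX instead of going straight to tflite.
model = YOLO('/weights/yolov8m.pt')
model.export(format='onnx')  # writes /weights/yolov8m.onnx
# Step 2: quantize with onnx2tf yourself, e.g.:
#   onnx2tf -i /weights/yolov8m.onnx -oiqt -qt per-tensor -ioqd uint8
# then pick the *_full_integer_quant.tflite output.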
Let us know how that works; we would be happy to hear that you succeeded.
Hi Corallo. Thank you very much for your efforts. I have tried it with yolov7 from https://github.com/WongKinYiu/yolov7 but it fails. First I exported it to onnx with the following command:
python3.8 export.py --weights yolov7.pt --grid --end2end --simplify --topk-all 100 --iou-thres 0.65 --conf-thres 0.35 --img-size 640 640 --max-wh 640
Then I converted it with your command:
onnx2tf -i yolov7.onnx -oiqt -qt per-tensor -ioqd uint8
I picked yolov7_full_integer_quant.tflite, but loading it on the camera fails, saying:
ERROR in Inference: Failed to load model yolov7_full_integer_quant.tflite (Could not load model: Could not build an interpreter of the model)
From the journalctl of larod, the log is more concise:
Didn't find op for builtin opcode 'MUL' version '5'. An older version of this builtin might be supported. Are you using an old TFLite binary with a newer model?
EDIT: the same failure happens with yolov7-tiny
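To double-check which operator versions the converted model actually requires (a sketch, assuming the community flatbuffer parser from `pip install tflite`; not something we have run on the camera):
import tflite  # pip install tflite
with open('yolov7_full_integer_quant.tflite', 'rb') as f:
    model = tflite.Model.GetRootAsModel(f.read(), 0)
# List each builtin operator and the version the runtime must support.
for i in range(model.OperatorCodesLength()):
    code = model.OperatorCodes(i)
    print(tflite.opcode2name(code.BuiltinCode()), 'version', code.Version())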
Can ARTPEC-8 cameras run YOLOv8?
Could the Axis team please advise whether ARTPEC-8 DLPU chips support running an int8 per-tensor quantized YOLOv8m? For reference, our team is seeing model-loading errors when attempting to load this model to the inference server using the object_detector_python script. These errors point to the model containing more than the maximum of 16 graph partitions, and to certain operations not being supported on ARTPEC-8 cameras (see the docs for a full overview of the YOLOv8 architecture / layers), and we were curious to learn more about why this breaks. Any pointers on this would be much appreciated, thank you.
To reproduce
Environment