AxisCommunications / acap-computer-vision-sdk-examples

Axis Camera Application Platform (ACAP) version 4 example applications that provide developers with the tools and knowledge to build their own solutions based on the ACAP Computer Vision SDK
Apache License 2.0

Deploying YOLOv8 to ARTPEC-8 cameras #141

Closed philippe-heitzmann closed 1 year ago

philippe-heitzmann commented 1 year ago

Can ARTPEC-8 cameras run YOLOv8?

Could the Axis team please advise whether ARTPEC-8 DLPU chips can run an int8 per-tensor quantized YOLOv8m? For reference, our team is seeing model loading errors when attempting to load this model to the inference server using the object_detector_python script. The errors indicate that the model contains more than the maximum of 16 graph partitions and that certain operations are not supported on ARTPEC-8 cameras (see docs for a full overview of the YOLOv8 architecture / layers), and we were curious to understand why this breaks. Any pointers on this would be much appreciated, thank you.

inference-server_1        | ERROR in Inference: Failed to load model yolov8m_int8.tflite (Could not load model: Model contains too many graph partitions (137 > 16) and 68 of the graph partitions can't be run on the device. Consider redesigning the model to better utilize the device. (Is it fully integer quantized? Is it using non-supported operations?))

To reproduce

  1. Export YOLOv8m to int8 per-tensor quantized .tflite weights using the exporter.py script made available here. Quantization parameters:
    • int8
    • per-tensor
  2. Deploy exported .tflite weights to camera
  3. Run docker compose with the below command, leveraging the object_detector_python scripts
    docker compose --env-file ./config/env.$ARCH.$CHIP up
    # env file:
    ARCH=aarch64
    CHIP=artpec8
  4. Observe the failed model loading errors above

Environment

Corallo commented 1 year ago

Hello @philippe-heitzmann

Can you show the command that you used to run the export.py script to produce a per-tensor quantized tflite? Could you maybe also share the tflite that you get from the conversion? That error was introduced to prevent users from using a model that has many nodes (>16) falling back to the CPU in the middle of the graph. Since falling back to the CPU has a large time overhead, doing it more than ~16 times would make the model so inefficient that it would not be worth using the accelerator; it would rather be run directly on the CPU. In your case, after quantization, your graph has many of these nodes. It is not clear from the architecture description what the problem is specifically; you would need to look at the tflite and see if there are some floating-point layers.
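
As a side note, a minimal sketch of how one could check an exported .tflite for leftover floating-point tensors with the stock TensorFlow Lite Python interpreter (the model path is only an example; run it locally against the exported file):

# Sketch: count tensor dtypes in a .tflite file to gauge whether it is fully
# integer quantized. Assumes TensorFlow is installed in the local environment.
import collections
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="yolov8m_int8.tflite")  # example path

# Tally the dtype of every tensor in the graph; a fully integer quantized model
# should be dominated by int8/int32 tensors rather than float32.
dtype_counts = collections.Counter(t["dtype"].__name__ for t in interpreter.get_tensor_details())
print("tensor dtypes:", dict(dtype_counts))

# The inputs/outputs of a full-integer export are int8/uint8, not float32.
print("inputs:", [(d["name"], d["dtype"]) for d in interpreter.get_input_details()])
print("outputs:", [(d["name"], d["dtype"]) for d in interpreter.get_output_details()])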

philippe-heitzmann commented 1 year ago

Hi @Corallo, yes definitely. Please see below for the code used to export this YOLOv8m model, along with a link to the model weights it produced:

# Run inside the YOLOv8 Dockerfile image (https://github.com/ultralytics/ultralytics/blob/main/docker/Dockerfile)
from ultralytics import YOLO

model = YOLO('/weights/yolov8m.pt')
# Export to int8; per-tensor quantization is applied via the onnx2tf module, which
# automatically adds the `-oiqt -qt per-tensor` flags to the onnx2tf command
# (see the onnx2tf repo: https://github.com/PINTO0309/onnx2tf)
model.export(format='tflite', int8=True)

Since the error message indicates there are ~68 graph partitions with unsupported operations in the model graph, we were wondering if the Axis team could share any intuition on which of these layers may be problematic in this case, especially given previous reports such as #112 of YOLOv5s being able to run (albeit slowly) on ARTPEC-8 DLPU chips, since the v5 and v8 models use mostly identical types of convolutions / operations. Any pointers on this would be much appreciated, thank you.

Corallo commented 1 year ago

Hello @philippe-heitzmann

Running `journalctl -u larod` after trying to load the model (and failing) gives you more info about the problem with the model. Specifically, running it after loading your model you'll see:

Apr 13 11:17:43 axis-b8a44f277efe sh[1156]: ERROR: hybrid data type is not supported in conv2d.

This means that the conv2d layers are quantized only in the kernel parameters, while the convolution expects float inputs and produces float outputs. This is not supported. (attached image: unquantized_model)

Besides, looking at your model with netron you can see that other layers are not quantized, like the Add and Mul nodes. This makes the execution fall back to the CPU after each convolution, and that's why you see that error saying the graph is divided into 60+ pieces.

This seems to be a problem with onnx2tf; I am not sure if it has a flag to request quantizing not only the filters, but everything else as well.
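
For what it's worth, the stock TensorFlow Lite tooling can also dump the graph as text, which is a lighter-weight alternative to Netron for spotting the float Add/Mul nodes; a minimal sketch (the model path is only an example, and tf.lite.experimental.Analyzer requires TensorFlow 2.9 or newer):

# Sketch: print the op-by-op structure of a .tflite model, including each
# operator's input/output tensors, to see where the graph stays in float.
import tensorflow as tf

tf.lite.experimental.Analyzer.analyze(model_path="yolov8m_int8.tflite")  # example path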

Corallo commented 1 year ago

@philippe-heitzmann I took another look at the onnx2tf tool, though I only tried it on yolov5; this may still apply to you. Running onnx2tf -i ./yolov5s.onnx -oiqt -qt per-tensor -ioqd uint8 produces several tflite models:

yolov5s_relu_dynamic_range_quant.tflite
yolov5s_relu_float16.tflite
yolov5s_relu_float32.tflite
yolov5s_relu_full_integer_quant.tflite
yolov5s_relu_full_integer_quant_with_int16_act.tflite
yolov5s_relu_integer_quant.tflite
yolov5s_relu_integer_quant_with_int16_act.tflite

The model with the correct quantization is yolov5s_relu_full_integer_quant.tflite. Maybe you can try the same: export the model from yolov8 to onnx and quantize it yourself with onnx2tf.

Let us know how that works, we would be happy to hear that you succeeded.
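
In case it helps, a rough sketch of that two-step route for YOLOv8 (the paths are only examples, the onnx2tf flags are the ones from the command above, and calibration data handling is left to onnx2tf's defaults):

# Sketch: export YOLOv8m to ONNX with ultralytics, then quantize with onnx2tf.
# Paths are examples; adjust them to your environment.
import subprocess
from ultralytics import YOLO

# Step 1: export the PyTorch weights to ONNX (typically written next to the .pt file)
model = YOLO('/weights/yolov8m.pt')
model.export(format='onnx')

# Step 2: run onnx2tf with the flags from the command above
# (full integer quantization, per-tensor, uint8 input/output)
subprocess.run(
    ['onnx2tf', '-i', '/weights/yolov8m.onnx', '-oiqt', '-qt', 'per-tensor', '-ioqd', 'uint8'],
    check=True,
)

# Of the generated files, the *_full_integer_quant.tflite variant is the one to deploy.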

egSat commented 1 year ago

Hi Corallo. Thank you very much for your efforts. I have tried it with yolov7 from https://github.com/WongKinYiu/yolov7 but it fails. First I exported it to onnx with the following command:

python3.8 export.py --weights yolov7.pt --grid --end2end --simplify --topk-all 100 --iou-thres 0.65 --conf-thres 0.35 --img-size 640 640 --max-wh 640

Then I converted it with your command:

onnx2tf -i yolov7.onnx -oiqt -qt per-tensor -ioqd uint8

I picked up yolov7_full_integer_quant.tflite, but loading it to the camera fails, saying:

ERROR in Inference: Failed to load model yolov7_full_integer_quant.tflite (Could not load model: Could not build an interpreter of the model)

From the journalctl of larod the log is more concise:

Didn't find op for builtin opcode 'MUL' version '5'. An older version of this builtin might be supported. Are you using an old TFLite binary with a newer model?

EDIT: the same failure happens with yolov7-tiny
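
For anyone hitting the same opcode-version error, one way to see which builtin operator versions a converted .tflite actually requires is to read its flatbuffer metadata; a rough sketch, assuming the community tflite flatbuffer bindings package (pip install tflite) and an example file name (note that very old schema versions store the code in a deprecated field instead):

# Sketch: list the builtin operator codes and versions a .tflite model requires.
# The camera's larod/TFLite runtime has to support at least these versions.
import tflite

with open('yolov7_full_integer_quant.tflite', 'rb') as f:  # example file name
    buf = f.read()

model = tflite.Model.GetRootAsModel(buf, 0)
# Reverse lookup from numeric builtin code to operator name
code_to_name = {v: k for k, v in vars(tflite.BuiltinOperator).items() if not k.startswith('_')}

for i in range(model.OperatorCodesLength()):
    op_code = model.OperatorCodes(i)
    print(code_to_name.get(op_code.BuiltinCode(), op_code.BuiltinCode()), 'version', op_code.Version())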