Xilinx / Vitis-AI

Vitis AI is Xilinx’s development stack for AI inference on Xilinx hardware platforms, including both edge devices and Alveo cards.
https://www.xilinx.com/ai
Apache License 2.0

Caffe/Pytorch facerec_resnet20 model fingerprint error Vitis 2.5 #1100

Closed dextroza closed 1 year ago

dextroza commented 1 year ago

Hi,

I am trying to run facerec_resnet20 on my custom board image, but I get errors due to a model fingerprint mismatch:

dpu_runner_base_imp.cpp:676] CHECK fingerprint fail ! model_fingerprint 0x1000020f6014407 dpu_fingerprint 0x101000016010407
F1201 08:07:19.906725 842 dpu_runner_base_imp.cpp:648] fingerprint check failure.
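To see how far apart the two fingerprints in the log are, one option is to XOR them and look at the set bits. The following is only a minimal sketch using the C++ standard library; the helper names parse_fingerprint, fingerprint_diff and fingerprints_match are made up for illustration and are not part of Vitis AI, and the sketch makes no assumption about what the individual fingerprint bits encode.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: parse a hexadecimal fingerprint string such as
// "0x101000016010407" (the "0x" prefix is optional with base 16).
std::uint64_t parse_fingerprint(const std::string& text) {
    assert(!text.empty());  // precondition: a fingerprint string was supplied
    return std::stoull(text, nullptr, 16);
}

// Hypothetical helper: return a mask whose set bits mark the positions
// where the model fingerprint and the DPU fingerprint differ.
std::uint64_t fingerprint_diff(const std::string& model_fingerprint,
                               const std::string& dpu_fingerprint) {
    return parse_fingerprint(model_fingerprint) ^ parse_fingerprint(dpu_fingerprint);
}

// Hypothetical helper: true when both strings denote the same 64-bit value.
bool fingerprints_match(const std::string& model_fingerprint,
                        const std::string& dpu_fingerprint) {
    return fingerprint_diff(model_fingerprint, dpu_fingerprint) == 0;
}
```

For the two values in the log above, fingerprint_diff("0x1000020f6014407", "0x101000016010407") is nonzero, so the xmodel that triggered the check was compiled for a different DPU configuration than the one on the board.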

First, I tried the Caffe .xmodel from model_zoo (cf_facerec-resnet20_112_96_3.5G_2.5) and got the error above.

Then I tried the PyTorch .xmodel from model_zoo (pt_facerec-resnet20_mixed_112_96_3.5G_2.5) and got exactly the same error.

After that, I compiled the same PyTorch model from model_zoo for my DPU architecture, but I still got the same error, which is very weird.

Compile command:

vai_c_xir -x quantized/Resnet_int.xmodel -a arch.json -o facerec_pretrained_resnet20 -n facerec_pretrained_resnet20

where arch.json contains:

{"fingerprint":"0x101000016010407"}

Why can't the DPU recognize that my facerec_resnet20 model is compiled for "0x101000016010407"? I have several other DNN models that work very well on the same DPU architecture.

My environment:

lishixlnx commented 1 year ago

This sounds weird. If you use the correct fingerprint, where does the wrong fingerprint come from? Please double-check your steps, and also repeat them with another model that is known to work, just for comparison.

dextroza commented 1 year ago

@lishixlnx thank you for your reply.

The model initialization is successful, but the inference causes the fingerprint mismatch. I use these functions for init and inference:

// Initialization: create the face recognition model by name.
std::unique_ptr<vitis::ai::FaceRecog> faceFeaturesModel =
    vitis::ai::FaceRecog::create("facerec_resnet20");

// Inference: run on the cropped image together with the face rectangle.
faceFeaturesModel->run_fixed(
    inputImage(rectPair.first),
    rectPair.second.x,
    rectPair.second.y,
    rectPair.second.width,
    rectPair.second.height);

When I deliberately rename the model directory on the SD card from "facerec_resnet20" to "test_facerec_resnet20", the initialization fails with the message "cannot find model after checking following dir...", so I am pretty sure that I am using the correct model.

Just to mention, the facerec_resnet20 directory contains:

Do you have any suggestions on how to solve this issue?

Any help would be appreciated.

Edit: SOLVED. The problem was not facerec_resnet20 but a wrong version of the face landmark model, which runs before it in the face recognition pipeline.
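Since the failing model turned out to be an earlier stage of the pipeline, it can help to check the fingerprint of every xmodel the pipeline loads, not only the last one. Below is a minimal standard-library sketch of that check. The PipelineModel struct, the find_mismatched_models function and the example model names are hypothetical; the fingerprint strings are assumed to have been collected by hand beforehand (for example from the output of xdputil xmodel <file> -l).

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical record: one model used by the pipeline and the fingerprint
// string reported for its DPU subgraph.
struct PipelineModel {
    std::string name;
    std::string fingerprint;
};

// Hypothetical helper: return the names of all models whose fingerprint
// differs from the DPU fingerprint. Values are compared numerically so that
// letter case and leading zeros do not matter.
std::vector<std::string> find_mismatched_models(
        const std::vector<PipelineModel>& models,
        const std::string& dpu_fingerprint) {
    assert(!dpu_fingerprint.empty());  // precondition: DPU fingerprint is known
    const std::uint64_t dpu = std::stoull(dpu_fingerprint, nullptr, 16);
    std::vector<std::string> mismatched;
    for (const PipelineModel& model : models) {
        if (std::stoull(model.fingerprint, nullptr, 16) != dpu) {
            mismatched.push_back(model.name);
        }
    }
    return mismatched;
}
```

Called with a list such as {"face_landmark", "0x1000020f6014407"} and {"facerec_resnet20", "0x101000016010407"} (illustrative values only) and the DPU fingerprint "0x101000016010407", it would report only the first entry.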

lishixlnx commented 1 year ago

I tried the model zoo model. It works well, and the output is different from yours.

root@xilinx-zcu102-2021_1:~/pp/facerec-resnet20_mixed_pt# xdputil xmodel ./facerec-resnet20_mixed_pt.xmodel -l
{
    "subgraphs":[
        {
            "name":"subgraph_Resnetinput_0",
            "device":"USER"
        },
        {
            "name":"subgraph_ResnetResnet_Linear_448(TransferMatMulToConv2d)",
            "device":"DPU",
            "fingerprint":"0x101000016010407",

As you can see, the fingerprint of the model is 0x101000016010407.
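When several models have to be inspected this way, the fingerprint values can be pulled out of the listing text automatically. The sketch below assumes the output has already been captured into a string and only looks for "fingerprint":"0x..." pairs with a regular expression; extract_fingerprints is a hypothetical helper, not an xdputil or Vitis AI function, and a real JSON parser would be more robust.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collect every fingerprint value found in a text that
// contains JSON-like "fingerprint":"0x..." pairs, in order of appearance.
std::vector<std::string> extract_fingerprints(const std::string& text) {
    static const std::regex pattern(
        R"re("fingerprint"\s*:\s*"(0x[0-9a-fA-F]+)")re");
    std::vector<std::string> fingerprints;
    for (std::sregex_iterator it(text.begin(), text.end(), pattern), end;
         it != end; ++it) {
        fingerprints.push_back((*it)[1].str());  // capture group 1: hex value
    }
    return fingerprints;
}
```

Because it matches on text rather than on parsed JSON, this also works on a truncated listing like the one quoted above.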

lishixlnx commented 1 year ago

Glad to hear you solved it. Can you please close this issue? Thanks.