Xilinx / Vitis-AI

Vitis AI is Xilinx’s development stack for AI inference on Xilinx hardware platforms, including both edge devices and Alveo cards.
https://www.xilinx.com/ai
Apache License 2.0

KV260 multi-core DPU inference not working (3xB1024) arch #1383

Open Shreyas-NR opened 7 months ago

Shreyas-NR commented 7 months ago

Hi,

I'm working with the PyTorch ResNet50 model on Vitis AI 2.5, using the KV260 platform with PetaLinux 2022.1 and DPU IP v4 (Vivado flow).

I compiled the ResNet50 model for a 1-core B4096 arch, and everything works as expected. I also compiled it for a 1-core B1024 arch, and that works as expected too.

I now have a 3-core B1024 arch platform, and I want to run different models on each DPU core. I compiled the ResNet50 model for the B1024 arch (ResNet50_1024_QAT_kv260.xmodel), and I'm trying to run the resnet50_mt example, examples/vai_runtime/resnet50_mt_py/resnet50.py.

I have 3 images in my img directory. My expectation is that each of the three cores should infer all 3 images, so I should get 9 outputs.

root@xilinx-kv260-starterkit-20221:~/app/img/image# tree
.
|-- bellpeppe-994958.JPEG
|-- greyfox-672194.JPEG
`-- irishterrier-696543.JPEG

0 directories, 3 files

Below is the log of my output:

root@xilinx-kv260-starterkit-20221:~/app/samples# xdputil query
{
    "DPU IP Spec":{
        "DPU Core Count":3,
        "IP version":"v4.0.0",
        "enable softmax":"False"
    },
    "VAI Version":{
        "libvart-runner.so":"Xilinx vart-runner Version: 2.5.0-c26eae36f034d5a2f9b2a7bfe816b8c43311a4f8  2023-01-22-01:10:05 ",
        "libvitis_ai_library-dpu_task.so":"Xilinx vitis_ai_library dpu_task Version: 2.5.0-c26eae36f034d5a2f9b2a7bfe816b8c43311a4f8  2022-06-15 07:33:00 [UTC] ",
        "libxir.so":"Xilinx xir Version: xir-c26eae36f034d5a2f9b2a7bfe816b8c43311a4f8 2023-01-22-01:08:11",
        "target_factory":"target-factory.2.5.0 c26eae36f034d5a2f9b2a7bfe816b8c43311a4f8"
    },
    "kernels":[
        {
            "DPU Arch":"DPUCZDX8G_ISA1_B1024",
            "DPU Frequency (MHz)":275,
            "cu_idx":0,
            "fingerprint":"0x101000016010402",
            "is_vivado_flow":true,
            "name":"DPU Core 0"
        },
        {
            "DPU Arch":"DPUCZDX8G_ISA1_B1024",
            "DPU Frequency (MHz)":275,
            "cu_idx":1,
            "fingerprint":"0x101000016010402",
            "is_vivado_flow":true,
            "name":"DPU Core 1"
        },
        {
            "DPU Arch":"DPUCZDX8G_ISA1_B1024",
            "DPU Frequency (MHz)":275,
            "cu_idx":2,
            "fingerprint":"0x101000016010402",
            "is_vivado_flow":true,
            "name":"DPU Core 2"
        }
    ]
}
root@xilinx-kv260-starterkit-20221:~/app/samples# python3 resnet50.py 3 ../model/ResNet50_1024_QAT_kv260.xmodel
Top[0] 996 "hen-of-the-woods, hen of the woods, Polyporus frondosus, Grifola frondosa,",
Top[1] 988 "acorn,",
Top[2] 987 "corn,",
Top[3] 971 "bubble,",
Top[4] 964 "potpie,",
Top[0] 996 "hen-of-the-woods, hen of the woods, Polyporus frondosus, Grifola frondosa,",
Top[1] 988 "acorn,",
Top[2] 987 "corn,",
Top[3] 971 "bubble,",
Top[4] 964 "potpie,",
Top[0] 996 "hen-of-the-woods, hen of the woods, Polyporus frondosus, Grifola frondosa,",
Top[1] 988 "acorn,",
Top[2] 987 "corn,",
Top[3] 978 "seashore, coast, seacoast, sea-coast,",
Top[4] 971 "bubble,",
WARNING: Logging before InitGoogleLogging() is written to STDERR
F1218 16:40:18.570039 12571 dpu_controller_dnndk.cpp:190] Check failed: retval == 0 (-1 vs. 0) run dpu failed.
*** Check failure stack trace: ***
F1218 16:40:18.570040 12570 dpu_controller_dnndk.cpp:190] Check failed: retval == 0 (-1 vs. 0) run dpu failed.
*** Check failure stack trace: ***
Aborted
root@xilinx-kv260-starterkit-20221:~/app/samples#
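Before debugging the runtime, it can help to confirm that the platform really reports three identical cores. The following is a minimal sketch, assuming the `xdputil query` output has been saved as valid JSON with the same keys shown in the log above. The helper name `summarize_dpu_cores` is hypothetical and is not part of Vitis AI.

```python
import json


def summarize_dpu_cores(query_json: str) -> dict:
    """Summarize the "kernels" list of an `xdputil query` JSON dump.

    Key names ("DPU IP Spec", "DPU Core Count", "kernels", "DPU Arch",
    "fingerprint") are taken from the log in this thread; other Vitis AI
    versions may use different keys.
    """
    info = json.loads(query_json)
    kernels = info.get("kernels", [])
    archs = {k.get("DPU Arch", "unknown") for k in kernels}
    fingerprints = {k.get("fingerprint", "unknown") for k in kernels}
    declared = info.get("DPU IP Spec", {}).get("DPU Core Count")
    return {
        "declared_cores": declared,
        "listed_cores": len(kernels),
        "archs": sorted(archs),
        "fingerprints": sorted(fingerprints),
        # Consistent means: the declared count matches the listed kernels,
        # and every kernel shares one arch and one fingerprint.
        "consistent": (
            declared == len(kernels)
            and len(archs) == 1
            and len(fingerprints) == 1
        ),
    }
```

For the query output above, this should report 3 declared cores, 3 listed cores, a single arch (DPUCZDX8G_ISA1_B1024), and a single fingerprint, so a mismatch between the compiled model and the cores is unlikely to be the cause.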

Can anyone please help me?

Shreyas-NR commented 7 months ago

Hi @qianglin-xlnx,

I did some debugging and have some updates.

Platform 1: 1xB1024 DPU

My scenario is 1 image, 1 CPU thread, 1 VART runner, and the ResNet50 model compiled for the B1024 arch.

I'm able to see the right inference results.

Platform 2: 2xB1024 DPU

My scenario is 2 images (Bell pepper, Coffee Mug), 2 CPU threads, 2 VART runners, and the ResNet50 model compiled for the B1024 arch.

Both VART runners get the same ResNet50 model, and each runner should infer only one image.

I'm able to see the right inference results.

Below is the log:

root@xilinx-kv260-starterkit-20221:~/app/samples# show_dpu
device_core_id=0 device= 0 core = 0 fingerprint = 0x101000016010402 batch = 1 full_cu_name=unknown:dpu0
device_core_id=1 device= 0 core = 1 fingerprint = 0x101000016010402 batch = 1 full_cu_name=unknown:dpu0

root@xilinx-kv260-starterkit-20221:~/app/samples# python3 resnet50_mt_custom.py
Thread: 281473515405328 xmodel = /home/root/app/model/ResNet50_1024_QAT_kv260.xmodel, 
sg[0] = subgraph_ResNet__ResNet_QuantStub_quant_stub__input_1
sg[1] = subgraph_ResNet__ResNet_AvgPool2d_avgpool__8077_i0
sg[2] = subgraph_ResNet__ResNet_Linear_fc__8088_fix_

Thread: 281473515405328 xmodel = /home/root/app/model/ResNet50_1024_QAT_kv260.xmodel, 
sg[0] = subgraph_ResNet__ResNet_QuantStub_quant_stub__input_1
sg[1] = subgraph_ResNet__ResNet_AvgPool2d_avgpool__8077_i0
sg[2] = subgraph_ResNet__ResNet_Linear_fc__8088_fix_

WARNING: Logging before InitGoogleLogging() is written to STDERR
I1220 18:10:35.943100 1063781 dpu_controller.cpp:38] add factory method 00_dnndk
I1220 18:10:35.943190 1063781 dpu_controller_dnndk.cpp:255] register the dnndk dpu controller
I1220 18:10:35.943435 1063781 dpu_controller_dnndk.cpp:73]  fingerprint: 0x101000016010402 0x101000016010402
I1220 18:10:35.943467 1063781 dpu_controller.cpp:49] create dpu controller via 00_dnndk ret= 0xaaaadf09e5c0
I1220 18:10:35.943509 1063781 dpu_controller_dnndk.cpp:223] sfm_num 0 dpu_num 2
I1220 18:10:36.125147 1063781 dpu_controller.cpp:49] create dpu controller via 00_dnndk ret= 0xaaaadf09e5c0
Thread: 281473515405328 all_dpu_runners[0] = vart::Runner@0xaaaadd29db10
Thread: 281473515405328 all_dpu_runners[1] = vart::Runner@0xaaaadd2a2780

Thread: 281473207030096 runner             = vart::Runner@0xaaaadd29db10
Thread: 281473198575952 runner             = vart::Runner@0xaaaadd2a2780

I1220 18:10:36.313642 1063797 dpu_runner_base_imp.cpp:634] subgraph name : subgraph_ResNet__ResNet_AvgPool2d_avgpool__8077_i0
I1220 18:10:36.313735 1063797 dpu_controller_dnndk.cpp:159] code 0x71100000 core_idx 0 gen_reg:  0x71300000 0x72c00000 0x41780000 0x313b000 0xffffffffffffffff 0xffffffffffffffff 0xffffffffffffffff 0xffffffffffffffff
I1220 18:10:36.314150 1063798 dpu_runner_base_imp.cpp:634] subgraph name : subgraph_ResNet__ResNet_AvgPool2d_avgpool__8077_i0
I1220 18:10:36.314215 1063798 dpu_controller_dnndk.cpp:159] code 0x72f00000 core_idx 1 gen_reg:  0x73100000 0x74a00000 0x47b80000 0x10a0e000 0xffffffffffffffff 0xffffffffffffffff 0xffffffffffffffff 0xffffffffffffffff
core_idx = 0  LSTART 33436  LEND 33436  CSTART 17945  CEND 17945  SSTART 1430  SEND 1430  MSTART 1550  MEND 1550  CYCLE_L 3951597  CYCLE_H 0  TIMER 180294636297874
Thread: 281473207030096 , Input tensor  : ResNet__ResNet_QuantStub_quant_stub__input_1_fix [1, 224, 224, 3]
Thread: 281473207030096 , Output tensor : ResNet__ResNet_Linear_fc__8088_fix [1, 1000]
core_idx = 1  LSTART 33436  LEND 33436  CSTART 17945  CEND 17945  SSTART 1430  SEND 1430  MSTART 1550  MEND 1550  CYCLE_L 3955745  CYCLE_H 0  TIMER 180294636805541
Thread: 281473198575952 , Input tensor  : ResNet__ResNet_QuantStub_quant_stub__input_1_fix [1, 224, 224, 3]
Thread: 281473198575952 , Output tensor : ResNet__ResNet_Linear_fc__8088_fix [1, 1000]
Thread: 281473207030096 , Top[0] 945 0.779940 "bell pepper,",
Thread: 281473207030096 , Top[1] 941 0.286924 "acorn squash,",
Thread: 281473207030096 , Top[2] 943 0.223457 "cucumber, cuke,",
Thread: 281473207030096 , Top[3] 952 0.119608 "fig,",
Thread: 281473207030096 , Top[4] 939 0.087507 "zucchini, courgette,",
----------------------------------------------------------------------------------------------------
Thread: 281473207030096 DONE
Thread: 281473198575952 , Top[0] 504 0.120280 "coffee mug,",
Thread: 281473198575952 , Top[1] 968 0.093674 "cup,",
Thread: 281473198575952 , Top[2] 967 0.093674 "espresso,",
Thread: 281473198575952 , Top[3] 899 0.093674 "water jug,",
Thread: 281473198575952 , Top[4] 969 0.082667 "eggnog,",
----------------------------------------------------------------------------------------------------
Thread: 281473198575952 DONE
FPS=8.50, total frames = 1.00 , time=0.117676 seconds

Platform 3: 3xB1024 DPU

My scenario is 3 images (Bell pepper, Coffee Mug, Grey Fox), 3 CPU threads, 3 VART runners, and the ResNet50 model compiled for the B1024 arch.

All 3 VART runners get the same ResNet50 model, and each runner should infer only one image.
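The threading layout for this scenario (one CPU thread per runner, one image per runner) can be sketched roughly as follows. This is not the code of resnet50_mt_custom.py: the function name `run_one_image_per_runner` and the `infer` callable are hypothetical stand-ins, with `infer` representing whatever wraps the actual VART call for a single image.

```python
import threading


def run_one_image_per_runner(runners, images, infer):
    """Start one thread per runner; thread i calls infer(runners[i], images[i]).

    `infer` is a placeholder for the real per-image inference routine.
    Returns (results, errors), both indexed by runner position.
    """
    if len(runners) != len(images):
        raise ValueError("need exactly one image per runner")

    results = [None] * len(runners)
    errors = [None] * len(runners)

    def worker(i):
        try:
            results[i] = infer(runners[i], images[i])
        except Exception as exc:
            # Keep the Python-level failure together with its runner index.
            errors[i] = exc

    threads = [
        threading.Thread(target=worker, args=(i,)) for i in range(len(runners))
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, errors
```

Note that this only captures Python exceptions. A fatal check inside the native runtime, such as the "run dpu failed" abort, terminates the whole process and would not show up in `errors`.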

Below is the log:

root@xilinx-kv260-starterkit-20221:~/app/samples# python3 resnet50_mt_custom.py
Thread: 281473147834384 xmodel = /home/root/app/model/ResNet50_1024_QAT_kv260.xmodel
sg[0] = subgraph_ResNet__ResNet_QuantStub_quant_stub__input_1
sg[1] = subgraph_ResNet__ResNet_AvgPool2d_avgpool__8077_i0
sg[2] = subgraph_ResNet__ResNet_Linear_fc__8088_fix_

Thread: 281473147834384 xmodel = /home/root/app/model/ResNet50_1024_QAT_kv260.xmodel
sg[0] = subgraph_ResNet__ResNet_QuantStub_quant_stub__input_1
sg[1] = subgraph_ResNet__ResNet_AvgPool2d_avgpool__8077_i0
sg[2] = subgraph_ResNet__ResNet_Linear_fc__8088_fix_

Thread: 281473147834384 xmodel = /home/root/app/model/ResNet50_1024_QAT_kv260.xmodel
sg[0] = subgraph_ResNet__ResNet_QuantStub_quant_stub__input_1
sg[1] = subgraph_ResNet__ResNet_AvgPool2d_avgpool__8077_i0
sg[2] = subgraph_ResNet__ResNet_Linear_fc__8088_fix_

WARNING: Logging before InitGoogleLogging() is written to STDERR
I1220 18:15:21.002403 1065550 dpu_controller.cpp:38] add factory method 00_dnndk
I1220 18:15:21.002496 1065550 dpu_controller_dnndk.cpp:255] register the dnndk dpu controller
I1220 18:15:21.002745 1065550 dpu_controller_dnndk.cpp:73]  fingerprint: 0x101000016010402 0x101000016010402
I1220 18:15:21.002779 1065550 dpu_controller.cpp:49] create dpu controller via 00_dnndk ret= 0xaaab17d35d50
I1220 18:15:21.002821 1065550 dpu_controller_dnndk.cpp:223] sfm_num 0 dpu_num 3
I1220 18:15:21.153385 1065550 dpu_controller.cpp:49] create dpu controller via 00_dnndk ret= 0xaaab17d35d50
I1220 18:15:21.302698 1065550 dpu_controller.cpp:49] create dpu controller via 00_dnndk ret= 0xaaab17d35d50
Thread: 281473147834384 all_dpu_runners[0] = vart::Runner@0xaaab17d35dd0
Thread: 281473147834384 all_dpu_runners[1] = vart::Runner@0xaaab15f50ad0
Thread: 281473147834384 all_dpu_runners[2] = vart::Runner@0xaaab15e19710

Thread: 281472835793232 runner             = vart::Runner@0xaaab17d35dd0
Thread: 281472744681808 runner             = vart::Runner@0xaaab15f50ad0
I1220 18:15:21.505937 1065566 dpu_runner_base_imp.cpp:634] subgraph name : subgraph_ResNet__ResNet_AvgPool2d_avgpool__8077_i0
I1220 18:15:21.506038 1065566 dpu_controller_dnndk.cpp:159] code 0x71100000 core_idx 0 gen_reg:  0x71300000 0x72c00000 0x47b80000 0xf290000 0xffffffffffffffff 0xffffffffffffffff 0xffffffffffffffff 0xffffffffffffffff
Thread: 281472736227664 runner             = vart::Runner@0xaaab15e19710
I1220 18:15:21.506896 1065567 dpu_runner_base_imp.cpp:634] subgraph name : subgraph_ResNet__ResNet_AvgPool2d_avgpool__8077_i0
I1220 18:15:21.507059 1065567 dpu_controller_dnndk.cpp:159] code 0x72f00000 core_idx 1 gen_reg:  0x73100000 0x74a00000 0x47bc0000 0x314e000 0xffffffffffffffff 0xffffffffffffffff 0xffffffffffffffff 0xffffffffffffffff
I1220 18:15:21.507150 1065568 dpu_runner_base_imp.cpp:634] subgraph name : subgraph_ResNet__ResNet_AvgPool2d_avgpool__8077_i0
I1220 18:15:21.507253 1065568 dpu_controller_dnndk.cpp:159] code 0x74d00000 core_idx 2 gen_reg:  0x74f00000 0x76800000 0x48740000 0x4614000 0xffffffffffffffff 0xffffffffffffffff 0xffffffffffffffff 0xffffffffffffffff
core_idx = 0  LSTART 33436  LEND 33436  CSTART 17945  CEND 17945  SSTART 1430  SEND 1430  MSTART 1550  MEND 1550  CYCLE_L 3937865  CYCLE_H 0  TIMER 180579828604250
Thread: 281472835793232 , Input tensor  : ResNet__ResNet_QuantStub_quant_stub__input_1_fix [1, 224, 224, 3]
Thread: 281472835793232 , Output tensor : ResNet__ResNet_Linear_fc__8088_fix [1, 1000]
Thread: 281472835793232 , Top[0] 996 0.041342 "hen-of-the-woods, hen of the woods, Polyporus frondosus, Grifola frondosa,",
Thread: 281472835793232 , Top[1] 988 0.041342 "acorn,",
Thread: 281472835793232 , Top[2] 987 0.041342 "corn,",
Thread: 281472835793232 , Top[3] 971 0.041342 "bubble,",
Thread: 281472835793232 , Top[4] 964 0.041342 "potpie,",
----------------------------------------------------------------------------------------------------
Thread: 281472835793232 DONE
core_idx = core_idx = 21   LSTART 0  LEND 0  CSTART 0  CEND 0  SSTART 0  SEND 0  MSTART 0  MEND 0  CYCLE_L 0  CYCLE_H 0  TIMER 180579829823136  LSTART 0  LEND 0  CSTART 0  CEND 0  SSTART 0  SEND 0  MSTART 0  MEND 0  CYCLE_L 0  CYCLE_H 0  TIMER 180579829747255

F1220 18:15:26.602097 1065567 dpu_controller_dnndk.cpp:190] Check failed: retval == 0 (-1 vs. 0) run dpu failed.
*** Check failure stack trace: ***
Aborted
root@xilinx-kv260-starterkit-20221:~/app/samples#
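In the log above, core 0 reports a non-zero CYCLE_L, while the two interleaved records printed afterwards show every counter at 0. The small sketch below counts such records in a saved log. It assumes, from reading this log only, that an all-zero CYCLE_L/CYCLE_H pair means the core never actually ran the job; that interpretation is not taken from the DPU documentation, and the function name `count_cycle_records` is hypothetical.

```python
import re

# Matches the cycle counters in lines such as:
#   core_idx = 0  LSTART ...  CYCLE_L 3937865  CYCLE_H 0  TIMER ...
# It also works on interleaved lines, since it only looks at the counter pair.
_CYCLES = re.compile(r"CYCLE_L\s+(\d+)\s+CYCLE_H\s+(\d+)")


def count_cycle_records(log_text):
    """Return (nonzero, zero) counts of CYCLE_L/CYCLE_H pairs in the log."""
    nonzero = 0
    zero = 0
    for low, high in _CYCLES.findall(log_text):
        if int(low) == 0 and int(high) == 0:
            zero += 1
        else:
            nonzero += 1
    return nonzero, zero
```

Applied to the Platform 3 log, this gives one non-zero record (core 0) and two zero records, compared with two non-zero records and no zero records for the working Platform 2 run.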

Any help is appreciated.

Best regards, Shreyas