Open Shreyas-NR opened 7 months ago
Hi @qianglin-xlnx,
I did some debugging and have some updates:
Platform 1: 1xB1024 DPU
My scenario is 1 image, 1 CPU thread, 1 VART runner, and a ResNet50 model compiled for the B1024 arch.
I'm able to see the right inference results.
Platform 2: 2xB1024 DPU
My scenario is 2 images (Bell pepper, Coffee Mug), 2 CPU threads, 2 VART runners, and a ResNet50 model compiled for the B1024 arch.
Both VART runners get the same ResNet50 model, and each runner should infer only one image.
I'm able to see the right inference results.
Below is the log:
root@xilinx-kv260-starterkit-20221:~/app/samples# show_dpu
device_core_id=0 device= 0 core = 0 fingerprint = 0x101000016010402 batch = 1 full_cu_name=unknown:dpu0
device_core_id=1 device= 0 core = 1 fingerprint = 0x101000016010402 batch = 1 full_cu_name=unknown:dpu0
root@xilinx-kv260-starterkit-20221:~/app/samples# python3 resnet50_mt_custom.py
Thread: 281473515405328 xmodel = /home/root/app/model/ResNet50_1024_QAT_kv260.xmodel,
sg[0] = subgraph_ResNet__ResNet_QuantStub_quant_stub__input_1
sg[1] = subgraph_ResNet__ResNet_AvgPool2d_avgpool__8077_i0
sg[2] = subgraph_ResNet__ResNet_Linear_fc__8088_fix_
Thread: 281473515405328 xmodel = /home/root/app/model/ResNet50_1024_QAT_kv260.xmodel,
sg[0] = subgraph_ResNet__ResNet_QuantStub_quant_stub__input_1
sg[1] = subgraph_ResNet__ResNet_AvgPool2d_avgpool__8077_i0
sg[2] = subgraph_ResNet__ResNet_Linear_fc__8088_fix_
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1220 18:10:35.943100 1063781 dpu_controller.cpp:38] add factory method 00_dnndk
I1220 18:10:35.943190 1063781 dpu_controller_dnndk.cpp:255] register the dnndk dpu controller
I1220 18:10:35.943435 1063781 dpu_controller_dnndk.cpp:73] fingerprint: 0x101000016010402 0x101000016010402
I1220 18:10:35.943467 1063781 dpu_controller.cpp:49] create dpu controller via 00_dnndk ret= 0xaaaadf09e5c0
I1220 18:10:35.943509 1063781 dpu_controller_dnndk.cpp:223] sfm_num 0 dpu_num 2
I1220 18:10:36.125147 1063781 dpu_controller.cpp:49] create dpu controller via 00_dnndk ret= 0xaaaadf09e5c0
Thread: 281473515405328 all_dpu_runners[0] = vart::Runner@0xaaaadd29db10
Thread: 281473515405328 all_dpu_runners[1] = vart::Runner@0xaaaadd2a2780
Thread: 281473207030096 runner = vart::Runner@0xaaaadd29db10
Thread: 281473198575952 runner = vart::Runner@0xaaaadd2a2780
I1220 18:10:36.313642 1063797 dpu_runner_base_imp.cpp:634] subgraph name : subgraph_ResNet__ResNet_AvgPool2d_avgpool__8077_i0
I1220 18:10:36.313735 1063797 dpu_controller_dnndk.cpp:159] code 0x71100000 core_idx 0 gen_reg: 0x71300000 0x72c00000 0x41780000 0x313b000 0xffffffffffffffff 0xffffffffffffffff 0xffffffffffffffff 0xffffffffffffffff
I1220 18:10:36.314150 1063798 dpu_runner_base_imp.cpp:634] subgraph name : subgraph_ResNet__ResNet_AvgPool2d_avgpool__8077_i0
I1220 18:10:36.314215 1063798 dpu_controller_dnndk.cpp:159] code 0x72f00000 core_idx 1 gen_reg: 0x73100000 0x74a00000 0x47b80000 0x10a0e000 0xffffffffffffffff 0xffffffffffffffff 0xffffffffffffffff 0xffffffffffffffff
core_idx = 0 LSTART 33436 LEND 33436 CSTART 17945 CEND 17945 SSTART 1430 SEND 1430 MSTART 1550 MEND 1550 CYCLE_L 3951597 CYCLE_H 0 TIMER 180294636297874
Thread: 281473207030096 , Input tensor : ResNet__ResNet_QuantStub_quant_stub__input_1_fix [1, 224, 224, 3]
Thread: 281473207030096 , Output tensor : ResNet__ResNet_Linear_fc__8088_fix [1, 1000]
core_idx = 1 LSTART 33436 LEND 33436 CSTART 17945 CEND 17945 SSTART 1430 SEND 1430 MSTART 1550 MEND 1550 CYCLE_L 3955745 CYCLE_H 0 TIMER 180294636805541
Thread: 281473198575952 , Input tensor : ResNet__ResNet_QuantStub_quant_stub__input_1_fix [1, 224, 224, 3]
Thread: 281473198575952 , Output tensor : ResNet__ResNet_Linear_fc__8088_fix [1, 1000]
Thread: 281473207030096 , Top[0] 945 0.779940 "bell pepper,",
Thread: 281473207030096 , Top[1] 941 0.286924 "acorn squash,",
Thread: 281473207030096 , Top[2] 943 0.223457 "cucumber, cuke,",
Thread: 281473207030096 , Top[3] 952 0.119608 "fig,",
Thread: 281473207030096 , Top[4] 939 0.087507 "zucchini, courgette,",
----------------------------------------------------------------------------------------------------
Thread: 281473207030096 DONE
Thread: 281473198575952 , Top[0] 504 0.120280 "coffee mug,",
Thread: 281473198575952 , Top[1] 968 0.093674 "cup,",
Thread: 281473198575952 , Top[2] 967 0.093674 "espresso,",
Thread: 281473198575952 , Top[3] 899 0.093674 "water jug,",
Thread: 281473198575952 , Top[4] 969 0.082667 "eggnog,",
----------------------------------------------------------------------------------------------------
Thread: 281473198575952 DONE
FPS=8.50, total frames = 1.00 , time=0.117676 seconds
Platform 3: 3xB1024 DPU
My scenario is 3 images (Bell pepper, Coffee Mug, Grey Fox), 3 CPU threads, 3 VART runners, and a ResNet50 model compiled for the B1024 arch.
All 3 VART runners get the same ResNet50 model, and each runner should infer only one image.
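For reference, the one-runner-per-thread pattern I'm describing can be sketched roughly as below. This is a minimal sketch, not my actual resnet50_mt_custom.py: `run_one_image_per_runner` and the `infer` callback are hypothetical names, and `infer` stands in for whatever submits a job on a runner and waits for it.

```python
import threading


def run_one_image_per_runner(runners, images, infer):
    """Run infer(runner, image) on one thread per (runner, image) pair.

    `infer` is a hypothetical callback; in the real script it would wrap
    the submit-and-wait calls on the runner. Results and errors are
    collected by index so the output order matches the input order.
    """
    if len(runners) != len(images):
        raise ValueError("need exactly one image per runner")

    results = [None] * len(runners)
    errors = [None] * len(runners)

    def worker(i):
        try:
            results[i] = infer(runners[i], images[i])
        except Exception as exc:  # keep other threads running
            errors[i] = exc

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(len(runners))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, errors


if __name__ == "__main__":
    # Stand-in objects only; no DPU is involved in this demo.
    fake_runners = ["runner0", "runner1", "runner2"]
    fake_images = ["bell_pepper", "coffee_mug", "grey_fox"]
    out, err = run_one_image_per_runner(
        fake_runners, fake_images, lambda r, img: f"{r} -> {img}")
    print(out, err)
```

Note that the failure in my log is a fatal check inside the runtime that aborts the whole process, so a Python-level `try/except` like this one would probably not catch it.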
Below is the log. The run aborts with "run dpu failed":
root@xilinx-kv260-starterkit-20221:~/app/samples# python3 resnet50_mt_custom.py
Thread: 281473147834384 xmodel = /home/root/app/model/ResNet50_1024_QAT_kv260.xmodel
sg[0] = subgraph_ResNet__ResNet_QuantStub_quant_stub__input_1
sg[1] = subgraph_ResNet__ResNet_AvgPool2d_avgpool__8077_i0
sg[2] = subgraph_ResNet__ResNet_Linear_fc__8088_fix_
Thread: 281473147834384 xmodel = /home/root/app/model/ResNet50_1024_QAT_kv260.xmodel
sg[0] = subgraph_ResNet__ResNet_QuantStub_quant_stub__input_1
sg[1] = subgraph_ResNet__ResNet_AvgPool2d_avgpool__8077_i0
sg[2] = subgraph_ResNet__ResNet_Linear_fc__8088_fix_
Thread: 281473147834384 xmodel = /home/root/app/model/ResNet50_1024_QAT_kv260.xmodel
sg[0] = subgraph_ResNet__ResNet_QuantStub_quant_stub__input_1
sg[1] = subgraph_ResNet__ResNet_AvgPool2d_avgpool__8077_i0
sg[2] = subgraph_ResNet__ResNet_Linear_fc__8088_fix_
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1220 18:15:21.002403 1065550 dpu_controller.cpp:38] add factory method 00_dnndk
I1220 18:15:21.002496 1065550 dpu_controller_dnndk.cpp:255] register the dnndk dpu controller
I1220 18:15:21.002745 1065550 dpu_controller_dnndk.cpp:73] fingerprint: 0x101000016010402 0x101000016010402
I1220 18:15:21.002779 1065550 dpu_controller.cpp:49] create dpu controller via 00_dnndk ret= 0xaaab17d35d50
I1220 18:15:21.002821 1065550 dpu_controller_dnndk.cpp:223] sfm_num 0 dpu_num 3
I1220 18:15:21.153385 1065550 dpu_controller.cpp:49] create dpu controller via 00_dnndk ret= 0xaaab17d35d50
I1220 18:15:21.302698 1065550 dpu_controller.cpp:49] create dpu controller via 00_dnndk ret= 0xaaab17d35d50
Thread: 281473147834384 all_dpu_runners[0] = vart::Runner@0xaaab17d35dd0
Thread: 281473147834384 all_dpu_runners[1] = vart::Runner@0xaaab15f50ad0
Thread: 281473147834384 all_dpu_runners[2] = vart::Runner@0xaaab15e19710
Thread: 281472835793232 runner = vart::Runner@0xaaab17d35dd0
Thread: 281472744681808 runner = vart::Runner@0xaaab15f50ad0
I1220 18:15:21.505937 1065566 dpu_runner_base_imp.cpp:634] subgraph name : subgraph_ResNet__ResNet_AvgPool2d_avgpool__8077_i0
I1220 18:15:21.506038 1065566 dpu_controller_dnndk.cpp:159] code 0x71100000 core_idx 0 gen_reg: 0x71300000 0x72c00000 0x47b80000 0xf290000 0xffffffffffffffff 0xffffffffffffffff 0xffffffffffffffff 0xffffffffffffffff
Thread: 281472736227664 runner = vart::Runner@0xaaab15e19710
I1220 18:15:21.506896 1065567 dpu_runner_base_imp.cpp:634] subgraph name : subgraph_ResNet__ResNet_AvgPool2d_avgpool__8077_i0
I1220 18:15:21.507059 1065567 dpu_controller_dnndk.cpp:159] code 0x72f00000 core_idx 1 gen_reg: 0x73100000 0x74a00000 0x47bc0000 0x314e000 0xffffffffffffffff 0xffffffffffffffff 0xffffffffffffffff 0xffffffffffffffff
I1220 18:15:21.507150 1065568 dpu_runner_base_imp.cpp:634] subgraph name : subgraph_ResNet__ResNet_AvgPool2d_avgpool__8077_i0
I1220 18:15:21.507253 1065568 dpu_controller_dnndk.cpp:159] code 0x74d00000 core_idx 2 gen_reg: 0x74f00000 0x76800000 0x48740000 0x4614000 0xffffffffffffffff 0xffffffffffffffff 0xffffffffffffffff 0xffffffffffffffff
core_idx = 0 LSTART 33436 LEND 33436 CSTART 17945 CEND 17945 SSTART 1430 SEND 1430 MSTART 1550 MEND 1550 CYCLE_L 3937865 CYCLE_H 0 TIMER 180579828604250
Thread: 281472835793232 , Input tensor : ResNet__ResNet_QuantStub_quant_stub__input_1_fix [1, 224, 224, 3]
Thread: 281472835793232 , Output tensor : ResNet__ResNet_Linear_fc__8088_fix [1, 1000]
Thread: 281472835793232 , Top[0] 996 0.041342 "hen-of-the-woods, hen of the woods, Polyporus frondosus, Grifola frondosa,",
Thread: 281472835793232 , Top[1] 988 0.041342 "acorn,",
Thread: 281472835793232 , Top[2] 987 0.041342 "corn,",
Thread: 281472835793232 , Top[3] 971 0.041342 "bubble,",
Thread: 281472835793232 , Top[4] 964 0.041342 "potpie,",
----------------------------------------------------------------------------------------------------
Thread: 281472835793232 DONE
core_idx = core_idx = 21 LSTART 0 LEND 0 CSTART 0 CEND 0 SSTART 0 SEND 0 MSTART 0 MEND 0 CYCLE_L 0 CYCLE_H 0 TIMER 180579829823136 LSTART 0 LEND 0 CSTART 0 CEND 0 SSTART 0 SEND 0 MSTART 0 MEND 0 CYCLE_L 0 CYCLE_H 0 TIMER 180579829747255
F1220 18:15:26.602097 1065567 dpu_controller_dnndk.cpp:190] Check failed: retval == 0 (-1 vs. 0) run dpu failed.
*** Check failure stack trace: ***
Aborted
root@xilinx-kv260-starterkit-20221:~/app/samples#
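In the log above, core 0 reports non-zero profiling counters, while the two interleaved counter blocks printed just before the check failure are all zeros. To compare runs I pull the cycle counts out of the log with a small helper. This is only a sketch: `cycle_counts` and `zero_cycle_blocks` are names I made up, and treating CYCLE_H and CYCLE_L as the high and low 32-bit words of one counter is my assumption. It matches only the counter fields, so it still works when two threads' output is interleaved on one line.

```python
import re

# One block of profiling counters as printed in the log ("CYCLE_L ... TIMER ...").
COUNTERS = re.compile(r"CYCLE_L (\d+) CYCLE_H (\d+) TIMER (\d+)")


def cycle_counts(log_text):
    """Return the cycle count of every counter block found, in log order."""
    counts = []
    for low, high, _timer in COUNTERS.findall(log_text):
        # Assumption: CYCLE_H / CYCLE_L are the high / low 32-bit words.
        counts.append((int(high) << 32) | int(low))
    return counts


def zero_cycle_blocks(log_text):
    """Count counter blocks whose cycle count is zero."""
    return sum(1 for c in cycle_counts(log_text) if c == 0)
```

On the 3-core log this gives one non-zero block for core 0 and two zero blocks, which I read as the other two cores not having executed. That is my interpretation, not something the runtime states.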
Any help is appreciated.
Best regards, Shreyas
Hi,
I'm working on a PyTorch ResNet50 model, using Vitis AI 2.5 on the KV260 platform with PetaLinux 2022.1 and DPU IP v4 (Vivado flow).
I compiled the ResNet50 model for a 1-core B4096 arch, and everything works as expected. I also compiled it for a 1-core B1024 arch, and that works as expected too.
I have a 3-core B1024 arch platform, and I want to run different models on each DPU core. I compiled the ResNet50 model for the B1024 arch (ResNet50_1024_QAT_kv260.xmodel), and I'm trying to run the resnet50_mt example, examples/vai_runtime/resnet50_mt_py/resnet50.py.
I have 3 images in my img directory. My expectation is that each of the three cores should infer all 3 images, so I should have 9 outputs.
Below is the log of my output
Can anyone please help me?