Is dpu runner pinned and sticked to one DPU core internally?

Xilinx / Vitis-AI

Vitis AI is Xilinx’s development stack for AI inference on Xilinx hardware platforms, including both edge devices and Alveo cards.

https://www.xilinx.com/ai

Apache License 2.0


Is dpu runner pinned and sticked to one DPU core internally? #184

Closed tetsuro0907 closed 2 years ago

tetsuro0907 commented 4 years ago

Hi, I’d like to tune performance on an Alveo U200, but it looks like some basic information about the internals of the dpu runner is missing from the documentation.

On Alveo U200 and U250, 2 and 4 DPUs are created, respectively. When does Vitis-AI decide which DPU core runs an inference? Is it decided statically when you call create_dpu_runner(), or dynamically when you call exec_async()?
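To make the two possibilities concrete, here is a toy model of what I mean by "static" versus "dynamic". The class names `StaticPool` and `DynamicPool` are hypothetical and exist only for illustration. This is not the Vitis AI API and says nothing about how the dpu runner actually behaves.

```python
from itertools import cycle


class StaticPool:
    """Toy model: a runner is bound to one core when it is created."""

    def __init__(self, num_cores):
        self._next_core = cycle(range(num_cores))

    def create_runner(self):
        core = next(self._next_core)  # decided once, at creation time

        def run(job_cost):
            return core  # every job from this runner lands on the same core

        return run


class DynamicPool:
    """Toy model: a core is chosen per job, at execution time."""

    def __init__(self, num_cores):
        # Work submitted to each core so far (completions are not modelled).
        self.pending = [0] * num_cores

    def create_runner(self):
        def run(job_cost):
            # Pick the core with the least submitted work; ties go to the lowest index.
            core = min(range(len(self.pending)), key=self.pending.__getitem__)
            self.pending[core] += job_cost
            return core

        return run
```

In the static model a single runner can only ever keep one core busy, so the application would need at least one runner per core. In the dynamic model a single runner could already spread work across cores.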

kevinkit commented 4 years ago

AFAIK you have to assign a thread / runner to each DPU core. Here is some older example code that uses several threads: https://github.com/Xilinx/Vitis-AI-Tutorials/blob/VAI-KERAS-FCN8-SEMSEG/files/target_zcu104/unet/v2/src/fps_main.cc I do not have more information, but maybe this will help you.

Also a python example: https://github.com/Xilinx/Vitis-AI-Tutorials/blob/CIFAR10-Classification-with-TensorFlow/files/target/cifar10_app.py

Basically, more DPUs do not automatically mean more throughput if you do not adapt your code accordingly. You must write your own code to balance the work.
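A minimal sketch of that pattern, assuming one runner object per worker thread and a shared job queue, so that whichever thread is idle picks up the next batch. `StubRunner` and `run_all` are hypothetical names for illustration. `StubRunner` is a stand-in and not the Vitis AI API; on a real board you would create an actual runner in its place.

```python
import queue
import threading


class StubRunner:
    """Hypothetical stand-in for a DPU runner; NOT the Vitis AI API."""

    def __init__(self, runner_id):
        self.runner_id = runner_id

    def run(self, batch):
        # A real runner would submit the batch to the DPU and wait for the result.
        return [x * 2 for x in batch]


def worker(runner, jobs, results, lock):
    """Pull batches from the shared queue until a None sentinel arrives."""
    while True:
        item = jobs.get()
        if item is None:
            break
        index, batch = item
        output = runner.run(batch)
        with lock:
            results[index] = output


def run_all(batches, num_runners):
    """Process all batches with num_runners threads, one runner per thread."""
    jobs = queue.Queue()
    results = {}
    lock = threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(StubRunner(i), jobs, results, lock))
        for i in range(num_runners)
    ]
    for t in threads:
        t.start()
    for index, batch in enumerate(batches):
        jobs.put((index, batch))
    for _ in threads:
        jobs.put(None)  # one sentinel per worker so every thread exits
    for t in threads:
        t.join()
    # Return the outputs in the original submission order.
    return [results[i] for i in range(len(batches))]
```

The shared queue does the balancing: a slow batch only delays the thread that took it, while the other threads keep draining the queue.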

tetsuro0907 commented 4 years ago

kevinkit:

AFAICS those examples use the DNNDK APIs for edge devices. It looks like DNNDK has dpuSetTaskAffinity() to let users increase DPU utilization, and furthermore the scheduler for DNNDK is open in the repository (yay!).

However, my board is an Alveo U200, a cloud (non-edge) device, so I have to use the dpu runner API, which has no such DPU affinity API, and AFAICS the scheduler under the dpu runner is not open :( I won’t ask you to make it fully open, but to write my own load-balancing code I’d like to know some basic info, such as when the DPU core is assigned. Is it assigned when you create the runner, which then stays stuck to that core, or is it decided dynamically with some cool algorithm when you execute the task?

kevinkit commented 4 years ago

I think DNNDK is the old, outdated API, and I was told several times that the DNNDK API should be abandoned. Sadly, most of the examples use DNNDK, and the whole documentation / process is just a mess. There is no straightforward, documented way to do things when you are a little bit outside the prebuilt examples or have a different board ...

I am not sure about anything here, to be honest. I just started FPGA development in this area.

tetsuro0907 commented 4 years ago

@kevinkit: I see. Thanks for your response. Okay, I'd like to keep my questions open.