Xilinx / DPU-PYNQ

DPU on PYNQ
Apache License 2.0

SSD Object Detector Dual output #72

Closed bomerzz closed 2 years ago

bomerzz commented 2 years ago

Hello,

I've been trying to run the tf2_ssdinception_v2 model with DPU-PYNQ on an Ultra96-V2, using a modified version of the dpu_tf_inceptionv1.ipynb notebook.

Using the default shapeOut = tuple(outputTensors[0].dims), I get an output array of shape (1, 1917, 91), which I assume holds the per-class confidence scores for each box.

Accessing the second output tensor with shapeOut2 = tuple(outputTensors[1].dims) gives a shape of (1, 1917, 4), which I assume holds the box locations.

However, when I try to run with a buffer for the second output tensor, the following error is thrown:

 job_id = dpu.execute_async(input_data, output_data2)
double free or corruption (out)
Aborted (core dumped)

I am not sure how to proceed with the detection: the graph generated by analyze_subgraphs.sh shows that the subgraph has 2 outputs which require CPU post-processing to obtain the final result.

Any help will be appreciated. Thank you!

Graph Image
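For reference, a minimal numpy sketch of allocating matching buffers for both reported outputs (shapes taken from the dims above, batch size 1; this assumes VART's convention that execute_async receives one buffer per output tensor of the subgraph, all in a single list):

```python
import numpy as np

# Shapes reported by the two output tensors above (batch size 1):
score_shape = (1, 1917, 91)  # per-anchor class confidences
box_shape = (1, 1917, 4)     # per-anchor box encodings

# Assumption: both buffers are passed together in one list to a single
# execute_async call, rather than one at a time to separate calls.
output_data = [
    np.empty(score_shape, dtype=np.float32, order="C"),
    np.empty(box_shape, dtype=np.float32, order="C"),
]
```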

skalade commented 2 years ago

Hi there,

I think this is a limitation of DpuOverlay: looking at dpu.py, only the first subgraph is used to create a runner object. We might support multiple subgraphs in the future, but for now you would have to use vart directly and run separate runner instances like here.
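As a rough illustration of that partitioning logic (plain dicts stand in for xir subgraph objects here; the real ones come from graph.get_root_subgraph().toposort_child_subgraph() and expose has_attr("device") / get_attr("device")), only subgraphs mapped to the DPU get a runner, while CPU subgraphs must be handled on the host:

```python
# Stand-ins for xir subgraphs; names and the split below are
# hypothetical, for illustration only.
subgraphs = [
    {"name": "subgraph_pre",  "device": "CPU"},
    {"name": "subgraph_conv", "device": "DPU"},
    {"name": "subgraph_post", "device": "CPU"},
]

# DpuOverlay currently builds a runner for the first subgraph only;
# with vart directly you would pick out every DPU subgraph yourself
# and implement the CPU subgraphs (e.g. SSD post-processing) on the host.
dpu_subgraphs = [s for s in subgraphs if s["device"] == "DPU"]
```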

Thanks Shawn

bomerzz commented 2 years ago

Hi @skalade, thanks for the reply. I've tried the method mentioned above, but it still throws the double free error when I use an output buffer sized for the second output (1, 1917, 4). From the graph image above it seems that the DPU subgraph can output two different tensors. Is there a way to set the output of the subgraph to the second tensor? Below is the code I used, where I set output_data2 to the size of the second output. I'm not sure this is the correct way to do it.

Thanks!

shapeOut2 = tuple(outputTensors[1].dims)
outputSize2 = int(outputTensors[1].get_data_size() / shapeIn[0])
output_data2 = [np.empty(shapeOut2, dtype=np.float32, order="C")]

dpu_1 = vart.Runner.create_runner(subgraph[0], "run")
dpu_2 = vart.Runner.create_runner(subgraph[0], "run")
job_id = dpu_1.execute_async(input_data, output_data)
dpu_1.wait(job_id)
print("Job 1")
job_id2 = dpu_2.execute_async(input_data, output_data2)
dpu_2.wait(job_id2)
print("Job 2")
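To make the buffer contract concrete, here is a toy mock of a two-output runner (not the real vart API, just a model of the behaviour this thread runs into): if execute_async needs one buffer per output tensor, then supplying only a second-output buffer under-allocates, which would be consistent with the heap corruption the double free above suggests.

```python
import numpy as np

class MockRunner:
    """Toy stand-in for a runner whose subgraph has two output tensors."""
    output_shapes = [(1, 1917, 91), (1, 1917, 4)]

    def execute_async(self, input_data, output_data):
        # The double free above suggests the real runner writes past the
        # end when given too few buffers; this mock raises instead so
        # the mistake is visible.
        if len(output_data) != len(self.output_shapes):
            raise ValueError("need one buffer per output tensor")
        for buf in output_data:
            buf.fill(0.0)  # pretend the DPU filled the buffer
        return 0  # job id

    def wait(self, job_id):
        return 0

runner = MockRunner()
output_data = [np.empty(s, dtype=np.float32, order="C")
               for s in MockRunner.output_shapes]
job_id = runner.execute_async(None, output_data)
runner.wait(job_id)
```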
skalade commented 2 years ago

Hi, since this does not seem like a DPU-PYNQ bug, I'd recommend posting this on the PYNQ discussion forum, or as a more general vart question on the Xilinx forums / Vitis AI GitHub issues.

I've not really worked with models like SSD before, so I can't give too much advice. But parsing your model, it looks like the 4th CPU subgraph corresponds to the box encodings you can see on one of the outputs in the graph image you provided. So maybe you could grab those on the CPU side; there should be some examples in the Vitis AI Library.
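A hedged numpy sketch of the score half of that CPU post-processing (softmax over classes plus a confidence filter; the (1, 1917, 4) box encodings would additionally need decoding against the SSD anchor priors, which is model-specific and not shown):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def filter_detections(raw_scores, conf_thresh=0.5):
    """raw_scores: (1, 1917, 91) class logits from the first DPU output.

    Returns (anchor_indices, class_ids, confidences) for anchors whose
    best non-background class clears the threshold.
    """
    probs = softmax(raw_scores[0], axis=-1)   # (1917, 91)
    class_ids = probs.argmax(axis=-1)         # best class per anchor
    confs = probs.max(axis=-1)
    # assuming class 0 is background, as in TF object detection models
    keep = (confs > conf_thresh) & (class_ids != 0)
    return np.nonzero(keep)[0], class_ids[keep], confs[keep]
```

The surviving anchors' decoded boxes would then typically go through non-max suppression to produce the final detections.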

[Image: subgraph listing]

Hope this helps a bit...

I'm going to close this issue because this isn't a core DPU-PYNQ problem. If you still have issues I encourage you to post on one of the forums!

Thanks Shawn

bomerzz commented 2 years ago

Hi @skalade thanks for the help! I'll give the forum a shot.