Xilinx / Vitis-AI

Vitis AI is Xilinx’s development stack for AI inference on Xilinx hardware platforms, including both edge devices and Alveo cards.
https://www.xilinx.com/ai
Apache License 2.0

Unable to read multiple outputs in one subgraph. #1067

Closed. Shreyas-NR closed this issue 2 years ago.

Shreyas-NR commented 2 years ago

Hi, I have a compiled model with 1 DPU subgraph that takes 1 input tensor and produces 3 output tensors.


inputTensor.name = A2J_model__A2J_model_ResNetBackBone_Backbone__input_1_swim_transpose_0_fix
inputTensor.dims = [1, 288, 288, 3]
inputTensor.dtype = xint8

outputTensor.name = A2J_model__A2J_model_ClassificationModel_classificationModel__Conv2d_output__11586_fix
outputTensor.dims = [1, 18, 18, 240]
outputTensor.dtype = xint8

outputTensor.name = A2J_model__A2J_model_DepthRegressionModel_DepthRegressionModel__Conv2d_output__11920_fix
outputTensor.dims = [1, 18, 18, 240]
outputTensor.dtype = xint8

outputTensor.name = A2J_model__A2J_model_RegressionModel_regressionModel__Conv2d_output__11749_fix
outputTensor.dims = [1, 18, 18, 480]
outputTensor.dtype = xint8
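
For reference, a listing like the one above can be produced by walking the compiled .xmodel with the XIR/VART Python APIs. The sketch below is illustrative only; the helper name and model path are placeholders, not from the original post:

import xir
import vart

def list_dpu_tensors(xmodel_path):
    # deserialize the compiled model and keep only the subgraphs mapped to the DPU
    graph = xir.Graph.deserialize(xmodel_path)
    dpu_subgraphs = [s for s in graph.get_root_subgraph().toposort_child_subgraph()
                     if s.has_attr("device") and s.get_attr("device").upper() == "DPU"]
    assert len(dpu_subgraphs) == 1, "expecting exactly 1 DPU subgraph"

    # a runner exposes the subgraph's input/output tensors (name, dims, dtype)
    runner = vart.Runner.create_runner(dpu_subgraphs[0], "run")
    for t in runner.get_input_tensors():
        print("inputTensor.name =", t.name)
        print("inputTensor.dims =", t.dims)
    for t in runner.get_output_tensors():
        print("outputTensor.name =", t.name)
        print("outputTensor.dims =", t.dims)

# list_dpu_tensors("compiled_model.xmodel")   # placeholder path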

I referred to the examples below when writing my application code:

  1. https://github.com/Xilinx/Vitis-AI-Tutorials/blob/1.4/Design_Tutorials/11-tf2_var_autoenc/files/application/app_mt.py

  2. https://support.xilinx.com/s/article/Multiple-output-model-example-design-from-training-to-application-on-ZCU102?language=en_US

For every different input I feed, I am unable to get updated tensor values at the 3 outputs.

Here is my DPU runner code snippet:

global out_q1, out_q2, out_q3
# note: multiplying an empty list is a no-op, so these three start out as empty lists that are appended to in runThread()
out_q1 = [] * n_of_images
out_q2 = [] * n_of_images
out_q3 = [] * n_of_images
-----------------------------------------------------------------------------------------------------------------------------------------

def runThread(id, start, dpu_runner, img):
    '''
    Thread worker function
    '''

    # Set up DPU runner buffers & I/O mapping dictionary
    global a2j_dict, inbuffer, outbuffer
    a2j_dict, inbuffer, outbuffer = init_dpu_runner(dpu_runner)
    # batchsize
    batchSize = a2j_dict['A2J_model__A2J_model_ResNetBackBone_Backbone__input_1_swim_transpose_0_fix'].shape[0]

    # set runSize
    n_of_images = len(img)
    count = 0
    write_index = start

    # loop over image list
    while count < n_of_images:
        if (count+batchSize<=n_of_images):
            runSize = batchSize
        else:
            runSize=n_of_images-count

        '''
        initialise input and execute DPU runner
        '''
        # init input image to input buffer
        for j in range(runSize):
            imageRun = a2j_dict['A2J_model__A2J_model_ResNetBackBone_Backbone__input_1_swim_transpose_0_fix']
            imageRun[j, ...] = img[(count + j) % n_of_images].reshape(tuple(a2j_dict['A2J_model__A2J_model_ResNetBackBone_Backbone__input_1_swim_transpose_0_fix'].shape[1:]))

        execute_async(dpu_runner, a2j_dict)

        # write results to global predictions buffer
        out_q1.append(a2j_dict['A2J_model__A2J_model_ClassificationModel_classificationModel__Conv2d_output__11586_fix'])
        out_q2.append(a2j_dict['A2J_model__A2J_model_DepthRegressionModel_DepthRegressionModel__Conv2d_output__11920_fix'])
        out_q3.append(a2j_dict['A2J_model__A2J_model_RegressionModel_regressionModel__Conv2d_output__11749_fix'])

        count = count + runSize
    print("Done with the DPU runner")

-----------------------------------------------------------------------------------------------------------------------------------------

def init_dpu_runner(dpu_runner):
    '''
    Setup DPU runner in/out buffers and dictionary
    '''

    io_dict = {}
    inbuffer = []
    outbuffer = []

    # create input buffer, one member for each DPU runner input
    # add inputs to dictionary
    dpu_inputs = dpu_runner.get_input_tensors()
    i=0
    for dpu_input in dpu_inputs:
        #print('DPU runner input:',dpu_input.name,' Shape:',dpu_input.dims)
        inbuffer.append(np.empty(dpu_input.dims, dtype=np.float32, order="C"))
        io_dict[dpu_input.name] = inbuffer[i]
        i += 1

    # create output buffer, one member for each DPU runner output
    # add outputs to dictionary
    dpu_outputs = dpu_runner.get_output_tensors()
    i=0
    for dpu_output in dpu_outputs:
        #print('DPU runner output:',dpu_output.name,' Shape:',dpu_output.dims)
        outbuffer.append(np.empty(dpu_output.dims, dtype=np.float32, order="C"))
        io_dict[dpu_output.name] = outbuffer[i]
        i += 1

    return io_dict, inbuffer, outbuffer

-----------------------------------------------------------------------------------------------------------------------------------------

def execute_async(dpu, tensor_buffers_dict):
    input_tensor_buffers = [tensor_buffers_dict[t.name] for t in dpu.get_input_tensors()]
    output_tensor_buffers = [tensor_buffers_dict[t.name] for t in dpu.get_output_tensors()]
    jid = dpu.execute_async(input_tensor_buffers, output_tensor_buffers)
    return dpu.wait(jid)
-----------------------------------------------------------------------------------------------------------------------------------------
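
For completeness, these snippets assume a vart.Runner has already been created and handed to runThread(). A minimal driver in the style of the cited app_mt.py tutorial might look like the sketch below; the model path, image list, and single-thread setup are placeholders, not from the original post:

import threading
import vart
import xir

# placeholder path to the compiled model
graph = xir.Graph.deserialize("compiled_model.xmodel")
dpu_subgraph = [s for s in graph.get_root_subgraph().toposort_child_subgraph()
                if s.has_attr("device") and s.get_attr("device").upper() == "DPU"][0]

# one runner per worker thread; img_list is the pre-processed image list prepared elsewhere (placeholder)
dpu_runner = vart.Runner.create_runner(dpu_subgraph, "run")
worker = threading.Thread(target=runThread, args=(0, 0, dpu_runner, img_list))
worker.start()
worker.join()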

Can anyone help me get past this stage? I have been stuck here for many days.

Thank you,

Shreyas-NR commented 2 years ago

Issue resolved: my input list was getting overwritten whenever the append was called.

Thank you.
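
For anyone hitting the same symptom: because init_dpu_runner() allocates one set of numpy buffers that every DPU run reuses, anything appended to out_q1/out_q2/out_q3 (or kept from the input image list) still points at memory that the next run overwrites. A minimal sketch of that kind of fix is to copy the data out of the shared buffers before queuing it; the .copy() calls below are illustrative and not necessarily the exact change the poster made:

# inside the while-loop of runThread(), after execute_async() returns:
# append independent copies so the next DPU run cannot overwrite results already queued
out_q1.append(a2j_dict['A2J_model__A2J_model_ClassificationModel_classificationModel__Conv2d_output__11586_fix'].copy())
out_q2.append(a2j_dict['A2J_model__A2J_model_DepthRegressionModel_DepthRegressionModel__Conv2d_output__11920_fix'].copy())
out_q3.append(a2j_dict['A2J_model__A2J_model_RegressionModel_regressionModel__Conv2d_output__11749_fix'].copy())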