Xilinx / XRT

Run Time for AIE and FPGA based platforms
https://xilinx.github.io/XRT

Incorrect results when using clEnqueueMigrateMemObjects with MIG (PL-DDR) on Zynq #5693

Open doonny opened 3 years ago

doonny commented 3 years ago

Hi, I am working on the ZC706 board with MIG (PL-DDR) enabled in a custom Vitis platform.

When I use clEnqueueMigrateMemObjects to write buffers, the contents of the buffers on the device side are not consistent with the memory pointed to by host_ptr. However, when I switch to clEnqueueWriteBuffer, the results are correct.

The following host code creates and manipulates the buffers.

Code using the clEnqueueMigrateMemObjects API, which gives wrong results:

            weights_bank[j].flags = weight_bank | XCL_MEM_TOPOLOGY;
            weights_bank[j].param = 0;
            weights_bank[j].obj   = weight_conv[j];
            bias_bank[j].flags    = weight_bank | XCL_MEM_TOPOLOGY;
            bias_bank[j].param    = 0;
            bias_bank[j].obj      = bias_conv[j];

            // Weights buffers for each layer
            weights_buf[i*LAYER_NUM+j] = clCreateBuffer(context, CL_MEM_READ_ONLY | CL_MEM_USE_HOST_PTR | CL_MEM_EXT_PTR_XILINX,
                                                        weight_buf_size* sizeof(DTYPE), &weights_bank[j], &status);
            checkError(status, "Failed to create buffer for weights in layer");

            // Bias buffers for each layer
            bias_buf[i*LAYER_NUM+j] = clCreateBuffer(context, CL_MEM_READ_ONLY | CL_MEM_USE_HOST_PTR | CL_MEM_EXT_PTR_XILINX, 
                                                     layer_config[j][bias_size] * sizeof(DTYPE), &bias_bank[j], &status);
            checkError(status, "Failed to create buffer for bias in layer");

            // Initializing all weights buffers, blocking write is used
            status = clEnqueueMigrateMemObjects(que_memRd[i], 1, &weights_buf[i*LAYER_NUM+j], 0, /* 0 means from host to device*/
                                                0, NULL, NULL);
            checkError(status, "Failed to transfer weight");

            status = clEnqueueMigrateMemObjects(que_memRd[i], 1, &bias_buf[i*LAYER_NUM+j], 0, /* 0 means from host to device*/
                                                0, NULL, NULL);
            checkError(status, "Failed to transfer bias");

Code using clEnqueueWriteBuffer, which gives correct results:

            // Weights buffers for each layer
            weights_buf[i*LAYER_NUM+j] = clCreateBuffer(context, CL_MEM_READ_ONLY, weight_buf_size* sizeof(DTYPE), NULL, &status);
            checkError(status, "Failed to create buffer for weights in layer");

            // Bias buffers for each layer
            bias_buf[i*LAYER_NUM+j] = clCreateBuffer(context, CL_MEM_READ_ONLY, layer_config[j][bias_size] * sizeof(DTYPE), NULL, &status);
            checkError(status, "Failed to create buffer for bias in layer");

            // Initializing all weights buffers, blocking write is used
            status = clEnqueueWriteBuffer(que_memRd[i], weights_buf[i*LAYER_NUM+j], CL_TRUE, 0, weight_buf_size*sizeof(DTYPE), weight_conv[j], 0, NULL, NULL);
            checkError(status, "Failed to transfer weight");

            status = clEnqueueWriteBuffer(que_memRd[i], bias_buf[i*LAYER_NUM+j], CL_TRUE, 0, layer_config[j][bias_size] * sizeof(DTYPE), bias_conv[j], 0, NULL, NULL);
            checkError(status, "Failed to transfer bias");

Everything else remained the same between the two tests.

To make sure that this is only related to MIG, I also tested the same code and hardware on the ZC706 base platform and the U50 base platform. On both platforms, both APIs (clEnqueueMigrateMemObjects and clEnqueueWriteBuffer) give the same correct results.

So my guess is that XRT might not work correctly with MIG (Zynq 7000 version).

Here is more information about the environment and hardware:

XRT build version: 2.6.0
Build hash: 2d6bfe4ce91051d4e5b499d38fc493586dd4859a
Build date: 2020-08-24 02:43:50
Git branch: 2020.1

Vitis version 2020.1, platform ZC706, PetaLinux v2020.1.

Also, is there any way for me to locate the bug more precisely?

uday610 commented 3 years ago

@chvamshi-xilinx

chvamshi-xilinx commented 2 years ago

We will try to reproduce this on our end. 2.6.0 is a pretty old release. Could you please switch to the latest release and try again? I suspect this is related to caching.
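
One way to narrow this down, and to gather evidence for or against the cache suspicion, is to read the device buffer back into a separate host array (for example with clEnqueueReadBuffer) and compare it with the original host data in cache-line-sized chunks. The sketch below is not from XRT or from this thread: `diff_summary` and `diff_buffers` are hypothetical helper names, and the 32-byte line size is an assumption based on the Cortex-A9 cores in Zynq-7000. If the mismatches consist of whole, aligned 32-byte chunks, that hints at stale cache lines; scattered single-byte differences would point elsewhere. Since the readback goes through the same runtime, treat the result as a hint rather than proof.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Assumption: 32-byte cache lines, as on the Cortex-A9 cores of Zynq-7000. */
#define CACHE_LINE_BYTES 32u

/* Hypothetical helper type: summary of the differences between two buffers. */
typedef struct {
    size_t first_mismatch;    /* offset of the first differing byte, or size if none */
    size_t mismatched_bytes;  /* total number of differing bytes */
    size_t dirty_lines;       /* line-sized chunks containing at least one difference */
    size_t fully_stale_lines; /* line-sized chunks in which every byte differs */
} diff_summary;

/* Compare the expected host data with the data read back from the device,
 * walking the buffers one cache-line-sized chunk at a time. */
static diff_summary diff_buffers(const unsigned char *expected,
                                 const unsigned char *actual, size_t size)
{
    diff_summary s = { size, 0, 0, 0 };

    for (size_t base = 0; base < size; base += CACHE_LINE_BYTES) {
        size_t remaining = size - base;
        size_t len = remaining < CACHE_LINE_BYTES ? remaining : CACHE_LINE_BYTES;
        size_t bad = 0;

        for (size_t k = 0; k < len; ++k) {
            if (expected[base + k] != actual[base + k]) {
                if (s.first_mismatch == size)
                    s.first_mismatch = base + k;
                ++bad;
            }
        }

        s.mismatched_bytes += bad;
        if (bad > 0)
            ++s.dirty_lines;
        if (bad == len)
            ++s.fully_stale_lines;
    }
    return s;
}

/* Print the summary in a form that can be pasted into a bug report. */
static void print_summary(const char *name, diff_summary s, size_t size)
{
    if (s.first_mismatch == size) {
        printf("%s: buffers match\n", name);
        return;
    }
    printf("%s: first mismatch at byte %zu, %zu bytes differ, "
           "%zu chunks affected, %zu chunks differ completely\n",
           name, s.first_mismatch, s.mismatched_bytes,
           s.dirty_lines, s.fully_stale_lines);
}
```

In the host code above, `expected` would be `weight_conv[j]` cast to `const unsigned char *`, `actual` a separately allocated readback array, and `size` would be `weight_buf_size * sizeof(DTYPE)`. Reading back into a different array than the original host_ptr keeps the comparison meaningful for buffers created with CL_MEM_USE_HOST_PTR.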

doonny commented 2 years ago

@chvamshi-xilinx Is there any way to update only XRT, without changing the customized platform and PetaLinux image?

chvamshi-xilinx commented 2 years ago

@doonny, you can build the XRT RPMs with PetaLinux (petalinux-build -c xrt) and install them after boot using dnf install.