Xilinx / Vitis-Tutorials

Vitis In-Depth Tutorials
https://Xilinx.github.io/Vitis-Tutorials/
MIT License

Why do I get the following error: "buffer (1) is not resident in device (0) so migration from device to host fails"? #412

Closed: Dalhfire closed this issue 11 months ago

Dalhfire commented 12 months ago

Hi,

I followed this tutorial, which introduced me to writing kernels: https://github.com/Xilinx/Vitis_Accel_Examples/tree/main/cpp_kernels/simple_vadd

I successfully got it working on my VCK190 board, and now I am trying to do something very similar by modifying the host code and the kernel.

Here is the code of the HLS top function, krnl_dwtfix:

krnl_dwtfix.cpp:

```cpp
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <math.h>
#include <assert.h>

#include </home/ceos-1/Documents/Learn/VitisTutorial/dwt_v0_host/util.hpp>
#include </home/ceos-1/Documents/Learn/VitisTutorial/dwt_v0_host/variables.hpp>
#include "hls_stream.h"
#include "ap_axi_sdata.h"
#include "ap_int.h"
/*--------------------------- DWT 1D INTEGER/FLOAT --------------------------*/
/*                 Fixed-point version of the float DWT                      */
/*---------------------------------------------------------------------------*/

bool first = true;
int cpt_test = 0;

typedef ap_axis<64, 0, 0, 0> axistream_long;

void load_inputs(int64_t *pX, hls::stream<int64_t>& pX_stream) {
mem_rd:
    //Fill pX_stream with the values from pX array
    for (int i = 0; i < 9; i++) {
    //#pragma HLS PIPELINE
    #pragma HLS LOOP_TRIPCOUNT min = 9 max = 9
        pX_stream << pX[i];
    }
}

void store_result(int64_t* out, hls::stream<int64_t>& out_stream) {
mem_wr:
    out[0] = out_stream.read();
}

static void compute_coef(hls::stream<int64_t>& pX_stream, hls::stream<int64_t>& out_stream) {
    int64_t h[5] = {894119, 395736, -115998, -25008, 39666};
    int64_t pX[9];

execute_coef:
    for (int i = 0; i < 9; i++) {
    //#pragma HLS PIPELINE
    #pragma HLS LOOP_TRIPCOUNT min = 9 max = 9
        pX[i] = pX_stream.read();
    }
    int64_t result = 0;

compute_result:
    result += h[0] * pX[4];
    for (int i = 1; i < 5; i++) {
        result += h[i] * (pX[4-i] + pX[4+i]);
    }

    out_stream << result;
}

extern "C" {
/* CalculCoefC
 *  (IN)  h : low-pass DWT coefficients
 *  (IN)  pX : vector of the 2N input samples
 *  (OUT) pC : vector of the N low-pass DWT outputs
 */
    void krnl_dwtfix(int64_t *pX, int64_t *value_out) {
    #pragma HLS INTERFACE m_axi port=pX bundle = gmem0 depth=64
    #pragma HLS INTERFACE m_axi port=value_out bundle = gmem0 depth=64

        static hls::stream<int64_t> pX_stream("pX_stream");
        static hls::stream<int64_t> out_stream("output_stream");

        // Dataflow region: load_inputs -> compute_coef -> store_result
        #pragma HLS DATAFLOW
        load_inputs(pX, pX_stream);
        compute_coef(pX_stream, out_stream);
        store_result(value_out, out_stream);
    }
}
```

This kernel seems to work: when I run the testbench, the results are correct.
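To check the kernel's arithmetic and the number of tokens passed through each stream on an ordinary host compiler (without the Vitis HLS headers), the dataflow region can be modeled in plain C++. This is a minimal sketch, assuming std::queue as a stand-in for hls::stream, so it says nothing about timing or hardware behavior. The `*_model` function names and `run_dwtfix_model()` are hypothetical; the `h` table and the filter formula are copied from compute_coef.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <queue>

// Hypothetical host-only model of the krnl_dwtfix dataflow region.
// std::queue stands in for hls::stream (assumption: fine for checking the
// arithmetic and the number of tokens per stream, not timing or II).
using Stream = std::queue<int64_t>;

// Pops one token; reading an empty stream would block in hardware.
static int64_t pop(Stream& s) {
    assert(!s.empty() && "read from an empty stream");
    int64_t v = s.front();
    s.pop();
    return v;
}

// Mirrors load_inputs(): writes 9 tokens.
void load_inputs_model(const int64_t* pX, Stream& pX_stream) {
    for (int i = 0; i < 9; i++) pX_stream.push(pX[i]);
}

// Mirrors compute_coef(): reads 9 tokens, writes 1.
void compute_coef_model(Stream& pX_stream, Stream& out_stream) {
    const int64_t h[5] = {894119, 395736, -115998, -25008, 39666};
    int64_t x[9];
    for (int i = 0; i < 9; i++) x[i] = pop(pX_stream);
    int64_t result = h[0] * x[4];
    for (int i = 1; i < 5; i++) result += h[i] * (x[4 - i] + x[4 + i]);
    out_stream.push(result);
}

// Mirrors store_result(): reads 1 token.
void store_result_model(int64_t* out, Stream& out_stream) {
    out[0] = pop(out_stream);
}

// Runs the three stages in sequence and checks that every stream is drained,
// i.e. each producer writes exactly as many tokens as its consumer reads.
int64_t run_dwtfix_model(const std::array<int64_t, 9>& pX) {
    Stream pX_stream, out_stream;
    int64_t value_out = 0;
    load_inputs_model(pX.data(), pX_stream);
    compute_coef_model(pX_stream, out_stream);
    store_result_model(&value_out, out_stream);
    assert(pX_stream.empty() && out_stream.empty());
    return value_out;
}
```

For example, a centered impulse (only pX[4] = 1) gives 894119, which is h[0], and nine ones give 1482911, about √2 × 2^20, which suggests the taps are fixed-point values scaled by 2^20. Comparing `v` on the host with `run_dwtfix_model()` for the same nine samples helps separate arithmetic errors from transfer problems.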

And here is the part of the host code where I try to launch the kernel:


```cpp
    // Fill the h_stream and pX_stream with the appropriate data
    //fillStreams(pX_axistream, pX_extracted);

    // Kernel Part

    auto start = std::chrono::steady_clock::now();
    bool found_device = false;

    std::vector<cl::Device> devices;
    cl_int err;
    cl::Context context;
    cl::CommandQueue q;
    cl::Kernel krnl_dwtfix;
    cl::Program program;
    std::vector<cl::Platform> platforms;

    // Traverse all platforms to find the Xilinx platform and the targeted
    // device in that platform
    cl::Platform::get(&platforms);
    for (size_t i = 0; (i < platforms.size()) && (found_device == false); i++) {
        cl::Platform platform = platforms[i];
        std::string platformName = platform.getInfo<CL_PLATFORM_NAME>();
        if (platformName == "Xilinx") {
            devices.clear();
            platform.getDevices(CL_DEVICE_TYPE_ACCELERATOR, &devices);
            if (devices.size()) {
                found_device = true;
                break;
            }
        }
    }
    if (found_device == false) {
        std::cout << "Error: Unable to find Target Device " << std::endl;
        exit(EXIT_FAILURE);
    }

    std::cout << "INFO: Reading " << xclbinFilename << std::endl;
    FILE* fp;
    if ((fp = fopen(xclbinFilename.c_str(), "r")) == nullptr) {
        printf("ERROR: %s xclbin not available please build\n", xclbinFilename.c_str());
        exit(EXIT_FAILURE);
    }
    fclose(fp); // the file was only opened to check that it exists
    // Load xclbin
    std::cout << "Loading: '" << xclbinFilename << "'\n";
    std::ifstream bin_file(xclbinFilename, std::ifstream::binary);
    bin_file.seekg(0, bin_file.end);
    unsigned nb = bin_file.tellg();
    bin_file.seekg(0, bin_file.beg);
    char* buf = new char[nb];
    bin_file.read(buf, nb);

    // Creating Program from Binary File
    cl::Program::Binaries bins;
    bins.push_back({buf, nb});
    bool valid_device = false;
    for (unsigned int i = 0; i < devices.size(); i++) {
        auto device = devices[i];
        // Creating Context and Command Queue for selected Device
        OCL_CHECK(err, context = cl::Context(device, nullptr, nullptr, nullptr, &err));
        OCL_CHECK(err, q = cl::CommandQueue(context, device, CL_QUEUE_PROFILING_ENABLE, &err));
        std::cout << "Trying to program device[" << i << "]: " << device.getInfo<CL_DEVICE_NAME>() << std::endl;
        cl::Program program(context, {device}, bins, nullptr, &err);
        if (err != CL_SUCCESS) {
            std::cout << "Failed to program device[" << i << "] with xclbin file!\n";
        } else {
            std::cout << "Device[" << i << "]: program successful!\n";
            OCL_CHECK(err, krnl_dwtfix = cl::Kernel(program, "krnl_dwtfix", &err));
            valid_device = true;
            break; // we break because we found a valid device
        }
    }
    if (!valid_device) {
        std::cout << "Failed to program any device found, exit!\n";
        exit(EXIT_FAILURE);
    }

    // These commands will allocate memory on the Device. The cl::Buffer objects can
    // be used to reference the memory locations on the device.
    OCL_CHECK(err, cl::Buffer buffer_pX(context, CL_MEM_READ_ONLY, sizeof(int64_t)*9, NULL, &err));
    OCL_CHECK(err, cl::Buffer buffer_v(context, CL_MEM_WRITE_ONLY, sizeof(int64_t), NULL, &err));

    // set the kernel Arguments
    int narg = 0;
    OCL_CHECK(err, err = krnl_dwtfix.setArg(narg++, buffer_pX));
    OCL_CHECK(err, err = krnl_dwtfix.setArg(narg++, buffer_v));

    // We then need to map our OpenCL buffers to get the pointers

    int64_t* ptr_pX;
    int64_t* ptr_v;
    OCL_CHECK(err, ptr_pX = (int64_t*)q.enqueueMapBuffer(buffer_pX, CL_TRUE, CL_MAP_WRITE, 0, sizeof(int64_t)*9, NULL, NULL, &err));
    OCL_CHECK(err, ptr_v = (int64_t*)q.enqueueMapBuffer(buffer_v, CL_TRUE, CL_MAP_READ, 0, sizeof(int64_t), NULL, NULL, &err));

    int64_t pX_extracted[9];
    extractValues(pX, pX_extracted);

    // Do not assign new values to ptr_pX and ptr_v
    // Instead, copy the data from pX_extracted to the mapped buffers
    std::memcpy(ptr_pX, pX_extracted, sizeof(int64_t)*9);

    // Data will be migrated to kernel space
    OCL_CHECK(err, err = q.enqueueMigrateMemObjects({buffer_pX}, 0 /* 0 means from host*/));

    // Launch the Kernel
    OCL_CHECK(err, err = q.enqueueTask(krnl_dwtfix));

    // The result of the kernel execution must be transferred back before it can
    // be read on the host. This call transfers buffer_v from the FPGA to the
    // host memory mapped at ptr_v.
    OCL_CHECK(err, q.enqueueMigrateMemObjects({buffer_v}, CL_MIGRATE_MEM_OBJECT_HOST));

    OCL_CHECK(err, q.finish());

    /* if (!last_G_ndwt_out_s.empty()){
        last_G_ndwt = last_G_ndwt_out_s.read().data;
        fprintf(stdout,"last_G_ndwt : %d",last_G_ndwt);
    }
    fprintf(stdout,"last_G_ndwt : %d",last_G_ndwt);*/
    v = *ptr_v;
```

The software and hardware emulation builds compile and work, but when I tried to launch the hw_emulation, I got the following messages in the console and then the program stopped:

```
Loading: 'binary_container_1.xclbin'

XRT build version: 2.15.0
Build hash: 64c933573e7e50a8aba939a74209590c2b739e8b
Build date: 2023-04-17 09:18:13
Git branch: 2023.1
PID: 579
UID: 0
[Thu Oct  5 10:14:25 2023 GMT]
HOST: 
EXE: /mnt/dwt_v0_host
[XRT] ERROR: buffer (1) is not resident in device (0) so migration from device to host fails
terminate called after throwing an instance of 'xrt_xocl::error'
  what():  event 5 never submitted
[   83.198147] zocl-drm amba_pl@0:zyxclmm_drm:  ffff0008003f7810 kds_del_context: Client pid(579) del context Domain(0) CU(0x0)
[   83.208894] zocl-drm amba_pl@0:zyxclmm_drm:  ffff0008003f7810 kds_del_context: Client pid(579) del context Domain(65535) CU(0xffff)
INFO: Reading binary_container_1.xclbin
Loading: 'binary_container_1.xclbin'

Thread 2 "dwt_v0_host" received signal SIGABRT, Aborted.
[Switching to Thread 0xfffff355a120 (LWP 583)]
__pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
44      pthread_kill.c: No such file or directory.
```

Using gdb, I found that the error comes from this line:

```cpp
OCL_CHECK(err, q.enqueueMigrateMemObjects({buffer_v}, CL_MIGRATE_MEM_OBJECT_HOST));
```

But I don't know why it isn't working. Does anyone have an idea?

Thanks in advance for your help,

David

randyh62 commented 12 months ago

The store_result() of your kernel doesn't seem to line up with your load_inputs(). Maybe just try initializing buffer_v from the kernel to see whether you can transfer data at all.
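One way to follow this suggestion, as a minimal sketch: temporarily replace the kernel body with one that ignores pX and only writes a known marker to value_out. The marker value, `kTestMarker`, and `check_marker()` are hypothetical; the function name, arguments, and pragmas mirror krnl_dwtfix above so the host code can stay unchanged. In a real project the host-side check would live in the host code, not in the kernel file.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical marker; any easily recognizable constant works.
constexpr int64_t kTestMarker = 0x0123456789ABCDEF;

// Temporary stand-in for krnl_dwtfix (same name and arguments as the original
// kernel). It ignores pX and writes only the marker to value_out.
extern "C" void krnl_dwtfix(int64_t* pX, int64_t* value_out) {
#pragma HLS INTERFACE m_axi port=pX bundle=gmem0 depth=9
#pragma HLS INTERFACE m_axi port=value_out bundle=gmem0 depth=1
    (void)pX;  // input deliberately unused in this test
    value_out[0] = kTestMarker;
}

// Host side (hypothetical helper), after q.finish(): v = *ptr_v should equal
// the marker if the device-to-host transfer of buffer_v works.
inline void check_marker(int64_t v) {
    assert(v == kTestMarker && "buffer_v did not come back from the device");
}
```

If the marker arrives on the host, the transfer path itself likely works and the original kernel is the next suspect; if the same migration error appears, the problem is more likely on the host or xclbin side.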

Also, this is not an issue with Vitis-Tutorials itself, so it should be raised on the user forums instead.