Xilinx / XRT

Run Time for AIE and FPGA based platforms
https://xilinx.github.io/XRT

trap divide error in libxrt_coreutil.so.2.8.743 when loading XCLBIN #5511

Closed (strocode closed this issue 3 years ago)

strocode commented 3 years ago

Hi,

I have a kernel that I'm trying to run on a U280.

XRT fails to load it with the following dmesg:

[448317.846952] xocl 0000:a4:00.1: p2p.u.10485761 ffff8e8914708410 p2p_bar_map: bank addr ffffffffffffffff, sz 0, slots 1
[448317.846955] xocl 0000:a4:00.1: p2p.u.10485761 ffff8e8914708410 p2p_bar_map: mark 1 - 0 chunks
[448317.846963] xocl 0000:a4:00.1:  ffff8e895767a0a0 xocl_init_mem: drm_mm_init called for the available memory range
[448317.846965] xocl 0000:a4:00.1:  ffff8e895767a0a0 xocl_init_mem: ret 0
[448317.846974] xocl 0000:a4:00.1:  ffff8e895767a0a0 xocl_read_axlf_helper: Loaded xclbin c5e56ff2-d992-4276-90e3-1fe378964cc2
[448317.851986] traps: python[2077] trap divide error ip:7fd8916ad2cb sp:7ffd3ccab770 error:0 in libxrt_coreutil.so.2.8.743[7fd891664000+c7000]
[448318.310748] [drm] client exits pid(2077)
[448318.310753] xocl 0000:a4:00.1:  ffff8e895767a0a0 xocl_drvinst_close: CLOSE 3

Two kernels are in the XCLBIN, as follows. I suspect XRT doesn't like partitioned arrays on AXI-Lite.

void read_pointer_sum1(word_t* pin, int nread, int nzero[N], int csum[N])
{
#pragma HLS INTERFACE ap_memory port=nzero
#pragma HLS INTERFACE ap_memory port=csum

#pragma HLS INTERFACE m_axi port=pin offset=slave bundle=gmem0

    int thesum  = 0;
    for(int i = 0; i < nread; i++) {
        #pragma HLS PIPELINE II=1
        word_t in = pin[i];
        nzero[i] = in.countLeadingZeros();
        int input = in.range(31, 0).to_int();
        thesum += input;
        csum[i] = thesum;
        for(int w = 0; w < NINT; w++) {
            // Note: v is computed here but never used in this kernel.
            int v = in.range((w+1)*sizeof(int)*8-1, w*sizeof(int)*8).to_int();
        }
    }
}

void read_pointer_sum2(word_t* pin, int nread, int nzero[N], int csum[N])
{
#pragma HLS INTERFACE ap_memory port=nzero
#pragma HLS INTERFACE ap_memory port=csum

#pragma HLS INTERFACE m_axi port=pin offset=slave bundle=gmem0
#pragma HLS ARRAY_PARTITION variable=csum cyclic factor=NINT
    int thesum  = 0;
    for(int i = 0; i < nread; i++) {
        //#pragma HLS PIPELINE II=1
        word_t in = pin[i];
        nzero[i] = in.countLeadingZeros();
        int input = in.range(31, 0).to_int();
        thesum += input;
        for(int w = 0; w < NINT; w++) {
            int v = in.range((w+1)*sizeof(int)*8-1, w*sizeof(int)*8).to_int();
            csum[w + NINT*i] += v;
        }
    }
}

Perhaps XRT is confused because the xclbininfo output shows an incorrect signature:

Signature: read_pointer_sum2 (void* pin, unsigned int nread, int* nzero, int* csum_0)

See sum.xclbin in the attached zip file, which can be built with

make sum.xclbin

test_lut.zip
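
The crash happens while the xclbin is being loaded (in xclLoadXclBin, per the stack trace in the follow-up comment), so a load by itself should reproduce it. Here is a minimal sketch using the XRT native C++ API, assuming sum.xclbin from the zip; loadxclbin.py in the attachment does the equivalent from Python via ctypes:

#include <xrt/xrt_device.h>
#include <iostream>

int main()
{
    // Minimal reproducer sketch: just load the xclbin on the first device.
    xrt::device device{0};                         // the U280
    auto uuid = device.load_xclbin("sum.xclbin");  // ends up in xclLoadXclBin
    std::cout << "loaded " << uuid.to_string() << "\n";
    return 0;
}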

strocode commented 3 years ago

I have a bit more info. The stack trace of the core dump is below:

(venv) ban115@athena:/data/craco/ban115/CRACO-39-python-luts/python_lut_test$ coredumpctl gdb
           PID: 34273 (python)
           UID: 1000 (ban115)
           GID: 1000 (ban115)
        Signal: 8 (FPE)
     Timestamp: Thu 2021-07-22 10:50:49 AEST (25min ago)
  Command Line: python ./loadxclbin.py sum.v3.xclbin
    Executable: /data/craco/ban115/pynq-example/venv/bin/python3.8
 Control Group: /user.slice/user-1000.slice/user@1000.service/gnome-terminal-server.service
          Unit: user@1000.service
     User Unit: gnome-terminal-server.service
         Slice: user-1000.slice
     Owner UID: 1000 (ban115)
       Boot ID: 44f16525f63c453c94676254db657a6d
    Machine ID: af4465f6b2c441199e558d1e3d0e1e6a
      Hostname: athena
       Storage: /var/lib/systemd/coredump/core.python.1000.44f16525f63c453c94676254db657a6d.34273.1626915049000000.lz4
       Message: Process 34273 (python) of user 1000 dumped core.

                Stack trace of thread 34273:
                #0  0x00007f2319aca2cb _ZNK8xrt_core6device13get_ert_slotsEPKcm (libxrt_coreutil.so.2)
                #1  0x00007f2319aca534 _ZNK8xrt_core6device13get_ert_slotsEv (libxrt_coreutil.so.2)
                #2  0x00007f2319ded1c2 _ZN8xrt_core9scheduler4initEPvPK4axlf (libxrt_core.so)
                #3  0x00007f2319ddeec5 xclLoadXclBin (libxrt_core.so)
                #4  0x00007f231a022dae ffi_call_unix64 (libffi.so.6)
                #5  0x00007f231a02271f ffi_call (libffi.so.6)
                #6  0x00007f231a236aa4 _ctypes_callproc (_ctypes.cpython-38-x86_64-linux-gnu.so)
                #7  0x00007f231a237224 n/a (_ctypes.cpython-38-x86_64-linux-gnu.so)
                #8  0x00000000005ff94f _PyObject_MakeTpCall (python3.8)
                #9  0x000000000057dcf7 _PyEval_EvalFrameDefault (python3.8)
                #10 0x000000000060251c _PyFunction_Vectorcall (python3.8)
                #11 0x000000000057d54b _PyEval_EvalFrameDefault (python3.8)
                #12 0x000000000060251c _PyFunction_Vectorcall (python3.8)
                #13 0x0000000000578799 _PyEval_EvalFrameDefault (python3.8)
                #14 0x00000000005760ed _PyEval_EvalCodeWithName (python3.8)
                #15 0x000000000066299e n/a (python3.8)
                #16 0x0000000000662a77 PyRun_FileExFlags (python3.8)
                #17 0x000000000066378f PyRun_SimpleFileExFlags (python3.8)
                #18 0x0000000000687dce Py_RunMain (python3.8)
                #19 0x0000000000688159 Py_BytesMain (python3.8)
                #20 0x00007f231c165bf7 __libc_start_main (libc.so.6)
                #21 0x00000000006073fa _start (python3.8)

uday610 commented 3 years ago

Hello @strocode

The XRT and Vitis flow does not support the ap_memory interface; the only supported interfaces when interacting with the host are s_axilite and m_axi. Please review this section on supported interfaces, which includes a table:

https://www.xilinx.com/html_docs/xilinx2021_1/vitis_doc/managing_interface_synthesis.html#ariaid-title3

So I suggest you change the interfaces to make the kernels work with the Vitis/XRT flow, for example as sketched below.
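
A minimal sketch of read_pointer_sum1 using only supported interfaces: the ap_memory ports become m_axi pointers and the scalar stays on s_axilite. The pointer signature and the gmem1 bundle are illustrative, not taken from the attached project:

void read_pointer_sum1(word_t* pin, int nread, int* nzero, int* csum)
{
#pragma HLS INTERFACE m_axi port=pin   offset=slave bundle=gmem0
#pragma HLS INTERFACE m_axi port=nzero offset=slave bundle=gmem1
#pragma HLS INTERFACE m_axi port=csum  offset=slave bundle=gmem1
#pragma HLS INTERFACE s_axilite port=nread
#pragma HLS INTERFACE s_axilite port=return

    int thesum = 0;
    for (int i = 0; i < nread; i++) {
#pragma HLS PIPELINE II=1
        word_t in = pin[i];
        nzero[i] = in.countLeadingZeros();
        thesum += in.range(31, 0).to_int();
        csum[i] = thesum;  // running prefix sum of the low 32 bits
    }
}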

Thanks

strocode commented 3 years ago

Hi @uday610,

So, as far as you know, is there any way of having a programmable lookup table of values in Vitis that doesn't use a whole m_axi interface?

uday610 commented 3 years ago

As you can see in the table, only the m_axi and s_axilite interfaces are possible for recent XDMA-based platforms, so you can read those small arrays through the s_axilite interface. But generally people read them using the m_axi interface, taking advantage of burst transfers. For a smaller constant table you can try PLRAM, as it is faster than DDR or HBM.
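
A sketch of that usual pattern: burst-read a small programmable table over m_axi into on-chip memory once at kernel start, then index it locally. TABLE_SIZE and the port/bundle names here are illustrative:

#define TABLE_SIZE 2048

void kernel_with_lut(const int* table_in, const word_t* pin, int nread, int* out)
{
#pragma HLS INTERFACE m_axi port=table_in offset=slave bundle=gmem1
#pragma HLS INTERFACE m_axi port=pin      offset=slave bundle=gmem0
#pragma HLS INTERFACE m_axi port=out      offset=slave bundle=gmem0
#pragma HLS INTERFACE s_axilite port=nread
#pragma HLS INTERFACE s_axilite port=return

    int lut[TABLE_SIZE];  // becomes on-chip BRAM after synthesis
    // One burst transfer fills the local copy of the table.
    for (int i = 0; i < TABLE_SIZE; i++) {
#pragma HLS PIPELINE II=1
        lut[i] = table_in[i];
    }
    for (int i = 0; i < nread; i++) {
#pragma HLS PIPELINE II=1
        out[i] = lut[pin[i].range(31, 0).to_uint() % TABLE_SIZE];
    }
}

Placing table_in in PLRAM would then be a link-time choice (e.g. a v++ --connectivity.sp mapping) rather than a kernel-code change.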

stsoe commented 3 years ago

@strocode Thanks for the report. We will trap the divide error. It occurs because of the following:

Error: Invalid kernel offset in xclbin for kernel (read_pointer_sum2) argument (nzero). The offset (0x8000) and size (0x8000) exceeds kernel address range (0x4096): Invalid argument

I looked at the kernel code. As @uday610 points out, we don't support ap_memory, and secondly, declaring scalar arrays of size 8K will never work as kernel arguments with XRT-managed kernels. I am not sure if you were planning on using xrt::kernel, but the maximum supported programmable register space for managed kernels is 4K. If you need more than 4K of address space, then xrt::ip can be used, but setting the values for 8K of scalars will be cumbersome at best.
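
A sketch of the xrt::ip route, in case it helps: the IP is opened by name and its registers are written directly. The register offsets below are placeholders; the real ones come from the kernel's generated s_axilite address map:

#include <xrt/xrt_device.h>
#include <xrt/xrt_ip.h>
#include <cstdint>

int main()
{
    xrt::device device{0};
    auto uuid = device.load_xclbin("sum.xclbin");
    xrt::ip ip{device, uuid, "read_pointer_sum2"};

    // Writing an 8K table one 32-bit register at a time: workable,
    // but cumbersome at best, as noted above.
    for (uint32_t i = 0; i < 2048; ++i)
        ip.write_register(0x1000 + i * 4, i);    // 0x1000 is a placeholder base

    ip.write_register(0x00, 0x1);                // set AP_START in the control register
    while ((ip.read_register(0x00) & 0x2) == 0)  // poll AP_DONE
        ;
    return 0;
}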

strocode commented 3 years ago

Thanks @uday610 and @stsoe. I've been away and am just getting back to this now. Yeah, we're trying to do something a bit weird; we might need to try another route. I haven't looked at xrt::ip yet.