ferrandi / PandA-bambu

PandA-bambu public repository
GNU General Public License v3.0

Testbench generation with inferred interface #337

Closed sheldonz7 closed 1 week ago

sheldonz7 commented 1 week ago

Hi, I am trying to generate a testbench for my design from an .xml file.

This is the command I used:

bambu ../atax.c --top-fname=atax --pretty-print=out.c --print-dot --compiler=I386_CLANG13 -O2 --debug 4 --verbosity 4 > stdout.txt 2> stderr.txt --device=xcu55c-2Lfsvh2892-VVD --disable-function-proxy --channels-number=8 --generate-tb=../../test_atax.xml --simulate --simulator=VERILATOR --generate-interface=INFER

However, I got this error:

Connecting DUT control ports...
    Connecting testbench modules...
      Module SystemMEM
        bambu_testbench_impl/SystemMEM/clock <-> bambu_testbench_impl/clock
        bambu_testbench_impl/SystemMEM/reset <-> bambu_testbench_impl/SystemFSM/reset
        bambu_testbench_impl/SystemMEM/done_port <-> bambu_testbench_impl/DUT/done_port
        Memory port Mout_oe_ram not present in DUT module bambu_testbench_impl/DUT
        Memory port Mout_we_ram not present in DUT module bambu_testbench_impl/DUT
        Memory port Mout_addr_ram not present in DUT module bambu_testbench_impl/DUT
        Memory port Mout_data_ram_size not present in DUT module bambu_testbench_impl/DUT
        Memory port Mout_Wdata_ram not present in DUT module bambu_testbench_impl/DUT
        Memory port Sout_DataRdy not present in DUT module bambu_testbench_impl/DUT
        Memory port Sout_Rdata_ram not present in DUT module bambu_testbench_impl/DUT
        Memory port M_DataRdy not present in DUT module bambu_testbench_impl/DUT
        Memory port M_Rdata_ram not present in DUT module bambu_testbench_impl/DUT
        Memory port Mout_back_pressure not present in DUT module bambu_testbench_impl/DUT
        Memory port S_oe_ram not present in DUT module bambu_testbench_impl/DUT
        Memory port S_we_ram not present in DUT module bambu_testbench_impl/DUT
        Memory port S_addr_ram not present in DUT module bambu_testbench_impl/DUT
        Memory port S_data_ram_size not present in DUT module bambu_testbench_impl/DUT
        Memory port S_Wdata_ram not present in DUT module bambu_testbench_impl/DUT
      Module if_array_A_fu
        bambu_testbench_impl/if_array_A_fu/clock <-> bambu_testbench_impl/clock
        bambu_testbench_impl/if_array_A_fu/setup_port <-> bambu_testbench_impl/SystemFSM/setup_port
        bambu_testbench_impl/if_array_A_fu/A_address0 <-> bambu_testbench_impl/DUT/A_address0
        bambu_testbench_impl/if_array_A_fu/A_ce0 <-> bambu_testbench_impl/DUT/A_ce0
error -> This point should never be reached - Port A_address1 not found in DUT module bambu_testbench_impl/DUT
    virtual DesignFlowStep_Status TestbenchGeneration::Exec()
    ../../../../src/HLS/simulation/testbench_generation.cpp:603

The kernel code I used:

#include "atax.h"

void atax(DATA_TYPE A[N][N], DATA_TYPE x[N], DATA_TYPE y_out[N])
{
#pragma HLS interface mode=ap_fifo port=y_out
    int i, j;
    DATA_TYPE buff_A[N][N];
    DATA_TYPE buff_x[N];
    DATA_TYPE buff_y_out[N];
    DATA_TYPE tmp1[N];

    lprd_1:
    for (i = 0; i < N; i++) {
        buff_x[i] = x[i];
        buff_y_out[i] = 0;
        tmp1[i] = 0;
        lprd_2:
        //#pragma unroll(2) 
        for (j = 0; j < N; j++) {
            buff_A[i][j] = A[i][j];
        }
    }

    lp1: 
    #pragma unroll(4)
    for (i = 0; i < N; i++) {
        lp2: 
        #pragma unroll(4) 
        for (j = 0; j < N; j++) {
            tmp1[i] = tmp1[i] + buff_A[i][j] * buff_x[j];
        }
    }

    lp3:
    #pragma unroll(4)
    for (i = 0; i < N; i++) {
        lp4:
        #pragma unroll(4)
        for (j = 0; j < N; j++) {
            buff_y_out[j] = buff_y_out[j] + buff_A[i][j] * tmp1[i];
        }
    }

    lpwr_1:
    for (i = 0; i < N; i++) {
        y_out[i] = buff_y_out[i];
    }
}

The XML file I'm using (renamed to .csv for the GitHub upload): test_atax.csv

It seems to me that Bambu is still trying to generate the testbench for the minimal interface rather than the inferred interface. Please help, thanks!

Ansaya commented 1 week ago

Hi, the issue you see is due to the --channels-number=8 option, which is not an allowed value in your configuration. The error occurs because testbench generation supports an arbitrary channel count for FIFO interfaces, while the interface generator supports only single- and dual-channel FIFO interfaces, so the testbench looks for ports that the DUT does not have. A single-channel FIFO interface is generated whenever the --channels-number value differs from 2. (This is because an option to set the channel count individually for each FIFO interface is currently missing.)

Furthermore, setting --channels-number to more than two is OK only if --memory-allocation-policy=NO_BRAM or --memory-allocation-policy=EXT_PIPELINED_BRAM is used. The --channels-number option sets the number of channels for the shared memory bus, which is also shared with the internal BRAMs, and those do not support more than two channels.

Considering your specific case, if you want to achieve a configuration with three FIFO interfaces, each with eight memory channels, the tool does not currently support this. The closest you can get is using --memory-allocation-policy=EXT_PIPELINED_BRAM and --channels-number=8, which will generate an accelerator without any internal memory but with eight shared memory channels.
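
As a rough, untested sketch of that last configuration: the script below only combines the two options named above with the source file and top function from the original command. Which other options from that command (for example --generate-interface=INFER or the testbench options) can be combined with this memory policy is not stated in this thread, so they are left out here.

```shell
#!/bin/sh
# Sketch only: eight shared memory channels, no internal BRAM.
# Every flag below appears in this thread; the combination itself is untested.
BAMBU_EXT_CMD="bambu ../atax.c --top-fname=atax \
--memory-allocation-policy=EXT_PIPELINED_BRAM \
--channels-number=8"

# Show the command, and run it only when bambu is available on PATH.
echo "$BAMBU_EXT_CMD"
if command -v bambu >/dev/null 2>&1; then
    $BAMBU_EXT_CMD
fi
```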

sheldonz7 commented 1 week ago

Hi Michele, thank you so much for the prompt response! The reason I'm enabling more than two memory channels is to get the same effect as array_partition in Vitis HLS, which gives the arrays more parallel memory accesses by storing them in more BRAM modules. In Vitis HLS, though, the array_partition pragma is applied to each array separately, and I think y_out in this case doesn't need more than 2 channels.

This is the kind of Vitis/Vivado HLS setting I'm trying to reproduce in Bambu:

set_directive_resource -core RAM_1P "atax" A
set_directive_array_partition -type cyclic -factor 8 -dim 2 "atax" A
set_directive_resource -core RAM_1P "atax" x
set_directive_interface -mode ap_fifo "atax" y_out
set_directive_array_partition -type cyclic -factor 8 -dim 2 "atax" buff_A
set_directive_array_partition -type cyclic -factor 8 -dim 1 "atax" tmp1
set_directive_array_partition -type cyclic -factor 8 -dim 1 "atax" buff_x
set_directive_array_partition -type cyclic -factor 8 -dim 1 "atax" buff_y_out
set_directive_pipeline "atax/lprd_2"
set_directive_unroll -factor 8 "atax/lprd_2"
set_directive_pipeline "atax/lpwr_1"
set_directive_unroll -factor 8 "atax/lpwr_1"
set_directive_pipeline "atax/lp2"
set_directive_unroll -factor 4 "atax/lp2"
set_directive_pipeline "atax/lp4"
set_directive_unroll -factor 1 "atax/lp4"

Is there a way to do array partitioning for I/Os and local variables in Bambu? I also found pipelining tricky in this case: Bambu throws some errors when pipelining is enabled for the atax function.

sheldonz7 commented 1 week ago

Hi, since I want internal BRAM, I tried the same design with --channels-number=1, which generates a single-channel FIFO as well as BRAM, but I got a different error from before:

Full command:

bambu ../atax.c --top-fname=atax --pretty-print=out.c --print-dot --compiler=I386_CLANG13 -O2 --debug 4 --verbosity 4 > stdout.txt 2> stderr.txt --device=xcu55c-2Lfsvh2892-VVD --disable-function-proxy --generate-interface=INFER --channels-number=1 --generate-tb=../../test_atax.xml --simulate --simulator=VERILATOR
Linking libmdpi_driver.so
Linking testbench
/usr/bin/ld: HLS_output/simulation/build/obj/out.c.pp.o: in function `__m_pp_atax':
out.c:(.text+0x42): undefined reference to `x_bambu_artificial_ParmMgr_Read'
/usr/bin/ld: out.c:(.text+0x7b): undefined reference to `A_bambu_artificial_ParmMgr_Read'
/usr/bin/ld: out.c:(.text+0x94): undefined reference to `A_bambu_artificial_ParmMgr_Read'
/usr/bin/ld: out.c:(.text+0xad): undefined reference to `A_bambu_artificial_ParmMgr_Read'
/usr/bin/ld: out.c:(.text+0xc6): undefined reference to `A_bambu_artificial_ParmMgr_Read'
/usr/bin/ld: out.c:(.text+0x82f): undefined reference to `y_out_bambu_artificial_ParmMgr_Write'
/usr/bin/ld: out.c:(.text+0x858): undefined reference to `y_out_bambu_artificial_ParmMgr_Write'
/usr/bin/ld: out.c:(.text+0x881): undefined reference to `y_out_bambu_artificial_ParmMgr_Write'
/usr/bin/ld: out.c:(.text+0x8aa): undefined reference to `y_out_bambu_artificial_ParmMgr_Write'
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make: *** [/opt/panda/share/panda/libmdpi/Makefile.mk:193: HLS_output/simulation/testbench] Error 1
make: *** Waiting for unfinished jobs....
Please report bugs to <panda-info@polimi.it>

fabrizioferrandi commented 1 week ago

Hi, --pretty-print=out.c cannot be used together with --generate-interface=INFER and --simulate. You do not need to pass --channels-number=1 either. Remove these two options, and it should work.
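
For reference, a sketch of the second command from this thread with --pretty-print=out.c and --channels-number=1 removed. This exact line was not posted in the thread. All remaining flags are copied from the failing command, and the stdout/stderr redirections are dropped for brevity.

```shell
#!/bin/sh
# Flags copied from the failing command above, minus
# --pretty-print=out.c and --channels-number=1.
BAMBU_CMD="bambu ../atax.c --top-fname=atax --print-dot \
--compiler=I386_CLANG13 -O2 --debug 4 --verbosity 4 \
--device=xcu55c-2Lfsvh2892-VVD --disable-function-proxy \
--generate-interface=INFER --generate-tb=../../test_atax.xml \
--simulate --simulator=VERILATOR"

# Show the command, and run it only when bambu is available on PATH.
echo "$BAMBU_CMD"
if command -v bambu >/dev/null 2>&1; then
    $BAMBU_CMD
fi
```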

sheldonz7 commented 1 week ago

Hi, I can confirm it works. Thanks!