THU-DSP-LAB / ventus-gpgpu

GPGPU processor supporting RISCV-V extension, developed with Chisel HDL
Mulan Permissive Software License, Version 2

support FPGA? #21

Open MWHYNOT opened 1 year ago

MWHYNOT commented 1 year ago

When I follow the steps in /ventus/fpga_test/read.me, I run into three problems:

1. The folder does not contain the imports mentioned in read.me.
2. After running `make verilog`, I get a file named `GPGPU_top.v` instead of the `GPGPU_axi_top.v` mentioned in read.me.
3. When I run `source project_gpgpu.tcl` in Vivado, I get: `WARNING: [IP_Flow 19-2248] Failed to load user IP repository 'c:/Users/25305/Desktop/ip_repo/ CTA_Schedular_1.0'; Can't find the specified path.` The CTA_Schedular_1.0 referenced in the .tcl file is indeed missing.

Could you also write the FPGA tutorial in more detail? Thanks in advance, and this is really a good project.

MWHYNOT commented 1 year ago

@yangkex @yff18 Sir, could you give me some help? Thanks.

yangkex commented 1 year ago

Due to the driver's requirements, we have updated our Chisel code. However, the FPGA-related modifications still target the old version, so the existing tests may not run. We will update everything once we have finished testing. BTW, `make verilog` now generates GPGPU_top (refer to src/top/ExtMem_gen.scala); you may use GPGPU_axi_top instead. We then manually wrap our code in an additional layer to make it compatible with the AXI interface required by the FPGA project.

MWHYNOT commented 1 year ago

@yangkex Thanks for your reply, sir. I would like to ask a bit more:

1. What does "driver" refer to?
2. Do you mean to directly replace the parameter GPGPU_top with GPGPU_axi_top in src/top/ExtMem_gen.scala? When I do this, the program reports an error: `top.GPGPU_axi_top does not take parameters`. So perhaps you mean to manually rename the generated file (GPGPU_top.v) to GPGPU_axi_top.v, and then manually wrap the code in an additional layer to make it compatible with the AXI interface required by the FPGA project?

Thanks again for your reply!

yangkex commented 1 year ago

1. The driver (or KMD, kernel-mode driver) is the interface between the hardware (Chisel code) and the software (pocl and LLVM); it lives in our driver repo (ventus-driver). The driver sends the control signals and data needed to launch the hardware.
2. Sorry, I didn't say it clearly. I meant something like this section of code that we commented out: `object GPGPU_gen extends App{(new chisel3.stage.ChiselStage).emitVerilog(new GPGPU_axi_top())}`. The specific parameters depend on the class declaration, and `class GPGPU_axi_top` doesn't take any parameters.
3. I mean to use `GPGPU_axi_top.v` directly. There may be bugs in the current FPGA scripts. The updated FPGA tutorial is coming soon and will include how to wrap the code manually.

MWHYNOT commented 1 year ago

Thank you very much for your reply. I have now run synthesis and implementation of VENTUS and deployed it on the FPGA. However, the provided file ventus-gpgpu/ventus/fpga_test/scrs/driver/naive_driver.h defines macros such as `GPU_WG_ID_OFFSET 0x04`, while in the Chisel-generated GPGPU_axi_top.v the register corresponding to wg_id (regs_1) is selected by offset 4'h1: `if (reset) begin regs_1 <= 32'h0; end else if (write) begin if (4'h1 == addr[3:0]) begin regs_1 <= dataOut; end end`. Is something wrong here? Thanks again for your reply!

yangkex commented 1 year ago

These registers are 32 bits wide, so their addresses are 4-byte aligned; that is why GPU_WG_ID_OFFSET is set to 0x04.
The lowest 2 bits of the address from the AXI interface should be dropped.
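
The mapping between the driver's byte offset and the register index seen in the generated Verilog can be sketched as below. This is only an illustration: `GPU_WG_ID_OFFSET` and its value 0x04 come from this thread, while the helper names `reg_index_from_byte_offset` and `byte_offset_from_reg_index` are hypothetical and not part of the driver.

```c
#include <stdint.h>

/* Byte offset quoted in this thread (from naive_driver.h). */
#define GPU_WG_ID_OFFSET 0x04u

/* Hypothetical helper: convert an AXI byte offset into the index of a
 * 32-bit register by dropping the two low address bits (divide by 4). */
static inline uint32_t reg_index_from_byte_offset(uint32_t byte_offset)
{
    return byte_offset >> 2;
}

/* Hypothetical helper: inverse mapping, from the index of a 32-bit
 * register back to its byte offset on the bus. */
static inline uint32_t byte_offset_from_reg_index(uint32_t reg_index)
{
    return reg_index << 2;
}
```

With this mapping, byte offset 0x04 corresponds to register index 1, which is consistent with the `4'h1 == addr[3:0]` comparison for regs_1, assuming the address reaching that comparison has already had its two low bits removed.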

yangkex commented 1 year ago

BTW, the current FPGA code only fits the version we released in August 2022. I'm not sure the current Chisel code is compatible with it.

MWHYNOT commented 1 year ago

Actually, I found the version released in August 2022, integrated the GPGPU into a ZYNQ7000, and ran naive_driver.c directly from the SDK after changing the macro definitions of GPU_BASEADDR and GPU_HIGHADDR. However, I always get `Waiting time limit exceeded.`, which means that `Gpu_ReadReg(GpuInstance->BaseAddr, GPU_WG_VALID_OFFSET)` always returns zero. That is why I suspected the offset macro definitions were wrong. Can you give me some advice? Thank you very much for your response!

yangkex commented 1 year ago

When you run naive_driver, the data the GPU program needs is sent to DDR, and the driver configures GPU control registers such as GPU_WG_ID through AXI. The GPU hardware then runs the program, puts the result into DDR, and writes GPU_WG_VALID. That's why the driver keeps reading this register. Based on this process, I advise checking whether the bus addresses of the FPGA PL and DDR are set correctly. You can also use an ILA to capture the AXI signals and make sure each read and write request is correct.
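
The polling step in this flow can be sketched roughly as follows. This is not the code from naive_driver.c: `reg_read32`, `wait_for_nonzero`, and `EXAMPLE_VALID_OFFSET` are hypothetical names, the offset value is a placeholder rather than the real `GPU_WG_VALID_OFFSET`, and the register block is assumed to be memory-mapped as an array of 32-bit words.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder byte offset for illustration only; the real value of
 * GPU_WG_VALID_OFFSET is defined in naive_driver.h. */
#define EXAMPLE_VALID_OFFSET 0x10u

/* Hypothetical helper: read the 32-bit register at base + byte_offset.
 * The byte offset is turned into a word index by dropping 2 bits. */
static uint32_t reg_read32(const volatile uint32_t *base, uint32_t byte_offset)
{
    return base[byte_offset >> 2];
}

/* Hypothetical helper: poll a status register until it becomes non-zero
 * or until max_polls reads have been made. Returns true if the register
 * became non-zero, false if the poll budget ran out. */
static bool wait_for_nonzero(const volatile uint32_t *base,
                             uint32_t byte_offset,
                             uint32_t max_polls)
{
    for (uint32_t i = 0; i < max_polls; ++i) {
        if (reg_read32(base, byte_offset) != 0u) {
            return true;
        }
    }
    return false;
}
```

If the hardware never writes the valid register, for example because it cannot reach its program or data in DDR, a loop like this always runs out of polls, which would match the `Waiting time limit exceeded.` symptom described above.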

MWHYNOT commented 1 year ago

Thank you very much for your reply. So is it right that if I just set GPU_BASEADDR in naive_driver.h to fit my hardware architecture, I can run naive_driver.c directly with the default definitions of TestTask, ProgramInstr, and ProgramData, as shown here? `TaskConfig TestTask = {0,0,0,0,0,0,0,0,0,TestMem}; static u32 ProgramInstr[1024] = {0}; static u32 ProgramData[32768] = {0};` In other words, the manual says that some signals are provided by the compiler, but on the FPGA I need to configure them myself. How can one determine how to configure them? Anyhow, thank you very much for your response!

MWHYNOT commented 1 year ago

Well, I have done the following tests:

1. GPU_BASEADDR is set to match the block design in Vivado.
2. The offset addresses are kept as in naive_driver.h, because I am using the original August 2022 version without changes.
3. Capturing the AXI signals with an ILA shows that the writes do go through, but the read data stays zero, meaning the register at GPU_WG_VALID_OFFSET remains zero and never becomes 1.

So I'm wondering whether the problem is some parameter configuration in naive_driver, such as the following: `TaskConfig TestTask = {0,0,0,0,0,0,0,0,0,TestMem}; static u32 ProgramInstr[256] = {0}; static u32 ProgramData[256] = {0};` Looking forward to your reply, thanks!