esl-epfl / cv32e40px

CV32E40P is an in-order 4-stage RISC-V RV32IMFCXpulp CPU based on RI5CY from PULP-Platform
Other
2 stars 1 forks source link

Combinational loops in synthesis (cv32e40px) #7

Open gonzo-sal opened 3 weeks ago

gonzo-sal commented 3 weeks ago

Combinational loops in synthesis (cv32e40px)

I have designed a co-processor wrapper based on the CV-X-IF standard (v0.2.0). The idea is to use it as a quick template to implement custom instructions easily. Simulation works, but synthesis still gives combinational loops and the bitstream is not generated for the cv32e40px core. Simulation and synthesis works for cv32e40x core.

IMP: latest commit with x illegal fix is already applied.

Testing and findings: My issue_ready signal responds in a combinational manner based on some signals from the CPU, as requested by the cv-x-if standard (v0.2.0). One of these signals is rs_valid: if any of the registers I need is still not valid according to rs_valid then issue_ready goes low, waiting for the register to be valid. This behaviour is shown below on a custom accumulation example: second instruction has to wait the previous one to writeback the result in the X registers before continuing with the accumulation (accumulation register is in the CPU).

image

After several tests I have encountered If I remove the rs_valid dependency from my issue_ready signal, then synthesis works. However I miss and important functionality: either I take the risks of executing an instruction with wrong/old data by not checking the rs_valid signal or I check rs-valid and simply reject the instruction due to unavailable data if necessary (instead of putting the issue_ready signal low), but then an illegal instruction happens and program breaks.

Note: find below a code snippet from the fpu_ss repo. Driving issue_ready according to rs_valid seems to be the correct behaviour (apart from the interpretation of the cv-x-if standard).

  // Issue Interface
  // ---------------
  always_comb begin
    x_issue_ready_o = 1'b0;
    if (((prd_rsp_use_rs_i[0] & x_issue_req_rs_valid_i[0]) | !prd_rsp_use_rs_i[0])
      & ((prd_rsp_use_rs_i[1] & x_issue_req_rs_valid_i[1]) | !prd_rsp_use_rs_i[1])
      & ((prd_rsp_use_rs_i[2] & x_issue_req_rs_valid_i[2]) | !prd_rsp_use_rs_i[2])
      & in_buf_push_ready_i) begin
      x_issue_ready_o = 1'b1;
    end
  end

Component

Component:RTL: For issues in the RTL (e.g. for files in the rtl directory)

Steps to Reproduce

Please provide:

  1. git hash: 15b9dd6077513342cf44e6853a5fc33098f2e73b
  2. Command line: make vivado-fpga FPGA_BOARD=pynq-z2
  3. Logfile and/or wave-dump info (screen shots can be useful): check above
davideschiavone commented 2 weeks ago

hello @gonzo-sal - before jumping to debugging the core, can you give us more information about the coprocessor interface? The current coprocessors we tried do not show such comb loops (see the example of fpu_ss inside the X-HEEP repository)

What are the differences between your coprocessor and the fpu_ss ?

thx

gonzo-sal commented 2 weeks ago

Hello @davideschiavone,

I did some research on the fpu_ss implementation on X-HEEP before submitting the issue request. Find below the data I have so far:

In conclusion there is no direct (rapid) way on my side to test the synthesis of the fpu_ss through the cv-x-if directly with X-HEEP, as there is not automatic instantiation for synthesis via parameters. Have you tested this functionality internally?

Thanks in advance.