Artoriuz / maestro

A 5 stage-pipeline RV32I implementation in VHDL
MIT License
19 stars 5 forks source link

Flushing unit output spike causes No-Op to be 3 clock cycles instead of 2 #3

Open aaronelsonp opened 3 years ago

aaronelsonp commented 3 years ago

Hello,

I tried to run my Fibonacci instructions in Simulation Waveform Editor, but I had difficulties getting the expected output. Here is my assembly instruction.

    addi x6,x0,1
up:
    add x7,x5,x6
    sw x6,0x0(x0)
    lw x5,0x0(x0)
    sw x7,0x0(x0)
    lw x6,0x0(x0)
    jal up

which disassembles to:

    4:      00100313        00000000000100000000001100010011

00000008 <up>:
    8:      006283b3        00000000011000101000001110110011
    c:      00602023        00000000011000000010000000100011
    10:     00002283        00000000000000000010001010000011
    14:     00702023        00000000011100000010000000100011
    18:     00002303        00000000000000000010001100000011
    1c:     fedff0ef        11111110110111111111000011101111

I modified the files to make it work in ModelSim, so I don't have to do 7-minute compilations every time I change the program. For some reason, "use work.all" didn't work in ModelSim, so I had to declare the components in every .vhd file.

After inspecting it in ModelSim and comparing it to the Ripes simulator, it turns out that there's an excess No-OP instruction. I think this is because the pipeline register is still reset when the third rising_edge is happening. There's a weird spike at the beginning of the third clock. I solved it in ModelSim by adding a delay.

After adding delay: image

However, when I do a functional simulation or a timing simulation in Quartus, the problem persists. image image image

I haven't tried it on an actual FPGA. Probably it will fix itself, but I'm not sure because the timing simulation output is incorrect. image

aaronelsonp commented 3 years ago

I fixed this problem by following an example from StackOverflow.

image image

Essentially, by delaying the output by one clock, I removed the spike mentioned before. The flushing unit will now assign the flushing output of the next rising edge during the current falling edge. Because of the delay, instead of using signals from the ex/mem register, I used the OR-ed jump/branch signal from the previous stage. The flushing output will be available at the same clock as previously, but now without the spike.

Here is my modified flushing unit:

library IEEE;
use IEEE.STD_LOGIC_1164.all;
use work.all;

entity flushing_unit is
    port (
        clear, clock : in std_logic;
        flushing_control : in std_logic;
        flushing_output : out std_logic
    );
end flushing_unit;

architecture behavioural of flushing_unit is
signal internal_flushing_control : std_logic := '0';
begin

process(clock)
begin
    if falling_edge(clock) then
       if flushing_control = '1' then
          internal_flushing_control <= '1';
       else
          internal_flushing_control <= '0';
       end if;
    end if;
end process;

flushing_output <= clock and internal_flushing_control;

end architecture behavioural;

in datapath:

FLUSH : flushing_unit port map(reset, clock, ALU_branch_response or jump_flag_ID_EX, flushing_unit_output);

I also modified your Progmem interface a bit. In your original Datapath, instead of using the output of the PC, you were using the PC_next signal. Probably you were trying to use the Quartus ROM primitive, that's positive-edge-triggered by default. So I converted it into negative-edge-triggered by adding an inverter in the Progmem interface.

in progmem.vhd: progmem_0 : progmem port map(memory_address(15 downto 0), not clock, output_data); in datapath: progmem_module_0 : progmem_interface port map(clock, PC_output, progmem_output);

Artoriuz commented 2 years ago

Hi @aaronelsonp,

First I'd like to say that I'm very happy my little toy project appears to have been useful =D

It's been a while and I don't remember much, but I think the "final" version of the core had issues with the very first instruction not executing correctly due to issues with PC signals not propagating in time. It's very possible that similar problems were also present in other parts of the core, but I don't remember seeing anything like what you're describing.

In any case, since I haven't touched the project since 2019, I think trying to figure out whether the issue is in the RTL itself or in the tools (different FPGA targets, different versions of Quartus or Modelsim, etc) is kinda pointless. I'm just glad you were able to fix your problem and made it work correctly.

As the idea has always been providing something easy to approach and hack, feel free to do any modifications you see fit. I'm sure there are many ways to improve the design and make it even better =)