SpinalHDL / VexRiscv

A FPGA friendly 32 bit RISC-V CPU implementation
MIT License
2.52k stars 420 forks source link

Simulation crashed near roundFront / discardCount using single-precision FPU; made a workaround #237

Open wjl opened 2 years ago

wjl commented 2 years ago

We've been evaluating using VexRiscv for a project, and it's mostly been going beautifully. Even though nobody on our team knows Scala super well, VexRiscv and SpinalHDL so far is a hit because of how well it integrates into our existing VHDL flow, and how easy it is to configure VexRiscv to do what we want.

However we have run into one issue. We've worked around it, but I wanted to report what we've seen and what we did to work around it, in case this is something that should be fixed in the VexRiscv codebase.

To summarize:

The first symptom we saw was a crash in the simulator saying that there was an out-of-bounds index. (For reference, this was using the latest Xilinx Vivado built-in simulator.) The crash in question pointed to this generated code:

roundFront_roundAdjusted <= pkg_cat(pkg_toStdLogicVector(pkg_extract(pkg_cat(pkg_toStdLogicVector(pkg_toStdLogic(true)),std_logic_vector(pkg_shiftRight(roundFront_manAggregate,1))),to_integer(roundFront_discardCount))),pkg_toStdLogicVector(pkg_toStdLogic((roundFront_manAggregate and roundFront_exactMask) /= pkg_unsigned("0000000000000000000000000"))));

The actual error was that bit 25 was being indexed from a vector with bounds 24 downto 0. Our first workaround was just to allow this to happen by modifying pkg_extract to have the following guard:

function pkg_extract (that : std_logic_vector; bitId : integer) return std_logic is
  begin
    -- Added this guard if statement.
    if bitId > that'high then
      return '0';
    end if; 
  -- Original code just did this.
  return that(bitId);
end pkg_extract;

But that was just a hack. It worked fine after this fix, but of course modifying generated code is terrible, and we wanted to get to the root cause anyway in case it was going to cause misbehavior, etc.

So we tracked this code to the scala code in src/main/scala/vexriscv/ip/fpu/FpuCore.scala here: https://github.com/SpinalHDL/VexRiscv/blob/master/src/main/scala/vexriscv/ip/fpu/FpuCore.scala

  val roundFront = new Area {
    val input = merge.arbitrated.stage()
    val output = input.swapPayload(new RoundFront())
    output.payload.assignSomeByName(input.payload)

    val manAggregate = input.value.mantissa @@ input.scrap
    val expBase = muxDouble[UInt](input.format)(exponentF64Subnormal + 1)(exponentF32Subnormal + 1)
    val expDif = expBase -^ input.value.exponent
    val expSubnormal = !expDif.msb
    var discardCount = (expSubnormal ? expDif.resize(log2Up(p.internalMantissaSize) bits) | U(0))
    if (p.withDouble) when(input.format === FpuFormat.FLOAT) {
      discardCount \= discardCount + 29
    }
    val exactMask = (List(True) ++ (0 until p.internalMantissaSize + 1).map(_ < discardCount)).asBits.asUInt
    val roundAdjusted = (True ## (manAggregate >> 1)) (discardCount) ## ((manAggregate & exactMask) =/= 0)

None of us are Scala or SpinalHDL experts, but tracing through this code it was clear that, at least when using a single-precision FPU, it's definitely possible for discardCount -- which is indexing a vector in the roundAdjusted assignment -- to go out of bounds. We fixed this in our version by adding code that says:

// HACK: make discardCount never trigger an out-of-bounds lookup
// HACK: ONLY WORKS FOR SINGLE PRECISION
// HACK: this might be covering up some other function error ... but it works when we do this
when (discardCount > 24) {
  discardCount = 0
}

Anyway, our fix is probably not "correct", but with this workaround everything has been working for us so far. Without this fix, the simulator crashes with an error saying we are indexing bit 25 out of a (24 downto 0) vector. Our GUESS is that this only happens with a single-precision FPU enabled.

Thank you. We're loving VexRiscv and hope this report helps make it even better. =)

Dolu1990 commented 2 years ago

Hi ^^

Thanks for the issue :D So i see two possibilities :

I didn't tried to deploye the FPU in VHDL, only verilog so far ^^

So, maybe one thing which would be great to figure out the nature of the issue, would be to manualy put some VHDL report in that pkg_extract to print the time at which the issue occure, and then, check in the simulation if the roundFront.input.valid is set at that time. If it is, then it is a design bug, else we may just want to change SpinalHDL itself to have the guard as you added it manualy.

Would that be possible for you to test as mentioned above ?

wjl commented 2 years ago

Thanks for the quick reply!

I will work with the guys doing the simulation and see if we can track down more information as you described.

As some additional info, we weren't running any floating point instructions when this first happened. So our thought was that it was just a symptom of garbage running through the FPU pipeline and resulting in an invalid indexing operation. If this is the case, it would be harmless for synthesis and real operation, because the unit isn't actually activated, but of course VHDL is very strict and compliant simulators will get angry (halting the simulation, not just generating 'U') if anything is indexed out of bounds. I wonder if this is a general issue with Spinal HDL VHDL generation, and isn't seen in Verilog because it's more forgiving? Just a thought, but if that were so, generating a safer pkg_extract that checks bounds like in my first example would be a possible simple fix, and doesn't seem like it would affect the synthesis results in the common case.

Anyway, we'll look into this more and I'll report back if we can recreate the error and get more info.

wjl commented 2 years ago

Here is more info, although it kind of sounds like this might already have been addressed in the referenced commit!

We undid our workaround(s) and ran our simulation with an eye to capturing more information. Below is a screencap of the waveform output in the Xilinx Vivado simulator, showing all the internal signals from the roundFront area:

image

The error message was as follows:

ERROR: Index 25 out of bound 24 downto 0
Time: 34973063 ps  Iteration: 4  Process:
…/axi_core_cpu/FpuPlugin_fpu/line__8542
  File: …/VexRISCv/VexRiscv/Core.vhd
HDL Line: …/VexRISCv/VexRiscv/Core.vhd:460
run: Time (s): cpu = 00:00:14 ; elapsed = 00:00:34 . Memory (MB): peak = 4451.727 ; gain = 23.328

The referenced line of code in Core.vhd:460 is in pkg_extract:

package body pkg_scala2hdl is
  function pkg_extract (that : std_logic_vector; bitId : integer) return std_logic is
  begin
    return that(bitId); -- this is line Core.vhd:460
  end pkg_extract;

The calling line 8542 referenced in the error message is:

roundFront_roundAdjusted <= pkg_cat(pkg_toStdLogicVector(pkg_extract(pkg_cat(pkg_toStdLogicVector(pkg_toStdLogic(true)),std_logic_vector(pkg_shiftRight(roundFront_manAggregate,1))),to_integer(roundFront_discardCount))),pkg_toStdLogicVector(pkg_toStdLogic((roundFront_manAggregate and roundFront_exactMask) /= pkg_unsigned("0000000000000000000000000")))); -- this is line Core.vhd:8542

Most of this is the same as in the original report, but hopefully now what's going on is a lot more clear.

Basically what appears to be going on is as follows:

Some more thoughts about this:

I hope this helps!

Dolu1990 commented 2 years ago

Thanks for the report :)

and this may not happen (or just not as often?) with a double-precision

So, i would say, with VHDL the issue in double-precision is likely to come too.

SpinalHDL is allowing the code to between where a 5 bit signal (values 0 to 31) is being used to index a 25 bit signal (24 downto 0). If this is semantically valid, then I think the FPU is fine

Yes it is ok from a SpinalHDL perspective. Maybe to avoid any issue of that sort, we could have some of automatic padding from SpinalHDL. Maybe 'X' padding. That's would make things explicit.

Based on the reference to a commit over in SpinalHDL, it sounds like maybe you already fixed this issue in SpinalHDL VHDL generation. But I wanted to write up all this data we collected anyway, just to close the loop.

Cool thanks :D

Let's me know if any additional issue popup ^^