SpinalHDL / VexRiscv

A FPGA friendly 32 bit RISC-V CPU implementation
MIT License
2.41k stars 408 forks source link

Wrong speculative execution when conditional branch argument is in TCM address range #412

Closed vianney closed 2 months ago

vianney commented 3 months ago

Hello,

Many thanks for the wonderful projects (both SpinalHDL and VexRiscv). I think I have found a nasty bug in the implementation of Tightly Coupled Memory.

Summary

When a conditional branch instruction is directly followed by a store instruction writing to TCM, and the operand of the branch is within the TCM address range, then the TCM will be unconditionally written to. Other types of instructions following such branches might also be affected (not investigated).

Minimal Working Example

tcmbug.scala

import spinal.core._
import vexriscv._
import vexriscv.ip._
import vexriscv.plugin._

class TCMBug extends Component {
  val cpu = new VexRiscv(VexRiscvConfig(Seq(
    new IBusCachedPlugin(
      resetVector = 0x10000000l,
      config = InstructionCacheConfig(
        cacheSize = 4096,
        bytePerLine = 8,
        wayCount = 1,
        addressWidth = 32,
        cpuDataWidth = 32,
        memDataWidth = 32,
        catchIllegalAccess = true,
        catchAccessFault = true,
        asyncTagMemory = false,
      ),
    ),
    new DBusCachedPlugin(
      config = DataCacheConfig(
        cacheSize = 4096,
        bytePerLine = 8,
        wayCount = 1,
        addressWidth = 32,
        cpuDataWidth = 32,
        memDataWidth = 32,
        catchAccessError = true,
        catchIllegal = true,
        catchUnaligned = true,
      ),
    ),
    new IBusDBusCachedTightlyCoupledRam((0x10000000l, 1 KiB),
                                        ramAsBlackbox = false,
                                        hexInit = "tcmbug.hex",
                                        ramOffset = 0x10000000l),
    new StaticMemoryTranslatorPlugin(_ => False),
    new DecoderSimplePlugin,
    new RegFilePlugin(plugin.ASYNC),
    new SrcPlugin,
    new IntAluPlugin,
    new HazardSimplePlugin(),
    new BranchPlugin(earlyBranch = false),
    new CsrPlugin(CsrPluginConfig.smallest(0)),
  )))

  for (plugin <- cpu.plugins) plugin match {
    case plugin : IBusCachedPlugin =>
      plugin.iBus.cmd.ready := True
      plugin.iBus.rsp.valid := True
      plugin.iBus.rsp.data := 0
      plugin.iBus.rsp.error := True
    case plugin : DBusCachedPlugin =>
      plugin.dBus.cmd.ready := True
      plugin.dBus.rsp.valid := True
      plugin.dBus.rsp.data := 0
      plugin.dBus.rsp.error := True
    case plugin : CsrPlugin =>
      plugin.timerInterrupt := False
      plugin.externalInterrupt := False
    case _ =>
  }
}

object TCMBug extends App {
  import spinal.core.sim._
  SimConfig.withFstWave.compile(new TCMBug).doSim{ dut =>
    dut.clockDomain.forkStimulus(10)
    sleep(10000)
  }
}

tcmbug.hex

:020000041000EA
:100000009300A00237010010639400002320110820
:10001000B700001063940000232011086F00000057
:0400000510000000E7
:00000001FF

Compiled from the following two files with:

riscv32-none-elf-gcc -march=rv32i -mabi=ilp32 -specs=nosys.specs -nostartfiles -T tcmbug.ld tcmbug.S -o tcmbug.elf
riscv32-none-elf-objcopy -O ihex tcmbug.elf tcmbug.hex

tcmbug.S

    .section .text
    li x1, 42           # 10000000
    li x2, 0x10000000   # 10000004
    bnez x1, 1f         # 10000008 - always branching
    sw x1, 128(x2)      # 1000000c - not executed, ok
1:
    li x1, 0x10000000   # 10000010
    bnez x1, 2f         # 10000014 - always branching, but SRC_ADD is in TCM
    sw x1, 128(x2)      # 10000018 - should not be executed !!!
2:
    j 2b                # 1000001c

tcmbug.ld

OUTPUT_ARCH("riscv")
MEMORY { TCM : ORIGIN = 0x10000000, LENGTH = 1024 }
SECTIONS {
    .text : { *(.text) } >TCM
}

Simulation Result

Simulation Result

Full waveform

As you can see above, the first store (instruction 1000000c) is properly halted and not executed. The second store (instruction 10000018) however gets written in the TCM, even though the instruction should also be skipped.

From what I understand reading through the code, the following happens:

  1. SrcPlugin.scala, line 77: SRC_ADD is set to the operand of the branch instruction (x1 in our case) or the destination address of a store.
  2. DBusCachedPlugin.scala, line 439: MEMORY_TIGHTLY is set in the execute stage whenever SRC_ADD is in the range of the TCM, regardless of the actual instruction being executed.
  3. DBusCachedPlugin.scala, line 478: HAS_SIDE_EFFECT is overridden to low in the memory stage whenever MEMORY_TIGHTLY is set. Why is this done regardless of the instruction being executed?
  4. DBusCachedPlugin.scala, line 442: Because HAS_SIDE_EFFECT is low in the memory stage (processing the branch), the execute stage (processing the store) is not halted.
Dolu1990 commented 2 months ago

Hi ^^

Thanks for the issue.

It is now fixed with the commit above.