chipsalliance / rocket-chip

Rocket Chip Generator
Other
3.27k stars 1.14k forks source link

A moderately dangerous interupt domain crossing bug #3687

Open DecodeTheEncoded opened 2 months ago

DecodeTheEncoded commented 2 months ago

Hello Community, My team is planing to tapout a chipyard based design, and we have a different Clock Scheme from the default chipyard design: domain1, CORE,SBUS,CBUS: 800MHZ** domain2, MBUS,L2Wrapper:400MHZ** domain3: PBUS,FBUS: 100MHZ.**** All these clock domains are rational crossed. They are phase aligned with each other. Therefore, taking UART as an example, interrupt initiated by uart(at PBUS) has to cross to IBUS(the same clock domain with SBUS), see code below:

    (intXType match {
      case _: SynchronousCrossing => where.ibus.fromSync
      case _: RationalCrossing => where.ibus.fromRational
      case _: AsynchronousCrossing => where.ibus.fromAsync
    }) := uart.intXing(intXType)

There are two dangerous bugs in code above: 1,The domain crossing logic is applied twice, the uart.intXing(intXType) will be converted into IntSyncRationalCrossingSink() :*=* IntSyncNameNode(name) :*=* scope { IntSyncNameNode(name) :*=* IntSyncCrossingSource(alreadyRegistered) } :*=* node , whileibus.fromRational also has domain crossing facilities:

/** Collects interrupts from internal and external devices and feeds them into the PLIC */ 
class InterruptBusWrapper(implicit p: Parameters) extends ClockSinkDomain {
  override def shouldBeInlined = true
  val int_bus = LazyModule(new IntXbar)   // Interrupt crossbar
  private val int_in_xing  = this.crossIn(int_bus.intnode)
  private val int_out_xing = this.crossOut(int_bus.intnode)
  def from(name: Option[String])(xing: ClockCrossingType) = int_in_xing(xing) :=* IntNameNode(name)
  def to(name: Option[String])(xing: ClockCrossingType) = IntNameNode(name) :*= int_out_xing(xing)
  def fromAsync: IntInwardNode = from(None)(AsynchronousCrossing(8,3))
  def fromRational: IntInwardNode = from(None)(RationalCrossing(direction = Flexible))
  def fromSync: IntInwardNode = int_bus.intnode
  def toPLIC(xing: ClockCrossingType = NoCrossing): IntOutwardNode = to(Some("toPLIC"))(xing)
}

I don't really think this is the intended behavior, I think domain corssing stuff should only be applied once, that is the IntSource should be at PBUSdomain while the IntSinkshould be at IBUSdomain. 2, The crossing domain collateral being applied twice make the Interrupt Domain crossing moderately dangerous, when applying the first RationalCrossing collateral(coming out of PBUS), the IntSyncAsyncCrossingSink is neither at the domain of PBUSnor the IBUS, it is at where which is the BaseSubsystem, aka the DigitalTop, the ModuleImpof DigitalTop is LazyRawModuleImp, which mean there is no implicit clock and reset for this DigitalTop, therefore anonymous children of DigitalTop will not receive clock or reset, this means the IntSyncAsyncCrossingSinkin wherewill not be approperly clocked. :

  def attachTo(where: Attachable)(implicit p: Parameters): TLUART = where {
    val name = s"uart_${UART.nextId()}"
    val tlbus = where.locateTLBusWrapper(controlWhere)
    val divinit = (tlbus.dtsFrequency.get / device.initBaudRate).toInt
    val uartClockDomainWrapper = LazyModule(new ClockSinkDomain(take = None, name = Some("TLUART")))
    val uart = uartClockDomainWrapper { LazyModule(new TLUART(tlbus.beatBytes, device, divinit)) }
    uart.suggestName(name)

    tlbus.coupleTo(s"device_named_$name") { bus =>

      val blockerOpt = blockerAddr.map { a =>
        val blocker = LazyModule(new TLClockBlocker(BasicBusBlockerParams(a, tlbus.beatBytes, tlbus.beatBytes)))
        tlbus.coupleTo(s"bus_blocker_for_$name") { blocker.controlNode := TLFragmenter(tlbus, Some("UART_Blocker")) := _ }
        blocker
      }

      uartClockDomainWrapper.clockNode := (controlXType match {
        case _: SynchronousCrossing =>
          tlbus.dtsClk.map(_.bind(uart.device))
          tlbus.fixedClockNode
        case _: RationalCrossing =>
          tlbus.clockNode
        case _: AsynchronousCrossing =>
          val uartClockGroup = ClockGroup()
          uartClockGroup := where.allClockGroupsNode
          blockerOpt.map { _.clockNode := uartClockGroup } .getOrElse { uartClockGroup }
      })

      (uart.controlXing(controlXType)
        := TLFragmenter(tlbus, Some("UART"))
        := blockerOpt.map { _.node := bus } .getOrElse { bus })
    }

    (intXType match {
      case _: SynchronousCrossing => where.ibus.fromSync
      case _: RationalCrossing => where.ibus.fromRational
      case _: AsynchronousCrossing => where.ibus.fromAsync
    }) := uart.intXing(intXType)

    uart
  }

Can anyone in the team confirm my understanding is right? If I am right, I wonder what's the typical domain crossing scehmes used by chipyard or rocketchip teams that has been tapout verified. The reason for choosing PBUS at lower clock speed is that the device at PBUS is slow, therefore it's not very power effcient to make it 800MHZ(in our case). I also wonder if the rational crossing type has been used or tested at all? Any help will be appreciated, Thanks again, I am sorry to post issues recently and disrupt you guys. @jerryz123 @sequencer @tianrui-wei

jerryz123 commented 2 months ago

Can you provide a minimal chipyard config that demonstrates the issue? That would help me figure out the problem here.

1) definitely seems like a bug, a bug from the original UART code. There's no reason why the crossing is applied twice.

2) is surprising to me. Connecting a implicit-clocked' module under DigitalTop should result in an elaboration error - "no implicit clock found".

DecodeTheEncoded commented 2 months ago

Jerry, Thanks for your reply, UARTAttachParams is exposed in CDE using case class UARTLocated(loc: HierarchicalLocation) extends Field[Seq[UARTAttachParams]](Nil), but it seems HasPeripheryUART code does not use that parameter:

trait HasPeripheryUART { this: BaseSubsystem =>
  val uarts = p(PeripheryUARTKey).map { ps =>
    UARTAttachParams(ps).attachTo(this)
  }
  val uartNodes = uarts.map(_.ioNode.makeSink())
  val uart = InModuleBody { uartNodes.zipWithIndex.map { case(n,i) => n.makeIO()(ValName(s"uart_$i")) } }
}

Therefore, you can modify the intXType parameter in case class UARTAttachParams :

case class UARTAttachParams(
  device: UARTParams,
  controlWhere: TLBusWrapperLocation = PBUS,
  blockerAddr: Option[BigInt] = None,
  controlXType: ClockCrossingType = NoCrossing,
  intXType: ClockCrossingType = RationalCrossing(direction = Flexible)) extends DeviceAttachParams

We change the inXType from NoCrossing to RationalCrossing(direction = Flexible), and the elaboration passes in the chipyard version we are using, but it fails in the newer version, complaining Error: No implicit clock. as you said.


Chisel exception caught when instantiating intsource within DigitalTop at  (generators/rocket-chip/src/main/scala/interrupts/Crossing.scala:29:31)
Chisel exception caught when instantiating system within ChipTop at  (generators/chipyard/src/main/scala/ChipTop.scala:27:35)
[warn] generators/testchipip/src/main/scala/serdes/TLSerdes.scala 85:9: [W008] Return values of asTypeOf will soon be read-only
[warn]       c <> b.io.protocol
[warn]         ^
[warn] There were 1 warning(s) during hardware elaboration.
Exception in thread "main" java.lang.reflect.InvocationTargetException
    at ... ()
    at chipyard.stage.phases.PreElaboration.$anonfun$transform$1(PreElaboration.scala:35)
    at chisel3.Module$.evaluate(Module.scala:88)
    at chisel3.Module$.do_apply(Module.scala:35)
    at chisel3.stage.phases.Elaborate.$anonfun$transform$2(Elaborate.scala:52)
    at chisel3.internal.Builder$.$anonfun$buildImpl$1(Builder.scala:1042)
    at scala.util.DynamicVariable.withValue(DynamicVariable.scala:59)
    at chisel3.internal.Builder$.buildImpl(Builder.scala:1032)
    at chisel3.internal.Builder$.$anonfun$build$1(Builder.scala:1023)
    at logger.Logger$.$anonfun$makeScope$4(Logger.scala:148)
    at scala.util.DynamicVariable.withValue(DynamicVariable.scala:59)
    at logger.Logger$.makeScope(Logger.scala:146)
    at logger.Logger$.makeScope(Logger.scala:133)
    at ... ()
    at ... (Stack trace trimmed to user code only. Rerun with --full-stacktrace to see the full stack trace)
Caused by: chisel3.package$ChiselException: Error: No implicit clock.
    at chisel3.internal.throwException$.apply(Error.scala:237)
    at chisel3.internal.Builder$.$anonfun$forcedClock$1(Builder.scala:849)
    at scala.Option.getOrElse(Option.scala:201)
    at chisel3.internal.Builder$.forcedClock(Builder.scala:849)
    at chisel3.Module.$anonfun$initializeInParent$2(Module.scala:281)
    at scala.Option.getOrElse(Option.scala:201)
    at chisel3.Module.$anonfun$initializeInParent$1(Module.scala:281)
    at chisel3.Data.$anonfun$$colon$eq$1(Data.scala:762)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
    at chisel3.experimental.prefix$.apply(prefix.scala:33)
    at chisel3.Data.$colon$eq(Data.scala:762)
    at chisel3.Module.initializeInParent(Module.scala:281)
    at chisel3.Module$.do_apply(Module.scala:55)
    at org.chipsalliance.diplomacy.lazymodule.LazyModuleImpLike.$anonfun$instantiate$14(LazyModuleImp.scala:71)
    at chisel3.internal.plugin.package$.autoNameRecursively(package.scala:33)
    at org.chipsalliance.diplomacy.lazymodule.LazyModuleImpLike.$anonfun$instantiate$13(LazyModuleImp.scala:70)
    at scala.Option.getOrElse(Option.scala:201)
    at org.chipsalliance.diplomacy.lazymodule.LazyModuleImpLike.$anonfun$instantiate$1(LazyModuleImp.scala:68)
    at scala.collection.immutable.List.flatMap(List.scala:293)
    at org.chipsalliance.diplomacy.lazymodule.LazyModuleImpLike.instantiate(LazyModuleImp.scala:46)
    at org.chipsalliance.diplomacy.lazymodule.LazyModuleImpLike.instantiate$(LazyModuleImp.scala:43)
    at org.chipsalliance.diplomacy.lazymodule.LazyRawModuleImp.instantiate(LazyModuleImp.scala:149)
    at org.chipsalliance.diplomacy.lazymodule.LazyRawModuleImp.$anonfun$x$6$1(LazyModuleImp.scala:168)

My understanding is If PBUS is at different clock domain from the IBUS(currently clocked by SBUS), we have to specify corresponding intXType instead of just using NoCrossing, therefore in our case it's RationalCrossing, can you confirm my understanding is right? If this scenario is not found before, it seems that you normally do not specify the intXType explicitly? Therefore I wonder what your typical bus clock crossing scheme is? Is PBUS typically the same Clock Domain with SBUS&CBUS, or PBUS can be at different clock domain but the intXType should still be NoCrossing? This seems not very robost. Thanks Jerry, Really appreciated your reply. @jerryz123