chipsalliance / chisel

Chisel: A Modern Hardware Design Language
https://www.chisel-lang.org/
Apache License 2.0
3.91k stars 588 forks source link

"Clone Module" API Idea and Request for Comment #618

Closed jackkoenig closed 5 years ago

jackkoenig commented 7 years ago

As designs grow larger and larger, we are starting to hit fundamental performance problems in both Firrtl and Chisel. There are many techniques to deal with this, and one useful one I think we should consider is allowing users to explicitly "clone" Modules. This would actually partially address both #472 and https://github.com/ucb-bar/firrtl/issues/538.

Cloning would essentially create another Firrtl instance of an already elaborated Module. There are some subtle issues to think about with annotations (Module.component annotations would probably be fine but hierarchical ones would need duplication).

I propose something like:

val inst0 = Module(new MyModule)
val inst1 = inst0.cloneModule
// or
val inst2 = Module.cloneInst(inst0)
// or 
val inst3 = Module.duplicate(inst0)

This does present some serious problems for systems like rocket-chips's LazyModule that utilize their own data structures to emit information about the configuration and the like. However, as a strictly opt-in feature such libraries could better leverage annotations or provide their own APIs that call ours to take advantage.

Thoughts? Questions? Concerns?

chick commented 7 years ago

Do you have some estimate of the gain here? Your example was of an unparameterized Module, is that relevant or just a simpler example. It seems to me that requiring users to do this is not that desirable, it seems a bit hard to explain, users would likely not employ the technique at first and then when they hit the wall would have to go back and fix a bunch of existing code. Could Module.apply could be some soft of factory that (optionally) recognized creating of previous modules and did this under the hood.

aswaterman commented 7 years ago

This will most usefully be employed at a fairly macro level (e.g., an N-core design would be constructed once and cloned N-1 times), where it will achieve a nearly N-fold gain.

jackkoenig commented 7 years ago

The gain is a substantial performance increase for large designs and a guarantee of deduplication (since the module was never duplicated).

An unparameterized Module is a simpler case, it is parameterized Modules where accidentally not dedupable Modules become a bigger problem. This is really only useful for large designs, and I cannot imagine many large designs that are not parameterized.

This is intended to be an advanced feature since it's strictly a performance optimization, and users are not required to use it. Chisel works just fine as is, it's just that large designs--which are almost always hierarchical and instantiate Modules that must be deduplicated--suffer from really bad compile times.

I have thought about Chisel trying to do this automatically. The problem is how can we possibly detect when Modules are the same under the hood? I don't believe it's possible to check that the parameters are purely functional datastructures, let alone trying to guarantee that the Modules don't access global state.

Perhaps we could provide a way for the user to indicate that they would like for us to make that optimization: a mixin like "DontDuplicate" or something for example. This might be a better API, although it does run into the additional problem of how to decide when we can make the optimization. Perhaps checking that the parameters are == is sufficient. Is it possible to inspect the arguments of a class constructor in a super class? Otherwise I'm not sure how to even do this (short of macros)

ducky64 commented 7 years ago

Mostly agree with Chick, it would be nice to keep simple, consistent semantics as long as possible.

Additionally, I think different approaches can help depending on if FIRRTL or Chisel frontend is the bottleneck.

Potential alternative solutions: If FIRRTL is the bottleneck:

If Chisel frontend (hardware construction) is the bottleneck:

aswaterman commented 7 years ago

Alas, the Chisel frontend is already a time & memory capacity bottleneck, so we can't solve this entirely in the IR.

jackkoenig commented 7 years ago

For the record, doing even string-based deduplication in Chisel is something we definitely should do to help make Firrtl faster, but yeah as @aswaterman said, I'm talking about long Chisel runtimes in addition to Firrtl runtimes. Graph / tree isomophism checks for deduplication is only going to be slower, so I think providing a user-API for when they know they want the exact same Module is something we need.

ccelio commented 7 years ago

Can we point to some example code in rocket-chip where one would insert this .cloneModule code? Are these likely to be found in common patterns that we can infer from? I certainly appreciate any improvements we can make to build time.

aswaterman commented 7 years ago

The most important use is for homogeneous multicores. Can't really produce a code example, because the code isn't currently structured to leverage homogeneity; it dumbly produces N things, even if the configurations are the same.

ducky64 commented 7 years ago

Ok, thinking more about macro annotation transforms, I think it could work well and is consistent with how we've been using them (doesn't affect functionality, assuming that you're using it right though):

@pureFunctionalModule  // or some other name, we're bad at names...
class MyGiantSlowRamHogCoreThingy(n: Int, b: Bool) extends Module {
  val io = IO(...)
  val moreIosForFun = IO(...)
  // lots of ugly hardware construction follows
}

would turn into

object MyGiantSlowRamHogCoreThingyDedup {
  // save previously instantiated version of this Module based on parameters
  // restriction: parameters must be hashable
  // restrictions: Module pure functional based on parameters (unchecked)
  val map = HashMap[(Int, Bool), MyGiantSlowRamHogCoreThingy]()
}

class MyGiantSlowRamHogCoreThingy(n: Int, b: Bool) extends Module {
  val io = IO(...)
  val moreIosForFun = IO(...)

  MyGiantSlowRamHogCoreThingyDedup.map.get((n, b)) match {
    case Some(m) => {
      // set this module's implementation to point to m, left as an exercise for the reader, possibly adding some additional deduplication logic / state into chisel3.core.UserModule
      // would also need to be some logic to clean up IO connections
    }
    case None => {
      // lots of ugly hardware construction follows
      MyGiantSlowRamHogCoreThingyDedup.map.put((n, b), this)
    }
  }
}

Note that the macro annotation transform would make it easy to deduplicate any module. This could also be done without macros by moving the deduplicate Module check into a companion object constructor, but that would require a companion object (instead of Module(new ...) syntax) and might be more difficult to use than a Module annotation.

jackkoenig commented 7 years ago

I think a macro annotation is a fine way to do it. 1 possible problem is that moving the constructor inside that anonymous function following None then makes the vals no longer class level. That is obviously not acceptable so is there something else we could do? Perhaps:

class MyGiantSlowRamHogCoreThingy(n: Int, b: Bool) extends Module {

  MyGiantSlowRamHogCoreThingyDedup.map.get((n, b)) match {
    case Some(m) => {
      // set this module's implementation to point to m, left as an exercise for the reader,
      // possibly adding some additional deduplication logic / state into chisel3.core.UserModule
      // would also need to be some logic to clean up IO connections
      return m;
    }
    case None =>
      MyGiantSlowRamHogCoreThingyDedup.map.put((n, b), this)
  }

  val io = IO(...)
  val moreIosForFun = IO(...)
  // lots of ugly hardware construction follows
}
jackkoenig commented 7 years ago

Or perhaps something like

class MyGiantSlowRamHogCoreThingy(n: Int, b: Bool) extends Module {

  private var _io = _
  private var _moreIosForFun = _

  MyGiantSlowRamHogCoreThingyDedup.map.get((n, b)) match {
    case Some(m) => {
      // set this module's implementation to point to m, left as an exercise for the reader,
      // possibly adding some additional deduplication logic / state into chisel3.core.UserModule
      // would also need to be some logic to clean up IO connections
    }
    case None =>
      // lots of ugly hardware construction follows
      _io = IO(...)
      _moreIosForFun = IO(...)
      MyGiantSlowRamHogCoreThingyDedup.map.put((n, b), this)
  }

  val io = _io
  val moreIosForFun = _moreIosForFun
}
ducky64 commented 7 years ago

So the main issue is that each new Module should return a unique instance with unique IO, since the IO is shared with the enclosing scope (where it's instantiated in).

It's unfortunate that generally vals won't be visible anymore (other than IOs, which can be detected statically by looking for the pattern val $name = IO($contents)), but making vals visible would be very difficult (for example, all Data nodes must be unique to each Module instance, otherwise this would badly break a cross-Module printf if we ever got around to that).

Do you have an example where you would want to use a non-IO internal Module val?

jackkoenig commented 7 years ago

Do you have an example where you would want to use a non-IO internal Module val?

Testing, cross module references for specifying how to initialize a memory perhaps.

jackkoenig commented 7 years ago

Also potentially creating annotations to things without having to change the RTL, useful for specifying a simulator or adding place-and-route or other backend specific annotations for RTL you can't modify

ducky64 commented 7 years ago

Now that we're looking at making this more a whitebox clone rather than a blackbox clone, the question is how far does it need to go? Might you need to inspect modules inside the deduplicated module? If so, you'll need to clone (possibly a fast deduplicated clone) those wires and modules (recursively) as well, which will probably eat into performance savings. Additionally, there may be difficult to anticipate interactions coming from essentially partially instantiating a Module. Same for what other kinds of data need to be cloned, and how the user could indicate that the result is pure functional.

jackkoenig commented 7 years ago

Yeah that's a really good point and makes this really complicated šŸ˜ž

Is it possible to turn the annotation on and off? Perhaps it is impossible to compose this feature with type-safe cross-module references so you could have configurations with it turned off. It also might just be time to start profiling Chisel to make this less of a problem.

aswaterman commented 7 years ago

Constant-factor improvements will only help so much - this is a bridge we need to cross at some point, anyway.

aswaterman commented 7 years ago

If Module.duplicate(foo) were to return the new I/Os only, not the module, this wouldn't be a problem. (Yet another argument for keeping the I/Os together.)

ducky64 commented 7 years ago

That's basically the same as having all the internal vals, other than IOs which can be statically detected and checked, inaccessible.

The good thing about the annotation is that it forces all the Module usages to be consistent, in that there's no real distinction (at the user-visible level) between the actual full Module and the deduplicated copies.

One solution might be a way to specify to the annotation which vals, other than IOs, need to be accessible?

jackkoenig commented 7 years ago

That's basically the same as having all the internal vals, other than IOs which can be statically detected and checked, inaccessible.

Not exactly, any instances that you want to have accessible internal vals can be instantiated the normal way

The good thing about the annotation is that it forces all the Module usages to be consistent, in that there's no real distinction (at the user-visible level) between the actual full Module and the deduplicated copies.

I'm not sure that's actually a good thing if it removes functionality. With instances that do have the internal vals you could use cross module references to specify annotations. If they're annotations that apply to all instances of the given Module, then you're good. If you need references for particular instances (eg. initializing mems), I think there are tricks we can play.

sdtwigg commented 7 years ago

I need time to digest the entire proposal but from a skim of the discussion: using macro annotations to completely rewrite the user module is horrifying.

At the very least, would like to focus on what post rewrite form is needed and consider what this alternative api for specifying and using modules looks like.

On May 25, 2017 5:30 PM, "Jack Koenig" notifications@github.com wrote:

That's basically the same as having all the internal vals, other than IOs which can be statically detected and checked, inaccessible.

Not exactly, any instances that you want to have accessible internal vals can be instantiated the the normal way

The good thing about the annotation is that it forces all the Module usages to be consistent, in that there's no real distinction (at the user-visible level) between the actual full Module and the deduplicated copies.

I'm not sure that's actually a good thing if it removes functionality. With instances that do have the internal vals you could use cross module references to specify annotations. If they're annotations that apply to all instances of the given Module, then you're good. If you need references for particular instances (eg. initializing mems), I think there are tricks we can plan.

ā€” You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ucb-bar/chisel3/issues/618#issuecomment-304159576, or mute the thread https://github.com/notifications/unsubscribe-auth/AA9_-CNR4XvKca9GlOfYB_MO7MNEQkPsks5r9h0QgaJpZM4Nm4J6 .

jchang0 commented 7 years ago

An expansion to this which is related that might use the same underlying code in the future is incremental chisel compile. As the design gets larger, we might need incremental compile capability

I imagine that it might be related since we have to guarantee that the module we generated before is exactly the same as the module we are compiling now to be able to skip the compilation step. Granted that some firrtl optimization can no longer work across boundaries, it might be something that the user have to specify a particular module is subject to incremental compile and known optimization might not be applied.

albert-magyar commented 5 years ago

Resolved by #943