chipsalliance / chisel

Chisel: A Modern Hardware Design Language
https://www.chisel-lang.org/
Apache License 2.0
3.9k stars 586 forks source link

Absorb rocket.DecodeLogic into Standard Library #1187

Open seldridge opened 4 years ago

seldridge commented 4 years ago

tl;dr: Proposal to migrate Rocket Chip Decode Quine--McKluskey algorithm into Chisel3 standard library.

Type of issue: feature request

Impact: API addition (no impact on existing code)

Development Phase: proposal

Other information

What is the current behavior?

Users wanting to build a microprocessor need to build a decoder. However, because Chisel and FIRRTL have no notion of actual "don't cares" and/or the ability to emit casex, this cannot be pushed to the synthesis tool to properly optimize.

Rocket solves this with a Quine-McKluskey implementation inside freechips.rocketchip.rocket.DecodeLogic. This is great, but it requires either code replication or pulling in Rocket Chip for a feature which is, arguably, something that is common to hardware designs and not Rocket designs.

What is the expected behavior?

The standard library (or some other library, e.g,. chisel3.optimization) should expose this API and not Rocket.

What is the use case for changing the behavior?

Moving this and adding documentation would enable users to do QM optimizations for building decoders without having to pull in Rocket.

Any API modifications as a result of the migration would preserve backwards compatibility with Rocket Chip usage. (I don't know yet if these would be needed or warranted. Likely not.)

Feedback Requested

I'd like feedback from @aswaterman on this.

Notes

Espresso implementations:

aswaterman commented 4 years ago

I'm actually trying to kill this utility and replace it with something more performant. Its performance tips over, sometimes in surprising ways (e.g. adding one more row to the decoder changes its runtime from <1 sec to >1 min). Other exact solutions to this NP-hard problem might have smoother performance curves, and will have better constant factors, as this implementation isn't great.

Unfortunately, I haven't had much time to investigate/implement alternatives.

seldridge commented 4 years ago

I forgot to mention that the other alternative here (or a future addition) would be an interface to Espresso. Alternatively, that could be the initial implementation of this and the original QM is not ported.

aswaterman commented 4 years ago

I wasn't able to find a Java implementation of Espresso, but that's probably a better way to go. In addition to its heuristic method, it can also provide exact solutions, like QM does.

seldridge commented 4 years ago

My thought was JNI interface to a C program version of espresso (or just calling the espresso binary directly from Scala and requiring users to build it). @ekiwi suggested to look into JNA or passing strings to ABC.

It's really obnoxious that there's no Java/Scala implementation so portability is lost. Rolling our own Scala implementation is also a possibility. I haven't dug into this and I have no knowledge of how espresso works, so I can't comment on implementation complexity. The original Berkeley C code is 14k lines (just lines, not source lines), so this would be a likely big effort.

I wasn't aware that it had both heuristic and exact solutions... That seems like it's better than QM in every way, then.

chick commented 4 years ago

JNI to C is done Rupan/espresso

seldridge commented 4 years ago

Good find, @chick. That is GPL v3 and I'm not sure if it's published. There may be a way to use that.

jackkoenig commented 4 years ago

I believe it'd be fine to link against that, but then (not that we do this but we have floated the idea) we wouldn't be able to ship a fat jar without being infected by GPL. I'm more worried about us accidentally doing that than I am about the restriction.

sequencer commented 4 years ago

Will #1198 also provide a solution for this issue that leave these optimisation to synthesis tool?

seldridge commented 4 years ago

@sequencer: I didn't think about there being inter-relatedness with #1198, but there is. I think initially a solution to #1187 would not emit case, but a solution to #1198 would enable it.

jackkoenig commented 4 years ago

An alternative "backend" for a decode logic generator could be to leave the decoder logic as a blackbox and emit a data file that could be used to drive espresso or Verilog generation. Might be a bit of a sledgehammer on a nail, but I don't think #1198 solves the issue unless we were to add support for unknowns to FIRRTL

aswaterman commented 4 years ago

Since this sort of optimization causes semantic changes on don’t-care inputs, using different backend implementations can result in different behavior. It’s a workable proposal, but it adds risk.

sequencer commented 4 years ago

Will commercial synthesis tools support espresso-like mux optimisation with switch case semantics? I think we might need a qor benchmark framework with hammer.