chipsalliance / firrtl

Flexible Intermediate Representation for RTL
https://www.chisel-lang.org/firrtl/
Apache License 2.0
730 stars 177 forks source link

RFC: Restructuring InferResets to be an optimization. #1376

Open albert-magyar opened 4 years ago

albert-magyar commented 4 years ago

Many of the most powerful things done in the FIRRTL compiler are optimizations. I believe that this is due to two strong features guaranteed by the definition of an "optimization" pass:

  1. A strong as-if correctness model that allows for behavior refinements. In FIRRTL, this as-if model is currently assumed to be "of the whole circuit unless modified by dontTouch".
  2. The requirement that the unoptimized base case be a functionally correct default implementation for all inputs
  3. Some general metric or notion of "improving" an implementation

Benefits and drawbacks

Implementing InferResets as an optimization would allow it to be repeatedly run in the compiler and make it easy to move in the transform order, since it would have no obligation to exhaustively classify the resets in the input. Additionally, it would simplify its implementation, since a valid implementation could decide to "punt" on tricky cases until they are simplified with ConstantPropagation, ExpandWhens, or other passes.

However, converting InferResets to an optimization would create a new requirement: that there exist a default implementation that can be passed through all stages of the compiler to generate functional code with (potentially) worse output quality metrics.

In order to fulfill this requirement, there are two main implementation patterns. In each case, any notion of "subtyping" is taken under the model that a Connect can be modeled as an assignment and that a PartialConnect, Connect in a Conditionally or Mux can be modeled as a conditional assignment. Any signal is assumed to be be a supertype of the least upper bound of all of the types of the signals it is (conditionally) connected to. At any point in compilation, all connections in the circuit must be legal.

Case 1: Retaining abstract reset

In the most direct porting of the current code, making InferResets an optimization would involve allowing Reset to remain in the design while inferrable resets are selectively optimized to UInt<1> or AsyncReset. However, this would require defining behavior for Reset that allows it to act as a safe default in all cases. Given that it must support all behaviors, the natural behavior for such a "safe default" would mirror that of AsyncReset, causing asynchronously reset registers to be inferred conservatively.

Under this scheme, the following code would produce an asynchronously-reset register without running InferReset:

input clock: Clock
input sr: UInt<1>
wire r: Reset
reg r: UInt<1>, clock with: (reset => (r, UInt<1>(0)))

Under this scheme, it is a design decision whether it is possible to assign a value of type UInt<1> to a signal of type AsyncReset.

Case 2: UInt<1> <: AsyncReset

With the assumption of asynchronous reset as the "safe default", this case has the exact same expressability as Case 1. Given the low-level nature of an IR, the fact that this provides less encapsulation of how the reset types are implemented is not as significant of a downside as in other contexts. With respect to implementing InferResets as an optimization, the two schemes are identical, so the decision comes down to the relative explainability of the concept that UInt<1> subtypes AsyncReset

Under this scheme, the following code would be legal without any additional typecasts:

input sr: UInt<1>
output ar: AsyncReset
ar <= sr

Multiple phases of optimization

The following could all be easily supported by InferResets-as-optimization.

Casting AsyncReset to UInt<1>

To make the reset semantics more robust, the only way to explicitly convert an AsyncReset (or Reset in Case 2) expression to have type UInt<1> under these cases will be with a cast, which may take the form of a new overloading of the asUInt operator or a new operator. This cast will require both a UInt<1> and a Clock as arguments. The meaning of this cast will be "interpret this boolean signal as already being synchronous with the specified clock domain." It will not insert any synchronization logic.

This association would allow InferResets to have a strong foundation for ideally optimizing the inference of synchronous resets. While this seems to add a new entanglement of clocks and resets to FIRRTL, the presence of a UInt<1> as a reset on a register already is taken to guarantee that it is synchronous with the register's clock.

Converting UInt<1> to AsyncReset

The goal of this proposal is not to define the exact rules for type promotions and conversions among these types.

From the perspective of InferResets being written as an optimization, there is no meaningful distinction between the case where "upcasts" are required and the case where they are not, as long as the optimization is allowed to optimize across upcasts.

The standard "barrier" for the proposed optimization form of InferResets that prevents it from optimizing abstract or async resets is connection to a top-level source of type AsyncReset. Furthermore, this can be augmented with analysis of synchronization domains and sink register domains to prevent propagating a "synchronous reset" across a clock domain crossing -- a design flaw that can be slipped past the existing InferResets.

To provide a further barrier that explicitly guarantees that asynchronous resets remain asynchronous in the absence of a visible source of asynchrony, the designer may also dontTouch the asynchronous reset signal.

Finally, in the event that it is decided to allow assigning a UInt<1> value to a signal of type AsyncReset, we must add a new operator that provides an explicit barrier to optimization.

input clock: Clock
input sr: UInt<1>
wire ar1: AsyncReset
wire ar2: AsyncReset
wire ar3: AsyncReset
ar1 <= asAsyncReset(sr)               ; allows optimization downstream
ar2 <= asAsyncReset(unsync(sr))       ; ar2 will never be inferred to be synchronous
ar3 <= asAsyncReset(sync(ar2, clock)) ; but ar3 can when used on regs in 'clock'

The addition of such a barrier brings back the ability to "force" asynchronous resets, so it is largely a stylistic choice whether to rely on a separate reset type or barriers in situations where async reset optimization is forbidden. Since FIRRTL is a low-level IR, the abstraction provided by an "always async reset" type is not as meaningful as in a human-written language.

albert-magyar commented 4 years ago

@jackkoenig @seldridge @azidar some more random reset thoughts. Sorry for the TLDR.

Maybe @aswaterman could chime in and let me know if I'm crazy and that it's not legal in all cases to apply a synchronous reset to a circuit that is emitted in a manner matching how we currently handle asynchronously-asserted resets.

seldridge commented 4 years ago

This is an interesting way of thinking about it... From a dependency API perspective, this morphs the meaning of invalidation from "I undo the work of transform foo" to "I enable foo to do more work". I don't think there's any incompatibility here, just that it's something different.

There's also an argument that it already works this way. E.g., transforms which "invalidate" constant propagation aren't really invalidating anything---they just enable new work for constant propagation to pick up.

In theory, things like constant propagation could be structured this way. I.e., instead of there being separate constant propagation transforms that operate at different times, they could all be merged into one transform that will pick up more optimizations if more things are available. This may make them harder to write.

As an aside: recursive refinement might be a tempting pattern, but this was never included in the dependency API (a transform which invalidates itself cannot be scheduled). This also violates idempotency which is baked into assumptions about how the dependency API wants users to reason about transforms (requiring a transform implies that running it once satisfies a prerequisite). However, such recursive refinement could always be allowed for some internal structure if it's an elegant way to express the internals of a transform.

mwachs5 commented 4 years ago

I am not sure I understand the implications because bluntly I don't understand why this is not working at the moment (I am unable to determine why the resets in my design are being inferred to UInt<1>... they get that after the current InferResets pass which has no way to debug what it's doing.

But I would say, with this proposal there seems to be NO WAY for a design to say that it must be synchronously reset. If you just allow a UInt<1> to be connected to AsyncReset (and does that mean that it gets turned into an AsyncReset after that assignment, so the flops beneath are async flops??), then how can a module ever ensure that its flops ARE sync reset, when that was the designer's intent? Or am I misunderstanding the meaning of this example:

input sr: UInt<1>
output ar: AsyncReset
ar <= sr

What will flops driven by ar be generated as?

How about the opposite:

input ar: AsyncReset
output sr: UInt<1>
sr <= ar

What will flops driven by sr be generated as?

How will this proposal affect the above code?

albert-magyar commented 4 years ago

Under any flavor of this proposal, the way for a module to guarantee that it is always fully synchronously reset is for it to declare the input it uses as a reset to have type UInt<1>. This is the same as the current implementation.

Here, r will always be generated as a synchronously reset register, and there is no way any optimization may violate that.

  input clock: Clock
  input reset: UInt<1>
  reg r: UInt<1>, Clock with: (reset => (reset, myinit))

For your first code example, and flops driven by sr will be synchronously reset. As to what happens to the flops driven by ar, more context is needed, since ar is an output and cannot drive any flops in this context.

Your second code example is currently illegal and will remain illegal.

I've added a lot more detail to the original post that may clarify. Basically, the fundamental difference between this proposal and the status quo is that resets are implemented as asynchronous unless there is reason to believe they may be synchronous. This frees the pass from the burden of having to exhaustively classify all resets.

Currently, InferResets works under the assumption that being driven by a UInt implies that a Reset is synchronous. It is possible to create reset-type bugs under this model, since it is possible to pass this UInt reset into modules in different clock domains. I've tried to address some of that with the discussion of the different flavors of reset conversions, but it is also possible to remove those and make it work in a manner very similar to the status quo.

albert-magyar commented 4 years ago

This is an interesting way of thinking about it... From a dependency API perspective, this morphs the meaning of invalidation from "I undo the work of transform foo" to "I enable foo to do more work". I don't think there's any incompatibility here, just that it's something different.

Yeah, that is the general assumption. I imagine that pattern will pop up reasonably often with Dedup and ConstantPropagation.