chipsalliance / firrtl-spec

The specification for the FIRRTL language
45 stars 28 forks source link

Cleanup String-encoded Literals #89

Closed seldridge closed 1 year ago

seldridge commented 1 year ago

The FIRRTL spec currently allows for string-encoded integer literals, e.g., "hDEADBEEF", to be used in almost any location where an integer is accepted. Because this has been how the SFC has worked since its inception, this is going to be enshrined in the spec here: https://github.com/chipsalliance/firrtl-spec/pull/86

The current rule is (summarizing):

A string-encoded integer literal can be used any place an integer literal can be except when defining memory depths and memory read/write latencies. Additionally, string-encoded integer literals are treated as strings in external module and intrinsic module parameters (as there is no way to differentiate them from normal strings).

This allows for circuits like the following to be accepted by the SFC and MFC:

circuit Foo:
  module Foo:
    input clock: Clock
    input a: UInt<"hF">["o10"]
    output b: UInt<"b1111">[8]

    b <= a

However, this was never the intent of the original spec. (This actually stems from an incorrect implementation of the spec in the original FIRRTL grammar.) Additionally, there is concern that having multiple ways to represent numbers is not useful for a language like FIRRTL. There are several options that we can take here:

  1. Leave everything as it.
  2. Expand the usage of string-encoded integer literals to be used by memories. This would bring the spec in alignment with itself while leaving external module parameters as a corner case. This isn't terrible, but it continues to allow for interesting circuits like the one above.
  3. Restrict string-encoded integer literals to only being used in the construction of constant integer expressions, e.g., UInt("hBADF00D"). This would then disallow their usage in widths, sub-indices, etc. This brings the spec inline with its original intent.
  4. Remove all string-encoded integer literals. This would force UInt("hBADF00D") to be represented as UInt(195948557). This is less user-friendly, but is fine for a compiler. This also creates better round-trip behavior for compilers. I.e., it would be good if FIRRTL compilers could have one internal representation for constant integer expressions. However, if a compiler does this and it ingests "hBADF00D", can it then emit 195948557.

I'm of the opinion that we should just take option (4) and be done with it. I think the overhead of having more than one number representation isn't particularly useful. As a slight weakening of this, I would be okay if the numbers for constant expressions were only allowed to be hex. I do think it is a great thing if a compiler doesn't have to track additional state in order to round trip IR.

Other related options to consider:


A very random note on converting numbers. The command line RPN calculator dc is fantastic at doing number conversions. For the above conversion, you can do echo "16i BADF00Dp" | dc to do the conversion.

seldridge commented 1 year ago

The current SFC behavior for round-tripping is that constant integer expressions (UInt and SInt literals) are always emitted as string-encoded hex. Everything else is emitted as decimal. The MFC behavior is to emit everything as decimal.

mwachs5 commented 1 year ago

The current encoding of a string could be changed to something that isn't a string. E.g., UInt("hBADF00D") could be either UInt(hBADF00D) or UInt(0hBADF00D).

I like this one the most, especially with the leading 0.

seldridge commented 1 year ago

Fixed via #91.