Minres / CoreDSL

Xtext project to parse CoreDSL files
Apache License 2.0
16 stars 3 forks source link

Type of constant expressions #29

Closed AtomCrafty closed 2 years ago

AtomCrafty commented 2 years ago

The way constant expressions are currently described in the specification results in some unintuitive behavior. We recently introduced this rule:

[C] literals have the unsigned type with the minimal width required to represent the value.

This, in combination with the type rules for the shift operator, results in the fact that the expression 1 << 4 is evaluated to zero, because 1 is of type unsigned<1> and shift preserves the type of the left operand.

My suggestion would be to scrap the rule mentioned above and instead introduce a more generalized rule for the evaluation of constant expressions:

Compile-time constant integer expressions are evaluated with arbitrary precision and carry the smallest type capable of representing their value.

With one caveat:

An expression consisting solely of a single Verilog-style literal carries the type prescribed by the literal prefix.

This new rule also circumvents the need for negative literals, as -4 now has type signed<3> instead of signed<4>.

One open question would be how to handle binary, octal and hexadecimal C literals.

Expression Value Type Before Type After
0 0 unsigned<0> unsigned<0>
1 1 unsigned<1> unsigned<1>
-4 -4 signed<4> signed<3> ⚠️
14 + 1 15 unsigned<5> unsigned<4> ⚠️
1 << 4 16 unsigned<1> unsigned<5> ⚠️
4'sb1011 -5 signed<4> signed<4>
6'sd42 -22 signed<6> signed<6>
4'sb0001 1 signed<4> signed<4>
4'sb0001 + 1 2 signed<5> unsigned<2> ⚠️
5'd42 - Error Error
jopperm commented 2 years ago

I think in itself, this proposal makes sense, but I fear it may just introduce a different set of special rules etc. Where would we draw the line of the constant evaluation, e.g. how would we handle 1 + 2 + x +3, 1+(2+3)+4?

I agree that 1 << 4 being zero is a terrible wart caused by the current rules. However, 4'sb0001 + 1 being narrower and no longer signed is also quite confusing. Granted, 1<<4 is probably the more common expression...

AtomCrafty commented 2 years ago

Could you come up with a scenario where 4'sb0001 + 1 evaluating to a smaller type than signed<4> would be a problem? When it is assigned to a variable or passed to a function, it would automatically be extended to the target type anyway.

The only location where the actual width of a constant would matter (and to be frank the only situation where I could imagine the verilog literals being used) is within a ConcatenationExpression (or instruction encoding, which is essentially the same). And I don't see people using complex expressions in there anyway. We could however emit a warning when a constant expession consisting of something other than a single verilog literal is used within a concatenation.

jopperm commented 2 years ago

4'sb0001 + 1 being 2'd2 is probably fine.

I think what feels wrong to me is that the semantic of individual operators depends on all operands being literals (or not). Circling back to 1 << 4, I think it's easier to consistently teach users (docs, warnings) that CoreDSL is quirky here and such an expression always shifts all bits out of the results, compared to doing "the right thing" for literals, but not for unsigned<3> x = 4; ... = 1 << x;.

I do like the general idea of not blowing up types of constants unnecessarily, but as I said earlier, I suspect it's hard to draw a line for where to stop the constant evaluation (and propagation?).

AtomCrafty commented 2 years ago

I get your point. What about if we instead calculate all expressions in the same manner, but allow assignments where the assigned value is a constant that fits into the assignment target, even if the static type of the expression is larger?

Regardless, I would still introduce a special case for the uniry minus operator if applied to an integer constant. That way we can essentially simulate signed literals with the proper type.

jopperm commented 2 years ago

I get your point. What about if we instead calculate all expressions in the same manner, but allow assignments where the assigned value is a constant that fits into the assignment target, even if the static type of the expression is larger?

Interesting, and in line with the other casting rules: A narrowing cast, but with the guarantee that no precision is lost. I don't see this as pressing for the 2.0 MVP, but as this change will relax the casting rules, we can add that ability to the language and the frontend at a later time.

Regardless, I would still introduce a special case for the uniry minus operator if applied to an integer constant. That way we can essentially simulate signed literals with the proper type.

Tempting, but it's the same reasoning as above: special handling the unary minus operator would introduce an asymmetry between it and the subtraction: -4 would be s3, whereas 0-4 would be u1 - u3 = s4.

Decision: we keep our a-bit-too-conservative-but-consistent type system for 2.0.