Avoiding unintended truncation

nano-o commented 3 years ago

I recently introduced a bug in the blst proof that was not easy to track down. The problem was that I misplaced a back tick, causing something that I wanted to be computed at the type level to be computed instead at the value level with an insufficient bit width. For example, below, g is 0b00 while h is 0b10. I wanted h but, because I misplaced the back tick, I got g (this is a simplified version of what really happened).

I am wondering whether Cryptol could emit a useful warning here. Maybe Cryptol could see that the expression involves only constants and, in this case, emit a warning that there is an overflow.

type t = 3

f : [2] -> [2]
f x = x

g = f (`t*2/3)
h = f `(t*2/3)

msaaltink commented 3 years ago

A mistake like this is easy to make in part because backtick is not described in much detail in the documentation. Programming in Cryptol mentions it briefly but leaves (for me) the impression that it would apply only to type variables, not type expressions, so it is not obvious that the definition of h is possible. Backtick is not in the index either, so it is a bit hard to look up the rules.

brianhuffman commented 3 years ago

It's tricky to even say whether or not a given constant expression has an "overflow", since Cryptol doesn't have separate signed and unsigned types, and there is no specified INTMIN or INTMAX. For example, on type [8], it seems like 250+250 would be considered to "overflow", but what about (-6) + (-6)? They are the same values, after all. What about 100+100 at type [8], which technically has a "negative" result?

robdockins commented 3 years ago

The backtick is indeed pretty strange. Maybe we need a section in the book that is just about the various syntactic sugar and how they are interpreted that would let us get into some more of these issues. There are some odd corner cases with the enumeration forms as well. See also #808

LeventErkok commented 3 years ago

I remember this being a constant source of confusion and indeed causing subtle bugs in Cryptol-1 days as well. I think the fundamental issue here is that the conversion syntax is just too lightweight. I think this is a point where the language can require an explicit ascription to be given. Something like:

  t2v([8], size-expression-here)

where the user explicitly says what the type of the resulting type should be. (Of course, that could be a quantified type-variable if one is available in the context.)

You can swap the order of arguments above, or come up with a different syntax; but the point being the user is required to state exactly how wide the resulting bit-vector should be. You'll still have the overflow issues of course, but at least the user explicitly states how wide a bit-vector he's intending to use.

msaaltink commented 3 years ago

A similar possibility would be to have backtick produce a value of type Integer. Then any conversion to a bitvector must be explicit.

LeventErkok commented 3 years ago

@msaaltink Good point! (Back in Cryptol-1, we didn't have Integer as a type. But for Cryptol-2, that'd be the right choice to go through with an explicit conversion by the user to the resulting type, which might very well still be Integer, or a bit-vector, or even a float if you choose to do so.)

brianhuffman commented 3 years ago

Restricting backtick to work only on type Integer wouldn't solve the problem in the original post. The backtick notation `t is sugar for number`{t}, and number has a Literal class constraint that rules out numbers that are too big for the type. For example:

Cryptol> (`t:[8]) where type t = 500

[error] at <interactive>:8:1--8:28:
  • Unsolvable constraint:
      8 >= 9
        arising from
        use of literal or demoted expression
        at <interactive>:8:2--8:4

The real issue here is the visual appearance of the syntax, and how it's a bit too easy for a human to misparse an expression with a backtick (`t*2/3 vs `(t*2/3)). As Levent says, the conversion syntax might just be too lightweight. As a prefix operator, there's no way to tell a priori the relative precedence with other operators.

One solution would be to simply remove the unary backtick notation completely, and require users to write number`{t} instead.

Another slightly less drastic solution would be to require parentheses, so `(t) instead of just `t. This makes the scope of the backtick explicit, so that humans won't misparse it.

A third solution would be to keep the existing syntax, but print a warning whenever anyone uses a naked backtick expression as the first argument to any infix operator. Users could then be encouraged to use parentheses to help disambiguate such cases; e.g. (`t)+3 or `(t)+3 would be preferred over `t+3.

msaaltink commented 3 years ago

It would not kill me to have to write number`{t} or even number`{t,[64]}, and it would be easier to look up "number" in the index, or to type :h number at the REPL, to get documentation.

robdockins commented 3 years ago

Well, you can do that right now, if you'd like to impose a "house" style to prefer explicit calls to number instead of using the magic backtick. I think it would be a pretty disruptive change to remove it, however.

brianhuffman commented 3 years ago

For a less disruptive change, I suppose we could add a :set-able option to enable some extra warnings.

GaloisInc / cryptol

Avoiding unintended truncation #1151