dfinity / motoko

Simple high-level language for writing Internet Computer canisters
Apache License 2.0
517 stars 97 forks source link

Unexpected behavior with shift operators. #793

Closed enzoh closed 4 years ago

enzoh commented 5 years ago

I would expect both of these tests to pass. The first does, but the second does not.

 public func test1() : async () {
    let n : Word8 = 0x40;
    let i : Word8 = 7;
    let expect : Word8 = 0;
    let actual : Word8 = n >> i;
    assert (expect == actual);
  };
  public func test2() : async () {
    let n : Word8 = 0x80;
    let i : Word8 = 8;
    let expect : Word8 = 0;
    let actual : Word8 = n >> i;
    assert (expect == actual);
  };
enzoh commented 5 years ago

Dido for left shift.

  public func test3() : async () {
    let n : Word8 = 0x02;
    let i : Word8 = 7;
    let expect : Word8 = 0;
    let actual : Word8 = n << i;
    assert (expect == actual);
  };
  public func test4() : async () {
    let n : Word8 = 0x01;
    let i : Word8 = 8;
    let expect : Word8 = 0;
    let actual : Word8 = n << i;
    assert (expect == actual);
  };
crusso commented 5 years ago

This is one for @ggreif (removing myself from the assignees).

nomeata commented 5 years ago

Some discussion in https://dfinity.slack.com/archives/CCK4B0XE3/p1571746156090600?thread_ts=1571720802.082800&cid=CCK4B0XE3

ggreif commented 5 years ago

This is in direct correspondence with the Wasm shift operations. The bitsize of the argument to be shifted is the modulus for the shift amount. We need to have a chat with @rossberg is you think this is a bad default (which you obviously do).

ggreif commented 5 years ago

I am on vacation right now, feel free to discuss the issue here.

nomeata commented 5 years ago

I think this is not a strong argument. Wasm doesn have i8 or i16. The fact that the argument of the i32 shift operator is also a i32 could be interpreted as “needs to have same type”, but it could also just be “happens to be the ‘normal’ number type, which is i32”.

And maybe someone in that design process thought it’s cute (and simpler for the formal spec and validation code) if all binary operations share the property that all arguments and the result type are the same.

But we are not building a low-level language, but a safe language, and hence I’d strongly argue for the safe variant here (even if it costs some performance, which we may be able to optimize away).

I guess in AS we so far also tried to “lets make all operations type-homogenous”… but I never bought this. There are plenty of operations that are naturally not homogenous.

@rossberg, can you make a judgment calls, should it be

x : WordN    n : Nat
────────────────────────────
x >> n : WordN

or

x : WordN    n : WordN
────────────────────────────
x >> n : WordN

or (maybe a compromise: no bigints needed, so potentially faster, but still protects against overflows)

x : WordN    n : Nat8
────────────────────────────
x >> n : WordN
nomeata commented 5 years ago

Some incomplete language research:

rossberg commented 5 years ago

Wasm uses an i64 shift for i64, so IIUC, our current design for other types is consistent with that. Wasm consciously doesn't have a notion of "default" int type.

For Motoko you could argue that Nat would be the natural type here. I don't feel strongly, but making shifts into ordinary binary ops seems somewhat simpler. Does it matter in practice?

nomeata commented 5 years ago

For Motoko you could argue that Nat would be the natural type here. I don't feel strongly, but making shifts into ordinary binary ops seems somewhat simpler.

Nobody argues against using binary ops, just to change the typing rule for it?

Does it matter in practice?

It confused the hell out of @enzoh it seems…

nomeata commented 5 years ago

Wait, now I am confused what the issue is. Maybe I was barking up the wrong tree. will look after lunch.

rossberg commented 5 years ago

By "ordinary binary op" I meant one with homogeneous type.

But I don't see how typing is related to the actual error report. If one of the examples doesn't pass then hat seems like a genuine bug. It's a bit worrisome that neither our test suite nor our fuzzers caught this?

nomeata commented 5 years ago

Yes, I was confused. The problem was not an implicit mod 256 due to the word size, the problem seems to be an implicit mod 8 due to “The bitsize of the argument to be shifted is the modulus for the shift amount.”, as Gabor says. Indeed independent of the type. Sorry for that confusion.

Gabor seems to have derived that directly from the Wasm spec, https://webassembly.github.io/spec/core/exec/numerics.html#op-ishl says “Let k be i₂ modulo N.” where N in this case is the bit width.

So Gabor (justifiably) generalized the Wasm behavior for 32 and 64 bit to 16 and 8, and Enzo was surprised by that behavior. So the judgment call we need to make is: Should we follow the WebAssembly precedent here, or should we hide that from the user, and don't take the shift amound modulo anything?

rossberg commented 5 years ago

I see. Of course, we should be consistent across sizes. I believe the Wasm behaviour was picked because it was the most efficient on average hardware. It would require an extra compare-and-branch around all non-constant shifts to achieve the non-modulo semantics for 32/64 bits. Is that cheap enough?

nomeata commented 5 years ago

Do we care about that kind of performance, over ~correctness~ less foot-guns?

rossberg commented 5 years ago

Not sure. Presumably, a (the?) common reason why you'd resort to bit fiddling is performance.

OCaml:

# Int32.shift_left 1l 32;;
- : int32 = 1l

Don't know how to write that in Haskell. Pretty sure it's undefined behaviour in C.

nomeata commented 5 years ago

Interesting, I didn’t expect that. I must admit ignorance.

Prelude Data.Bits Data.Word> shiftL 1 1 :: Word32
2
Prelude Data.Bits Data.Word> shiftL 1 31 :: Word32
2147483648
Prelude Data.Bits Data.Word> shiftL 1 32 :: Word32
0
Prelude Data.Bits Data.Word> shiftL 1 (2^32) :: Word32
0
Prelude Data.Bits Data.Word> shiftL 1 (2^64) :: Word32
1

I refrain from making a recommendation now. There is precedent in either way. I’d find Nat without any modulos most intuitive, but if people from other backgroud expect the mod-bitwidth behavior, then that is a good reason to with what Wasm suggests.

nomeata commented 5 years ago

Whatever we decide here might influence #421 (the second argument of **, where similar issues apply.)

ggreif commented 5 years ago

Loosely related, but I'll just leave this here for (non-)amusement: https://bugs.chromium.org/p/nativeclient/issues/detail?id=245

rossberg commented 4 years ago

Anybody opposed to closing this? Given that there is precedence either way, it seems best to stick to the native Wasm semantics.