Semantics of signed integer divide and remainder

sunfishcode commented 9 years ago

MMIX signed integer division rounds using floor rouding, and signed integer remainder uses the sign of the denominator. This makes the signed integer remainder operation a true modulus operation. And, it makes division by a power of 2 equivalent to an arithmetic right shift. Knuth calls attention to these advantages in his introduction to MMIX (page 23).

However, Fortran standardized on having signed integer division truncate a long time ago, in the days when one's complement and sign-magnitude machines were common (and in that context, truncating is more natural), and since then most hardware has kept that behavior (even as we've transitioned to two's complement everywhere). While C held out for a while, the official C99 Rationale describes C making its decision to be compatible with Fortran.

The main theoretical argument today for truncation seems to be that it's more consistent with float-to-integer conversions, which also truncate. To the extent that JavaScript has an opinion, integer division is said to truncate for this reason. However, integer is different from floating point, and having consistency on this point isn't essential.

Mathematically-minded people seem to almost universally prefer floor over truncate for integer division. Python is one of several languages which choose floor, they have a nice explanation of this decision (the comments also include some gems from Tim Peters).

Guy Steele once got so frustrated with the inconsistencies caused by truncating division that he proposed hardware designers omit or alter the right shift instruction, though in that paper he also puts forward the idea of moving away from the Fortran definition of division as a reasonable approach as well.

Both behaviors can be implemented in terms of the other, so it's a question of preferences and efficiencies. For example, here is a reasonably optimized implementation of floor division on x86:

        movl       %edi, %eax
        cltd
        idivl        %esi
        movl       %edx, %ecx
        negl        %ecx
        testl        %esi, %esi
        cmovns   %edx, %ecx
        shrl          $31, %ecx
        subl         %ecx, %eax

This is only about 20% slower than a plain cltd+idivl on an Ivy Bridge I just tested it on, because the idivl is the slowest part. For 64-bit division the corresponding sequence is only 9% slower (idivq is slower than idivl). Also note that divisions by immediate constant can always be optimized in other ways, so this overhead need only apply to divisions by non-immediates.

So, what should WebAssembly do?

floor: the mathematically preferred way, according to Knuth and numerous others
truncate: the Fortran, C/C++, and ultimately JS way, and the way that makes signed division by non-constant more efficient on today's hardware

I haven't yet formed an opinion myself; I just wanted to write this up first so that it can be considered.

jfbastien commented 9 years ago

Porque no los dos?

If enough usecases lead us to believe that some languages will want one behavior, and others will want the other, then we could have separate idiv instructions (or a mode attribute on idiv).

sunfishcode commented 9 years ago

I'm open to that.

One other downside that I didn't mention above is that if we only have floor division, C/C++/Fortran will need to emulate truncating division on top of that, leading to double emulation overhead. That does make it more desirable to have the C/C++/Fortran version.

qwertie commented 9 years ago

Hardware follows software, and vice versa: hardware people design chips for what software does or wants to do, while software people design software for what hardware supports well. If all the math and CS people agree that an operation should use rule X, but most chips use Y for historical reasons, I think the right thing to do is support both X and Y in WebAssembly. X for future-proofing and to give hardware designers a reason to support X, and Y so that code can run fast today.

I wonder, when we're talking about which operations to "support" in the MVP, are we talking about actually implementing things properly, or merely having something that runs? Like if we're talking about a CLZ or Log2Floor, are we talking about mapping it to a machine instruction (BSR), or merely having some kind of implementation available (in ES or asm.js)?

jfbastien commented 9 years ago

We aim to ship as-close-to-native speed as possible, portably. This means that even without hardware support for an exact operation we have to be reasonably sure that an efficient software implementation is possible. In some cases it's also about allowing the compiler to figure out which is the best software implementation, e.g. memset is best implemented when full μarch knowledge is available (and peepholing it is often ill-advised).

WebAssembly may one day influence hardware design, but that's a very slow process, and it's fairly complicated. A few of us have worked on hardware and/or gotten hardware features added :-)

titzer commented 9 years ago

I agree with JF; an integer division operation that does not map to existing hardware instructions is a non-starter.

On Tue, Jul 7, 2015 at 7:17 AM, JF Bastien notifications@github.com wrote:

We aim to ship as-close-to-native speed as possible, portably. This means that even without hardware support for an exact operation we have to be reasonably sure that an efficient software implementation is possible. In some cases it's also about allowing the compiler to figure out which is the best software implementation, e.g. memset is best implemented when full μarch knowledge is available (and peepholing it is often ill-advised).

WebAssembly may one day influence hardware design, but that's a very slow process, and it's fairly complicated. A few of us have worked on hardware and/or gotten hardware features added :-)

— Reply to this email directly or view it on GitHub https://github.com/WebAssembly/design/issues/250#issuecomment-119075275.

sunfishcode commented 9 years ago

The double emulation for C/C++ does make this particularly difficult to consider doing exclusively. Ah well.

sunfishcode commented 9 years ago

This is added to FutureFeatures.md in #347.

WebAssembly / design

Semantics of signed integer divide and remainder #250