WebAssembly / design

WebAssembly Design Documents
http://webassembly.org
Apache License 2.0
11.41k stars 694 forks source link

New insn: Carry-less multiply (CLMUL) #1388

Open smilingthax opened 4 years ago

smilingthax commented 4 years ago

For implementing efficient algorithms, efficient bit-level operations are often important. While there are many bit-twiddling hacks for certain operations, others are not efficiently reducible to more common operations, while being easily implemented in hardware. WebAssembly already has some instructions of that kind, e.g. POPCNT.

One such operation, which has many uses as a fundamental building block, is the carry-less multiplication (CLMUL), which became part of many CPU instruction sets for that very reason. Its uses range from efficient, table-less CRC-computations, universal hash functions to encryption routines (AES!), i.e. when calculations in finite fields (e.g. GF(2^n)) are involved.

I propose to add this as a i32 and i64 variant. A consideration for multiplication also relevant for CLMUL is that the result of (e.g) i32 x i32 will use up to 64 bits (i.e. i64, or two i32).

There might be other interesting bit-ops that could be included in a proposal, e.g. PEXT / PDEP (parallel bits extract / deposit [aka. bit compress/expand, bit scatter/gather, bit pack/unpack]) seems to be very potent (used e.g. in chess engines: Bitboards), but at least AMD CPUs seem to execute them very slow compared to Intel, and software emulations are also expensive, thus my current focus is on the (uncontroversial) CLMUL only.

comex commented 4 years ago

What would a software emulation of this instruction look like for CPUs that don't have native support?

smilingthax commented 4 years ago

A partial implementation in JS:

function clmul32(a, b) {  // 32 x 32 -> 32  (does not compute high part, JS bitops are only 32 bit!)
  let ret = 0;
  for (let i = 0; i < 32; i++) {
    if (a << i) {
      ret ^= (b & (1 << i));
    }
  }
  return ret;
}