Open smilingthax opened 4 years ago
What would a software emulation of this instruction look like for CPUs that don't have native support?
A partial implementation in JS:
function clmul32(a, b) { // 32 x 32 -> 32 (does not compute high part, JS bitops are only 32 bit!)
let ret = 0;
for (let i = 0; i < 32; i++) {
if (a << i) {
ret ^= (b & (1 << i));
}
}
return ret;
}
For implementing efficient algorithms, efficient bit-level operations are often important. While there are many bit-twiddling hacks for certain operations, others are not efficiently reducible to more common operations, while being easily implemented in hardware. WebAssembly already has some instructions of that kind, e.g. POPCNT.
One such operation, which has many uses as a fundamental building block, is the carry-less multiplication (CLMUL), which became part of many CPU instruction sets for that very reason. Its uses range from efficient, table-less CRC-computations, universal hash functions to encryption routines (AES!), i.e. when calculations in finite fields (e.g. GF(2^n)) are involved.
I propose to add this as a i32 and i64 variant. A consideration for multiplication also relevant for CLMUL is that the result of (e.g) i32 x i32 will use up to 64 bits (i.e. i64, or two i32).
There might be other interesting bit-ops that could be included in a proposal, e.g. PEXT / PDEP (parallel bits extract / deposit [aka. bit compress/expand, bit scatter/gather, bit pack/unpack]) seems to be very potent (used e.g. in chess engines: Bitboards), but at least AMD CPUs seem to execute them very slow compared to Intel, and software emulations are also expensive, thus my current focus is on the (uncontroversial) CLMUL only.