Add interleave instruction

Dougall and I determined the encoding by symmetry with other bit scan instructions (ffs, bitrev, and, more loosely, popcount). To confirm the behaviour, I hacked the compiler to replace imul with interleave, then wrote the following shader-runner test, which checks every pair of 16-bit inputs and passes in under a second on M1:

[require]
GL ES >= 3.1
GLSL ES >= 3.10

[compute shader]
#version 310 es

layout (local_size_x = 32, local_size_y = 1) in;
layout(binding = 0, offset = 0) uniform atomic_uint good;
layout(binding = 0, offset = 4) uniform atomic_uint bad;

uint reference(uint x, uint y) {
    uint z = 0u;
    for (uint i = 0u; i < 16u; ++i) {
        z |= ((x & (1u << i)) << i);
        z |= ((y & (1u << i)) << (i + 1u));
    }
    return z;
}

uint result(uint x, uint y) {
    /* The hacked compiler emits interleave in place of imul here. */
    return x * y;
}

void main (void)
{
    uint x = gl_GlobalInvocationID.x;
    bool allOk = true;

    for (uint y = 0u; y < 65536u; ++y) {
        if (reference(x, y) != result(x, y))
            allOk = false;
    }

    if (allOk)
        atomicCounterIncrement(good);
    else
        atomicCounterIncrement(bad);
}

[test]
atomic counters 2

compute 2048 1 1

probe atomic counter 0 == 65536
probe atomic counter 1 == 0

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>

dougallj / applegpu

Add interleave instruction #43