Wrong vector addition result inside a CUDA kernel

With the current master (commit a2844eed), adding two glm::vec4 values inside a CUDA kernel produces the wrong result:

#include <glm/vec4.hpp>
#include <stdio.h>

__global__ void foo()
{
    glm::vec4 a{1.0f, 1.0f, 1.0f, 1.0f};
    glm::vec4 b{2.0f, 2.0f, 2.0f, 2.0f};
    glm::vec4 c = a + b;

    // Print a, b, the hand-written per-component sum, and c = a + b (glm's operator+).
    printf("%f %f %f %f\n%f %f %f %f\n%f %f %f %f\n%f %f %f %f\n\n", //
           a.x, a.y, a.z, a.w,                                       //
           b.x, b.y, b.z, b.w,                                       //
           a.x + b.x, a.y + b.y, a.z + b.z, a.w + b.w,               //
           c.x, c.y, c.z, c.w);
}

int main()
{
    foo<<<1, 1>>>();

    cudaDeviceSynchronize();

    return 0;
}

On master, the output is:

1.000000 1.000000 1.000000 1.000000
2.000000 2.000000 2.000000 2.000000
3.000000 3.000000 3.000000 3.000000
1.000000 1.000000 1.000000 0.000000

With the 1.0.1 tag, the code works as expected; the output is:

1.000000 1.000000 1.000000 1.000000
2.000000 2.000000 2.000000 2.000000
3.000000 3.000000 3.000000 3.000000
3.000000 3.000000 3.000000 3.000000
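When comparing builds against the 1.0.1 tag and a2844eed, a self-checking variant of the reproducer can be more convenient than reading printf output. This is a minimal sketch, not part of the original report: the kernel name add_vec4 and the exit codes (0 = match, 1 = mismatch, 2 = CUDA error) are hypothetical choices. The kernel writes the hand-written per-component sum and the result of a + b to device memory, and the host copies both back and compares them.

```cuda
#include <glm/vec4.hpp>
#include <cuda_runtime.h>
#include <cstdio>

// Hypothetical self-checking variant of the reproducer.
// out[0] = hand-written per-component sum, out[1] = result of glm's operator+.
__global__ void add_vec4(glm::vec4* out)
{
    glm::vec4 a{1.0f, 1.0f, 1.0f, 1.0f};
    glm::vec4 b{2.0f, 2.0f, 2.0f, 2.0f};
    out[0] = glm::vec4{a.x + b.x, a.y + b.y, a.z + b.z, a.w + b.w};
    out[1] = a + b; // the operation under test
}

int main()
{
    glm::vec4* d_out = nullptr;
    if (cudaMalloc(&d_out, 2 * sizeof(glm::vec4)) != cudaSuccess) {
        std::fprintf(stderr, "cudaMalloc failed\n");
        return 2;
    }

    add_vec4<<<1, 1>>>(d_out);
    cudaError_t err = cudaGetLastError(); // catches launch errors

    glm::vec4 h_out[2];
    if (err == cudaSuccess) {
        // cudaMemcpy waits for the kernel, so execution errors surface here too.
        err = cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    }
    cudaFree(d_out);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 2;
    }

    const glm::vec4& expected = h_out[0];
    const glm::vec4& actual = h_out[1];
    std::printf("expected: %f %f %f %f\n", expected.x, expected.y, expected.z, expected.w);
    std::printf("a + b:    %f %f %f %f\n", actual.x, actual.y, actual.z, actual.w);

    // 1.0f + 2.0f is exactly representable, so exact comparison is safe here.
    const bool match = expected.x == actual.x && expected.y == actual.y &&
                       expected.z == actual.z && expected.w == actual.w;
    return match ? 0 : 1;
}
```

Built with nvcc in the same way as the original reproducer (the glm include path depends on the local setup), and going by the outputs reported above, one would expect exit status 1 on master and 0 on the 1.0.1 tag.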


Environment:
OS: Ubuntu 20.04
nvcc --version: Cuda compilation tools, release 11.4, V11.4.315
nvidia-smi: Driver Version: 535.171.04, CUDA Version: 12.2

The same issue reproduces on my colleagues' machines with different OSes and drivers.

g-truc/glm, issue #1288