DTolm / VkFFT

Vulkan/CUDA/HIP/OpenCL/Level Zero/Metal Fast Fourier Transform library
MIT License

Large errors using Intel OpenCL for CPUs #148

Open peastman opened 8 months ago

peastman commented 8 months ago

When running VkFFT under Intel's OpenCL for CPUs, I find that it gives very large errors. The results aren't totally wrong, but the accuracy is much worse than I expect.

The following C++ function illustrates the problem. It fills an array with random values, performs forward and backward 3D FFTs, and compares the output to the input. On most OpenCL implementations I've tested the agreement is excellent. For example on NVIDIA OpenCL the maximum error is 7.59959e-07. But on Intel it is hundreds of times larger: 0.000337124.

Any idea what could be causing this? Thank you for your help!

void test(cl::Device device, cl::Context context) {
    // Initialize VkFFT.

    int xsize = 25, ysize = 25, zsize = 25;
    VkFFTApplication app;
    app = {};
    VkFFTConfiguration config = {};
    config.FFTdim = 3;
    config.size[0] = zsize;
    config.size[1] = ysize;
    config.size[2] = xsize;
    config.device = &device();
    config.context = &context();
    config.inputBufferStride[0] = zsize;
    config.inputBufferStride[1] = ysize*zsize;
    config.inputBufferStride[2] = xsize*ysize*zsize;
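    VkFFTResult result = initializeVkFFT(&app, config);
    if (result != VKFFT_SUCCESS) {
        // Stop early if plan creation failed; the checks below would be meaningless.
        printf("initializeVkFFT failed with code %d\n", (int) result);
        return;
    }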
    cl::CommandQueue queue(context, device);

    // Generate the input data.

    default_random_engine generator;
    uniform_real_distribution<float> distribution(0.0, 1.0);
    vector<float> input(2*xsize*ysize*zsize);
    for (int i = 0; i < input.size(); i++)
        input[i] = distribution(generator);
    int bufferSize = input.size()*sizeof(float);
    cl::Buffer buffer(context, CL_MEM_READ_WRITE, bufferSize);
    queue.enqueueWriteBuffer(buffer, CL_TRUE, 0, bufferSize, input.data());

    // Perform the FFTs.

    VkFFTLaunchParams params = {};
    params.inputBuffer = &buffer();
    params.buffer = &buffer();
    params.commandQueue = &queue();
    result = VkFFTAppend(&app, -1, &params);
    result = VkFFTAppend(&app, 1, &params);

    // Check the result.

    vector<float> output(input.size());
    queue.enqueueReadBuffer(buffer, CL_TRUE, 0, bufferSize, output.data());
    float maxError = 0;
    float scale = 1.0/(xsize*ysize*zsize);
    for (int i = 0; i < input.size(); i++)
        maxError = max(maxError, fabs(input[i]-scale*output[i]));
    printf("%g\n", maxError);
}

DTolm commented 8 months ago

Hello,

I have not yet managed to get this runtime working on my machine, but this issue resembles the low accuracy of the sincos functions on Intel iGPUs. There is a fix keyed on the Intel vendor ID that forces all twiddle factors to be precomputed in a lookup table (LUT). My guess is that the vendor ID is different in this case, so the fix is not applied. Can you try setting config.useLUT = 1 and see if this fixes the issue?
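
As a rough illustration of the LUT idea (a minimal sketch, not VkFFT's actual code): instead of evaluating sin and cos inside the kernel, the twiddle factors exp(-2*pi*i*k/N) can be computed once on the host in double precision and rounded to float a single time. The helper name makeTwiddleLUT is hypothetical and exists only for this example.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of VkFFT): precompute the n twiddle factors
// exp(-2*pi*i*k/n) in double precision, then round each one to float once.
// A kernel reading this table does not depend on the accuracy of the
// device's own sin/cos implementation.
std::vector<std::complex<float>> makeTwiddleLUT(std::size_t n) {
    assert(n > 0);
    const double pi = std::acos(-1.0);
    std::vector<std::complex<float>> lut(n);
    for (std::size_t k = 0; k < n; ++k) {
        const double angle =
            -2.0 * pi * static_cast<double>(k) / static_cast<double>(n);
        lut[k] = std::complex<float>(static_cast<float>(std::cos(angle)),
                                     static_cast<float>(std::sin(angle)));
    }
    return lut;
}
```

For the 25-point axes in the test above, makeTwiddleLUT(25) would produce the table for one axis. How VkFFT lays out and indexes its own tables is not shown in this thread.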

Best regards, Dmitrii

peastman commented 8 months ago

That fixes it. Thank you so much!

DTolm commented 8 months ago

Great! However, I am not sure how to make this permanent, as the issue is specific to this particular runtime.

peastman commented 8 months ago

Would it make sense to enable useLUT by default whenever the platform vendor is Intel, whatever type of device it is?

DTolm commented 8 months ago

It is already on by default for vendor 0x8086 (Intel), though. Can you check the vendorID value for your device? The command is clGetDeviceInfo(device, CL_DEVICE_VENDOR_ID, sizeof(cl_uint), &vendorID, 0);

peastman commented 8 months ago

How very clever of them! :)

It's a little odd. They actually create two different platforms, each with a single device. The first one is called "Intel(R) FPGA Emulation Platform for OpenCL(TM)" and the device has vendor ID 0x1172. The second one is called "Intel(R) OpenCL" and the device has vendor ID 0x8086. The platform vendor for both of them is "Intel(R) Corporation".

DTolm commented 8 months ago

This vendor ID (0x1172) apparently belongs to Altera Corporation. I guess I can add a special check for CL_PLATFORM_VENDOR being "Intel(R) Corporation", as it is the common factor.
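
The proposed check could look roughly like the sketch below. It is an assumption about how such a rule might be written, not the code that VkFFT uses: the function name shouldForceLUT is hypothetical, and the caller is assumed to have already queried CL_DEVICE_VENDOR_ID and CL_PLATFORM_VENDOR.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper (not VkFFT's actual logic): decide whether to force
// precomputed twiddle factors. The existing rule matches the Intel device
// vendor ID; the extra rule matches the platform vendor string, which is
// shared by both platforms reported in this thread.
bool shouldForceLUT(std::uint32_t deviceVendorID,
                    const std::string& platformVendor) {
    const std::uint32_t intelVendorID = 0x8086;
    if (deviceVendorID == intelVendorID) {
        return true;
    }
    return platformVendor == "Intel(R) Corporation";
}
```

With this rule, the CPU device reporting vendor ID 0x1172 would still get the LUT path because its platform vendor is "Intel(R) Corporation". An exact string comparison is the simplest choice here; whether a looser match is needed for other Intel runtimes is not established in this thread.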

peastman commented 8 months ago

Altera is now part of Intel. They bought it some years back.