The size limitation for CLFFT1D

ZhangErliang commented 11 months ago

Output of 'strings libarm_compute.so | grep arm_compute_version': arm_compute_version=v23.11 Build options: {'Werror': '1', 'debug': '0', 'neon': '1', 'opencl': '1', 'os': 'linux', 'arch': 'armv8a', 'build': 'native'} Git hash=unknown

Platform: Mali GPU
Operating System: Ubuntu

Problem description: I am computing an FFT with CLFFT1D. When the data length is small (64 or less), the program built with g++ works well. However, when I use TensorShape(128U, 1U, 1U), it no longer works correctly. I have not been able to find the cause after several days. Could you point out where the problem is? I would appreciate your help very much. The code is shown below:

#ifndef ARM_COMPUTE_CL /* Needed by Utils.cpp to handle OpenCL exceptions properly */
#error "This example needs to be built with -DARM_COMPUTE_CL"
#endif /* ARM_COMPUTE_CL */

#include "arm_compute/core/Types.h"
#include "arm_compute/runtime/CL/CLScheduler.h"
#include "arm_compute/runtime/CL/CLTensor.h"
#include "arm_compute/runtime/CL/CLTensorAllocator.h"
#include "arm_compute/runtime/CL/CLTuner.h"
#include "arm_compute/runtime/CL/functions/CLFFT1D.h"
#include "utils/Utils.h"

#include <iostream>

using namespace std;
using namespace arm_compute;

int N1 = 64; // FFT length; 64 or smaller works, 128 shows the problem
int N2 = 1;
int N3 = 1;

int main()
{
    CLTuner tuner{};
    CLScheduler::get().default_init(&tuner);

    CLTensor  src;
    CLFFT1D   MyFFT;
    FFT1DInfo defaultFFTSetting;
    CLTensor  out;
    src.allocator()->init(TensorInfo(TensorShape(N1, N2, N3), 2, DataType::F32, DataLayout::NHWC));
    out.allocator()->init(TensorInfo(TensorShape(N1, N2, N3), 2, DataType::F32, DataLayout::NHWC));

    defaultFFTSetting.axis      = 0;
    defaultFFTSetting.direction = FFTDirection::Forward;

    std::cout << "xx: " << src.info()->tensor_shape()[defaultFFTSetting.axis] << endl;

    MyFFT.configure(&src, &out, defaultFFTSetting);
    src.allocator()->allocate();
    out.allocator()->allocate();

    src.map();
    Window src_window;
    src_window.use_tensor_dimensions(src.info()->tensor_shape());
    Iterator it_src(&src, src_window);
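    execute_window_loop(
        src_window, [&](const Coordinates &)
        {
            // Each element holds two floats (real, imaginary); initialise both.
            auto *value = reinterpret_cast<float *>(it_src.ptr());
            value[0]    = 1.f; // real part
            value[1]    = 0.f; // imaginary part
        },
        it_src);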
    std::cout << std::endl;

    src.unmap();
    MyFFT.run();
    CLScheduler::get().sync();

    out.map();
    Window out_window;
    out_window.use_tensor_dimensions(out.info()->tensor_shape());
    Iterator it_fft(&out, out_window);
    execute_window_loop(
        out_window, [&](const Coordinates &id)
        { std::cout << (id.y() * N1 + id.x()) << ": " 
                    << (*reinterpret_cast<float *>(it_fft.ptr())) << " + "
                    << (*reinterpret_cast<float *>(it_fft.ptr() + sizeof(float))) << "i" << endl; },
        it_fft);
    std::cout << std::endl;

    // Release the mapping once the results have been read.
    out.unmap();

    return 0;
}

ZhangErliang commented 11 months ago

@morgolock Could you please reply when you have a moment and point out where the problem is? Many thanks!

morgolock commented 11 months ago

Hi @ZhangErliang

I don't see anything wrong with the code you shared; I compiled and ran it without problems.

I also made changes to our validation tests for CLFFT1D using a shape TensorShape(512U, 1U, 1U) and there were no failures. You can try different shapes by making changes in https://github.com/ARM-software/ComputeLibrary/blob/main/tests/validation/CL/FFT.cpp#L46
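Before changing the validation test, it can help to rule out the length itself. The sketch below factors a length into FFT stages. It is a standalone illustration, not Compute Library code: `decompose_length` is a hypothetical helper, and the radix set {2, 3, 4, 5, 7, 8} is an assumption about what the CL FFT kernels support, so check it against your library version.

```cpp
#include <set>
#include <vector>

// Hypothetical helper (not part of Compute Library): greedily factor n,
// always taking the largest radix in the set that divides the remainder.
// Returns an empty vector when n cannot be fully factored (and for n <= 1).
std::vector<unsigned int> decompose_length(unsigned int n, const std::set<unsigned int> &radices)
{
    std::vector<unsigned int> stages;
    while (n > 1)
    {
        bool found = false;
        for (auto it = radices.rbegin(); it != radices.rend(); ++it)
        {
            if (*it > 1 && n % *it == 0)
            {
                stages.push_back(*it);
                n /= *it;
                found = true;
                break;
            }
        }
        if (!found)
        {
            return {}; // a prime factor outside the radix set remains
        }
    }
    return stages;
}

// Assumed radix set; verify against the library version you use.
const std::set<unsigned int> assumed_radices{2U, 3U, 4U, 5U, 7U, 8U};
```

Under that assumption, `decompose_length(128, assumed_radices)` yields the stages 8, 8, 2 and a length of 512 yields 8, 8, 8, while a length such as 11 yields no stages. This would suggest that 128 is not a problematic size in itself.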

What is the output you were expecting?
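One way to answer this is to compute a reference result on the CPU. The sketch below is a naive O(N^2) forward DFT using only the standard library; `reference_dft` is a hypothetical helper name, and it assumes the unnormalised forward convention X[k] = sum over n of x[n] * exp(-2*pi*i*k*n/N), which may need adjusting if the library scales its output differently.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Naive forward DFT, intended only as a CPU reference for small lengths.
// Assumes no normalisation on the forward transform.
std::vector<std::complex<double>> reference_dft(const std::vector<std::complex<double>> &x)
{
    const std::size_t N  = x.size();
    const double      pi = std::acos(-1.0);

    std::vector<std::complex<double>> X(N);
    for (std::size_t k = 0; k < N; ++k)
    {
        std::complex<double> sum(0.0, 0.0);
        for (std::size_t n = 0; n < N; ++n)
        {
            // Reduce k*n modulo N to keep the angle small and accurate.
            const double angle = -2.0 * pi * static_cast<double>((k * n) % N) / static_cast<double>(N);
            sum += x[n] * std::polar(1.0, angle);
        }
        X[k] = sum;
    }
    return X;
}
```

For the input used in the posted code (every real part set to 1, imaginary parts zero, length 128), this reference gives 128 + 0i in bin 0 and values that are zero up to rounding error in every other bin. Comparing the GPU output against those values shows directly whether a given run is wrong.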

ZhangErliang commented 11 months ago

Many thanks @morgolock! The cause may be the tuner that is set up at the beginning of the source code (the CLTuner object passed to default_init). When I removed the tuner setup, it worked correctly.

The size limitation for CLFFT1D #1081