ARM-software / ComputeLibrary

The Compute Library is a set of computer vision and machine learning functions optimised for both Arm CPUs and GPUs using SIMD technologies.

The size limitation for CLFFT1D #1081

Closed ZhangErliang closed 6 months ago

ZhangErliang commented 6 months ago

Output of 'strings libarm_compute.so | grep arm_compute_version':
arm_compute_version=v23.11 Build options: {'Werror': '1', 'debug': '0', 'neon': '1', 'opencl': '1', 'os': 'linux', 'arch': 'armv8a', 'build': 'native'} Git hash=unknown

Platform: Mali GPU
Operating System: Ubuntu

**Problem description:** I am computing an FFT with CLFFT1D. When the data length is small (64 or less), it works well when built with g++. However, when I use TensorShape(128U, 1U, 1U), it goes wrong. I have not been able to find the cause after several days. Could you point out where the problem is? I would appreciate your help very much. The code is shown below:

#ifndef ARM_COMPUTE_CL /* Needed by Utils.cpp to handle OpenCL exceptions properly */
#error "This example needs to be built with -DARM_COMPUTE_CL"
#endif /* ARM_COMPUTE_CL */

#include "arm_compute/core/Types.h"
#include "arm_compute/runtime/CL/CLTensor.h"
#include "arm_compute/runtime/CL/CLTensorAllocator.h"
#include "arm_compute/runtime/CL/CLTuner.h"
#include "utils/Utils.h"
#include "arm_compute/runtime/CL/functions/CLFFT1D.h"

using namespace std;
using namespace arm_compute;

int N1 = 64;
int N2 = 1;
int N3 = 1;

int main()
{
    CLTuner tuner{};
    CLScheduler::get().default_init(&tuner);

    CLTensor  src;
    CLFFT1D   MyFFT;
    FFT1DInfo defaultFFTSetting;
    CLTensor  out;
    src.allocator()->init(TensorInfo(TensorShape(N1, N2, N3), 2, DataType::F32, DataLayout::NHWC));
    out.allocator()->init(TensorInfo(TensorShape(N1, N2, N3), 2, DataType::F32, DataLayout::NHWC));

    defaultFFTSetting.axis      = 0;
    defaultFFTSetting.direction = FFTDirection::Forward;

    std::cout << "xx: " << src.info()->tensor_shape()[defaultFFTSetting.axis] << endl;

    MyFFT.configure(&src, &out, defaultFFTSetting);
    src.allocator()->allocate();
    out.allocator()->allocate();

    src.map();
    Window src_window;
    src_window.use_tensor_dimensions(src.info()->tensor_shape());
    Iterator it_src(&src, src_window);
    execute_window_loop(
        src_window, [&](const Coordinates &id)
        {
            (*reinterpret_cast<float *>(it_src.ptr())) = static_cast<float>(1); 
        },
        it_src);
    std::cout << std::endl;

    src.unmap();
    MyFFT.run();
    CLScheduler::get().sync();

    out.map();
    Window out_window;
    out_window.use_tensor_dimensions(out.info()->tensor_shape());
    Iterator it_fft(&out, out_window);
    execute_window_loop(
        out_window, [&](const Coordinates &id)
        { std::cout << (id.y() * N1 + id.x()) << ": " 
                    << (*reinterpret_cast<float *>(it_fft.ptr())) << " + "
                    << (*reinterpret_cast<float *>(it_fft.ptr() + sizeof(float))) << "i" << endl; },
        it_fft);
    std::cout << std::endl;

    return 0;
}

ZhangErliang commented 6 months ago

@morgolock Could you please reply and point out where the problem is? Many thanks!

morgolock commented 6 months ago

Hi @ZhangErliang

I don't see anything wrong with the code you shared. I compiled and ran it without problems.

I also changed our validation tests for CLFFT1D to use the shape TensorShape(512U, 1U, 1U), and there were no failures. You can try different shapes by editing https://github.com/ARM-software/ComputeLibrary/blob/main/tests/validation/CL/FFT.cpp#L46

What is the output you were expecting?
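
As a reference point for that question, the expected spectrum for the input in the example (real part 1 in every element) can be checked without the library. The sketch below is a naive O(N^2) forward DFT in standard C++. It is not how CLFFT1D computes the transform, and `reference_dft` is a hypothetical helper name. It assumes the imaginary parts of the source tensor are zero, which the posted code does not set explicitly.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Naive O(N^2) forward DFT: X[k] = sum_t x[t] * exp(-2*pi*i*k*t/N).
// Intended only as a slow reference for small N, not as a replacement for CLFFT1D.
std::vector<std::complex<double>> reference_dft(const std::vector<std::complex<double>> &in)
{
    const std::size_t n  = in.size();
    const double      pi = std::acos(-1.0);

    std::vector<std::complex<double>> out(n);
    for (std::size_t k = 0; k < n; ++k)
    {
        std::complex<double> sum(0.0, 0.0);
        for (std::size_t t = 0; t < n; ++t)
        {
            // Twiddle factor for output bin k and input sample t.
            const double angle = -2.0 * pi * static_cast<double>(k * t) / static_cast<double>(n);
            sum += in[t] * std::polar(1.0, angle);
        }
        out[k] = sum;
    }
    return out;
}
```

Calling `reference_dft` on 128 elements equal to `1 + 0i` gives 128 in bin 0 and values that are zero up to rounding error in every other bin, so a correct forward FFT of the example input should print roughly `0: 128 + 0i` followed by near-zero entries.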

ZhangErliang commented 6 months ago

Many thanks @morgolock! The cause may be the tuner set up at the beginning of the source code. When I removed the tuner setting, it worked correctly.