ARM-software / ComputeLibrary

The Compute Library is a set of computer vision and machine learning functions optimised for both Arm CPUs and GPUs using SIMD technologies.
MIT License

Pure image processing operations #1068

Closed: ismailkocdemir closed this issue 8 months ago

ismailkocdemir commented 10 months ago

Hello,

I need NEON-accelerated simple image processing operations such as resize, Gaussian/box blur, and basic morphology (dilation, erosion, etc.). I managed to use NEScale for resizing (it would have been nicer if bicubic interpolation were supported).

As far as I can see, my only option for blurring is NEGEMMConv2d (combined with logical and element-wise ops for morphology). NEGEMMConv2d takes an extra bias parameter; I might be able to work around that, but at the cost of an extra bias addition. Is this the right path, or is there already support for such operations? (I've seen older issues mentioning Gaussian blur support, but I could not find them in the latest release, 23.08.)
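To make the "blur as a convolution" idea concrete, here is a minimal scalar sketch in plain C++ (no Compute Library calls). It treats a 3x3 box blur as a convolution whose nine weights are all 1/9 and whose bias is zero. The function name `box_blur_3x3` and the choice to copy border pixels through unchanged are assumptions made for this illustration only; it is a reference for checking results, not a NEON implementation.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical reference helper (not part of any library):
// 3x3 box blur on a tightly packed U8 image, i.e. a convolution with
// nine equal weights of 1/9 and no bias term.
// Border pixels are copied from the source unchanged (sketch choice).
std::vector<uint8_t> box_blur_3x3(const std::vector<uint8_t>& src, int width, int height)
{
    std::vector<uint8_t> dst(src);
    for (int y = 1; y + 1 < height; ++y)
    {
        for (int x = 1; x + 1 < width; ++x)
        {
            int sum = 0;
            for (int dy = -1; dy <= 1; ++dy)
            {
                for (int dx = -1; dx <= 1; ++dx)
                {
                    sum += src[(y + dy) * width + (x + dx)];
                }
            }
            // Rounded integer mean of the nine neighbours.
            dst[y * width + x] = static_cast<uint8_t>((sum + 4) / 9);
        }
    }
    return dst;
}
```

A Gaussian blur follows the same pattern with non-uniform weights, so in either case a convolution function would only need the weights; the bias would be all zeros.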

Thanks in advance

ismailkocdemir commented 10 months ago

I have a follow-up question about dilation in an older version (21.02), if that's ok @morgolock

I have the following method, but it gives the wrong output on a simple case (the input is 8x8, all zeros except for a single pixel in the center; I'm expecting a dilated 3x3 region around it):


namespace armc = arm_compute;

void dilate(
    uint8_t* input_image,
    uint8_t* output_image,
    int height,
    int width
)
{
    armc::Image inp;
    armc::Image out;

    bool import = true;
    if(import)
    {
        // Describe the external buffers as tightly packed U8 images.
        inp.allocator()->info().init(
            armc::TensorShape(width, height),
            armc::Format::U8,
            armc::Strides(1, width),
            0,
            width * height
        );
        out.allocator()->info().init(
            armc::TensorShape(width, height),
            armc::Format::U8,
            armc::Strides(1, width),
            0,
            width * height
        );
        inp.allocator()->import_memory(input_image);
        out.allocator()->import_memory(output_image);
    }
    else
    {
        auto inf = armc::TensorInfo(width, height, armc::Format::U8);
        inp.allocator()->init(inf);
        out.allocator()->init(inf);
    }

    armc::NEDilate mop{};

    mop.configure(&inp, &out, armc::BorderMode::UNDEFINED);
    if(!import)
    {
        // Allocate after configure, then copy the caller's data in.
        inp.allocator()->allocate();
        out.allocator()->allocate();
        fill_arm_image(inp, input_image, width, height);
    }

    mop.run();
    if(!import)
        copy_from_arm_image(out, output_image, width, height);
}

This results in all zeros in the output array (no change, it stays as initialized) when I import from an external buffer. What I expect it to do is use no padding on the input and shrink the execution window (following the guide here). Is there anything I'm missing?

It works fine if I allocate the memory and fill it with my data after the configuration step.
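For reference, the output I expect can be written as a plain scalar C++ sketch with no Compute Library calls. The helper name `dilate_3x3_reference` is hypothetical, "center" is taken to mean pixel (4, 4) of the 8x8 image, and leaving the one-pixel border at zero is a choice made for this sketch to mimic an undefined border; none of this is taken from the library's implementation.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical reference helper: 3x3 dilation (maximum over the 3x3
// neighbourhood) on a tightly packed U8 image. Only interior pixels are
// computed; the one-pixel border of the output is left at zero.
std::vector<uint8_t> dilate_3x3_reference(const std::vector<uint8_t>& src, int width, int height)
{
    std::vector<uint8_t> dst(src.size(), 0);
    for (int y = 1; y + 1 < height; ++y)
    {
        for (int x = 1; x + 1 < width; ++x)
        {
            uint8_t max_value = 0;
            for (int dy = -1; dy <= 1; ++dy)
            {
                for (int dx = -1; dx <= 1; ++dx)
                {
                    max_value = std::max(max_value, src[(y + dy) * width + (x + dx)]);
                }
            }
            dst[y * width + x] = max_value;
        }
    }
    return dst;
}
```

With a single non-zero pixel at (4, 4), this sets exactly the nine pixels in rows 3 to 5 and columns 3 to 5, which is the 3x3 region I expect from NEDilate.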

Thanks again!

morgolock commented 9 months ago

Hi @ismailkocdemir

We removed the CV functions from ACL in release v21.05; the focus of the library is now machine learning. We do not maintain or support old versions of the library. The best alternative is to use a different library such as OpenCV, which you can cross-compile for aarch64 and which has good performance.

Regarding the example you shared that calls import_memory(), I think there may be a problem in the way you initialize the tensor info. There should be no need for you to specify the strides; you can see how it's done in https://github.com/ARM-software/ComputeLibrary/blob/main/tests/validation/NEON/UNIT/TensorAllocator.cpp#L59

TensorInfo info(TensorShape(24U, 16U, 3U), 1, DataType::F32);

// Allocate memory buffer
const size_t total_size = info.total_size();
auto data = std::make_unique<uint8_t[]>(total_size);

// Negative case : Import nullptr
Tensor t1;
t1.allocator()->init(info);
ARM_COMPUTE_ASSERT(!bool(t1.allocator()->import_memory(nullptr)));
ARM_COMPUTE_ASSERT(t1.info()->is_resizable());

If you don't import the memory and allocate it instead, do you get the correct results?
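As a general illustration of what the strides and first-element offset in a tensor description mean, here is a small plain C++ sketch that does not use the library. The struct `Layout2D` and the helpers `packed_layout`, `padded_layout` and `byte_offset` are hypothetical names invented for this example. It only shows that a tightly packed buffer and a padded buffer place the same pixel at different byte offsets, so a description that does not match the buffer it points at reads the wrong bytes; whether that is what happens in your case is an assumption, not something I have verified.

```cpp
#include <cstddef>

// Hypothetical description of a 2D image layout in bytes.
struct Layout2D
{
    std::size_t offset_first_element; // byte offset of pixel (0, 0)
    std::size_t stride_x;             // bytes between horizontally adjacent pixels
    std::size_t stride_y;             // bytes between vertically adjacent pixels
};

// Tightly packed rows: no padding anywhere.
inline Layout2D packed_layout(std::size_t width, std::size_t element_size)
{
    return Layout2D{0, element_size, width * element_size};
}

// Rows padded on the left and right, with extra padding rows on top.
inline Layout2D padded_layout(std::size_t width, std::size_t element_size,
                              std::size_t pad_left, std::size_t pad_right, std::size_t pad_top)
{
    const std::size_t stride_y = (pad_left + width + pad_right) * element_size;
    const std::size_t offset   = pad_top * stride_y + pad_left * element_size;
    return Layout2D{offset, element_size, stride_y};
}

// Byte offset of pixel (x, y) under a given layout.
inline std::size_t byte_offset(const Layout2D& layout, std::size_t x, std::size_t y)
{
    return layout.offset_first_element + x * layout.stride_x + y * layout.stride_y;
}
```

For an 8x8 U8 image, pixel (4, 4) sits at byte 36 in the packed layout but at byte 55 when each side has one element of padding, so the two descriptions are not interchangeable for the same buffer.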

Hope this helps,