ARM-software / ComputeLibrary

The Compute Library is a set of computer vision and machine learning functions optimised for both Arm CPUs and GPUs using SIMD technologies.
MIT License
2.76k stars 767 forks source link

NESlice doesn't match specification when end coordinate(s) is negative #1007

Closed dgurulev closed 4 months ago

dgurulev commented 1 year ago

Output of 'strings libarm_compute.so | grep arm_compute_version': arm_compute_version=v22.08 Build options: {'Werror': '1', 'debug': '1', 'neon': '1', 'opencl': '0', 'os': 'linux', 'arch': 'armv8a'} Git hash=b'aabef6c0584f06f4c0f4b61fb787d80374240619'

Platform/Operating System: qemu-aarch64 version 6.2.0 (Debian 1:6.2+dfsg-2ubuntu6.5) on Ubuntu 22.04 LTS

Problem description:

Accordingly to specification of NESlice: End coordinates can be negative, which represents the number of elements before the end of that dimension. This behavior is aligned with both popular AI frameworks like _TensorFlow_, PyTorch and NumPy. For example,

>>> import tensorflow as tf
>>> t = tf.constant([1, 2, 3, 4, 5, 6])
>>> print(t[1:-1])
tf.Tensor([2 3 4 5], shape=(4,), dtype=int32)

NESlice.patch.txt

>>> import torch.nn as nn
>>> t = torch.tensor([1, 2, 3, 4, 5, 6])
>>> print(t[1:-1])
tensor([2, 3, 4, 5])
>>> import numpy as np
>>> t =  np.array([1, 2, 3, 4, 5, 6])
>>> print(t[1:-1])
[2 3 4 5]

So, for given tensor input: { 6 } starts: { 1 } ends: { -1 } we expect output: { 4 }. Actually we got validation failure since NESlice::validate expects output: { 5 }. This is because negative value at ends coordinate(s) is interpreted as a non-zero bit(s) at _endmask passed to NEStridedSliceKernel what, in turn, is interpreted as "ends[i] is ignored and the fullest possible range in that dimension is used".

Details: NESlice working though NEStridedSliceKernel and calculates _exp_outputshape in this way NESlice::validate => NEStridedSliceKernel::validate => validate_arguments => compute_strided_slice_shape => compute_strided_slice_output_shape => calculate_end_on_index =>

// Reset in case of begin mask present
if(arm_compute::helpers::bit_ops::is_bit_set(end_mask, index) && !shrink_axis)
{
    stop = (stride > 0) ? std::numeric_limits<int>::max() : std::numeric_limits<int>::lowest();
}

<skip>

// Final clamp
stop = (stride > 0) ? utility::clamp(stop, 0, dim_size) : utility::clamp(stop, -1, dim_size - 1);

So, stop is calculated as input's full range and this matches specification of NEStridedSliceKernel. The thing is the way how NESlice generates _endmask by construct_slice_end_mask function to be passed to NEStridedSliceKernel:

int32_t construct_slice_end_mask(Coordinates ends)
{
    // Create end mask
    int32_t end_mask = 0;
    for(unsigned int i = 0; i < ends.num_dimensions(); ++i)
    {
        if(ends[i] < 0)
        {
            end_mask |= 1 << i;
        }
    }

    return end_mask;
}

Finally, we either need to fix NESlice behavior or make a correction in the documentation to describe actual one. A fix may be quite quick and simple - just get rid off end_mask calculation and pass zero as _endmask to NEStridedSliceKernel:: validate/configure, see attached patch.

Reproducer (slice.cpp, no test case w/ negative value at ends found at tests/validation/NEON/Slice.cpp):

#include "arm_compute/core/Types.h"
#include "arm_compute/runtime/NEON/functions/NESlice.h"
#include "arm_compute/runtime/Tensor.h"

using namespace arm_compute;

int main()
{
        auto test = [](TensorInfo input, TensorInfo output, Coordinates starts, Coordinates ends)
        { return NESlice::validate(&input, &output, starts, ends); };

        //Passed
        assert(bool(test(
                TensorInfo(TensorShape(6U), 1, DataType::F32),
                TensorInfo(TensorShape(4U), 1, DataType::F32),
                Coordinates( 1), Coordinates(5)
        )));

        //Failed
        assert(bool(test(
                TensorInfo(TensorShape(6U), 1, DataType::F32),
                TensorInfo(TensorShape(4U), 1, DataType::F32),
                Coordinates( 1), Coordinates(-1)
        )));
}
aarch64-linux-gnu-g++ slice.cpp -g -O0 -I <arm_compute-repo> -I <arm_compute-repo>/include -L <arm_compute-repo>/build -o slice -larm_compute-static
morgolock commented 5 months ago

Hi @dgurulev

Thanks for reporting this.

Could you please submit your patch for review, please see the section How to submit a patch in https://arm-software.github.io/ComputeLibrary/latest/contribution_guidelines.xhtml