NESlice doesn't match specification when end coordinate(s) is negative

Output of 'strings libarm_compute.so | grep arm_compute_version': arm_compute_version=v22.08 Build options: {'Werror': '1', 'debug': '1', 'neon': '1', 'opencl': '0', 'os': 'linux', 'arch': 'armv8a'} Git hash=b'aabef6c0584f06f4c0f4b61fb787d80374240619'

Platform/Operating System: qemu-aarch64 version 6.2.0 (Debian 1:6.2+dfsg-2ubuntu6.5) on Ubuntu 22.04 LTS

Problem description:

Accordingly to specification of NESlice: End coordinates can be negative, which represents the number of elements before the end of that dimension. This behavior is aligned with both popular AI frameworks like _TensorFlow_, PyTorch and NumPy. For example,

>>> import tensorflow as tf
>>> t = tf.constant([1, 2, 3, 4, 5, 6])
>>> print(t[1:-1])
tf.Tensor([2 3 4 5], shape=(4,), dtype=int32)

NESlice.patch.txt

>>> import torch.nn as nn
>>> t = torch.tensor([1, 2, 3, 4, 5, 6])
>>> print(t[1:-1])
tensor([2, 3, 4, 5])

>>> import numpy as np
>>> t =  np.array([1, 2, 3, 4, 5, 6])
>>> print(t[1:-1])
[2 3 4 5]

So, for given tensor input: { 6 } starts: { 1 } ends: { -1 } we expect output: { 4 }. Actually we got validation failure since NESlice::validate expects output: { 5 }. This is because negative value at ends coordinate(s) is interpreted as a non-zero bit(s) at _endmask passed to NEStridedSliceKernel what, in turn, is interpreted as "ends[i] is ignored and the fullest possible range in that dimension is used".

Details: NESlice working though NEStridedSliceKernel and calculates _exp_outputshape in this way NESlice::validate => NEStridedSliceKernel::validate => validate_arguments => compute_strided_slice_shape => compute_strided_slice_output_shape => calculate_end_on_index =>

// Reset in case of begin mask present
if(arm_compute::helpers::bit_ops::is_bit_set(end_mask, index) && !shrink_axis)
{
    stop = (stride > 0) ? std::numeric_limits<int>::max() : std::numeric_limits<int>::lowest();
}

<skip>

// Final clamp
stop = (stride > 0) ? utility::clamp(stop, 0, dim_size) : utility::clamp(stop, -1, dim_size - 1);

So, stop is calculated as input's full range and this matches specification of NEStridedSliceKernel. The thing is the way how NESlice generates _endmask by construct_slice_end_mask function to be passed to NEStridedSliceKernel:

int32_t construct_slice_end_mask(Coordinates ends)
{
    // Create end mask
    int32_t end_mask = 0;
    for(unsigned int i = 0; i < ends.num_dimensions(); ++i)
    {
        if(ends[i] < 0)
        {
            end_mask |= 1 << i;
        }
    }

    return end_mask;
}

Finally, we either need to fix NESlice behavior or make a correction in the documentation to describe actual one. A fix may be quite quick and simple - just get rid off end_mask calculation and pass zero as _endmask to NEStridedSliceKernel:: validate/configure, see attached patch.

Reproducer (slice.cpp, no test case w/ negative value at ends found at tests/validation/NEON/Slice.cpp):

#include "arm_compute/core/Types.h"
#include "arm_compute/runtime/NEON/functions/NESlice.h"
#include "arm_compute/runtime/Tensor.h"

using namespace arm_compute;

int main()
{
        auto test = [](TensorInfo input, TensorInfo output, Coordinates starts, Coordinates ends)
        { return NESlice::validate(&input, &output, starts, ends); };

        //Passed
        assert(bool(test(
                TensorInfo(TensorShape(6U), 1, DataType::F32),
                TensorInfo(TensorShape(4U), 1, DataType::F32),
                Coordinates( 1), Coordinates(5)
        )));

        //Failed
        assert(bool(test(
                TensorInfo(TensorShape(6U), 1, DataType::F32),
                TensorInfo(TensorShape(4U), 1, DataType::F32),
                Coordinates( 1), Coordinates(-1)
        )));
}

aarch64-linux-gnu-g++ slice.cpp -g -O0 -I <arm_compute-repo> -I <arm_compute-repo>/include -L <arm_compute-repo>/build -o slice -larm_compute-static

ARM-software / ComputeLibrary

NESlice doesn't match specification when end coordinate(s) is negative #1007