ARM-software / ComputeLibrary

The Compute Library is a set of computer vision and machine learning functions optimised for both Arm CPUs and GPUs using SIMD technologies.

Depthwise convolution operator doesn't support shapes with dilation #1051

Closed snadampal closed 1 year ago

snadampal commented 1 year ago

Output of 'strings libarm_compute.so | grep arm_compute_version':

arm_compute_version=v23.02.1 Build options: {'Werror': '1', 'debug': '0', 'neon': '1', 'opencl': '0', 'os': 'linux', 'openmp': '1', 'cppthreads': '0', 'arch': 'armv8.2-a', 'multi_isa': '1', 'build': 'native'} Git hash=b'd8bf9b53752a4f573120cf51b31055de8b3c7d29'

Platform: AWS Graviton3

Operating System: Ubuntu 20.04

Problem description: Convolution shapes with dilation are not supported by ACL gemm kernels, e.g.: g512mb1_ic512oc512_ih1oh1kh1sh1dh0ph0_iw104ow104kw87sw1dw1pw86

The same shape is processed by the optimized DepthwiseConv2d kernel if dilation is set to zero, e.g.: g512mb1_ic512oc512_ih1oh1kh1sh1dh0ph0_iw104ow104kw87sw1dw0pw86

ACL rejects the dilated shape because the destination tensor dimensions computed for it do not match the original destination. Here is the check: https://github.com/ARM-software/ComputeLibrary/blob/main/src/cpu/kernels/internal/CpuDepthwiseConv2dAssemblyWrapperKernel.cpp#L298

    if(dst->total_size() > 0)
    {
        const TensorShape dst_shape = misc::shape_calculator::compute_depthwise_convolution_shape(*src, *weights, info);
        ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DIMENSIONS(dst->tensor_shape(), dst_shape);
        ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES(src, dst);
    }

The error results from the dimension comparison in https://github.com/ARM-software/ComputeLibrary/blob/main/arm_compute/core/Validate.h#LL51C11-L51C11

{
    for(unsigned int i = upper_dim; i < arm_compute::Dimensions<T>::num_max_dimensions; ++i)
    {
        if(dim1[i] != dim2[i])
        {
            return true;
        }
    }

The dilation changes the computed dimensions, so they no longer match the original destination.
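To make the mismatch concrete, here is a minimal sketch of the usual output-size arithmetic for one axis of a dilated convolution. It is not ACL code: `conv_out_extent` is a hypothetical helper, and two things are assumed rather than taken from the report: padding of 86 on both sides of the width axis, and that benchdnn's `dw1` corresponds to a dilation of 2 in the ACL convention (where 1 means no dilation).

```cpp
#include <cassert>

// Hypothetical helper (not an ACL function): output extent of a convolution
// along one axis. `dilation` follows the ACL convention, where 1 means a
// dense (undilated) kernel.
int conv_out_extent(int in, int kernel, int stride, int pad_before, int pad_after, int dilation)
{
    assert(in > 0 && kernel > 0 && stride > 0 && dilation > 0);

    // A dilated kernel spans dilation * (kernel - 1) + 1 input elements.
    const int effective_kernel = dilation * (kernel - 1) + 1;
    const int padded           = in + pad_before + pad_after;
    assert(padded >= effective_kernel);

    // Floor division: number of valid kernel positions along the axis.
    return (padded - effective_kernel) / stride + 1;
}
```

Under those assumptions, `conv_out_extent(104, 87, 1, 86, 86, 2)` gives 104, which matches `ow104`, while the same call with a dilation of 1 gives 190. A shape check that evaluates the expected output without the dilation would therefore see a different width than the destination tensor provides.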

Code to reproduce the issue: I used the 'benchdnn' utility from oneDNN.

export DNNL_VERBOSE=1
/oneDNN/build/tests/benchdnn$ ./benchdnn --conv --stag=any --dtag=any --wtag=any g512mb1_ic512oc512_ih1oh1kh1sh1dh0ph0_iw104ow104kw87sw1dw1pw86

Questions

  1. Is dilation not supported in ACL kernels?
  2. Or does it need a different configuration?
  3. What other options do I have to process these shapes with optimized ACL kernels?

morgolock commented 1 year ago

Hi @snadampal

Dilation is supported in DWC in the Neon backend.

You have to specify the dilation with the argument const ConvolutionInfo &info in https://github.com/ARM-software/ComputeLibrary/blob/main/src/cpu/operators/CpuDepthwiseConv2d.h#L59

See

struct ConvolutionInfo
{
    ConvolutionInfo() = default;
    ConvolutionInfo(const PadStrideInfo &pad_stride_info, unsigned int depth_multiplier, const ActivationLayerInfo &act_info, const Size2D &dilation)
        : pad_stride_info(pad_stride_info), depth_multiplier(depth_multiplier), act_info(act_info), dilation(dilation)
    {
    }
    PadStrideInfo       pad_stride_info{};        /**< Convolution info (Pads, strides,...) */
    unsigned int        depth_multiplier{ 1 };    /**< Multiplier to apply to input's depth to retrieve the output depth. Defaults to 1 */
    ActivationLayerInfo act_info{};               /**< Fused activation to apply after convolution. */
    Size2D              dilation{ Size2D(1, 1) }; /**< Dilation, in elements, across x and y. Defaults to (1, 1). */
};

We recently merged a patch to main, not included in 23.02, that enables dilation in the assembly kernels. Please have a look at it: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8919

Hope this helps.

snadampal commented 1 year ago

@morgolock, thanks for the prompt response and the details! I will give it a try.

snadampal commented 1 year ago

Hi @morgolock, I was able to use that patch and got the shapes with dilation working with the ACL depthwise convolution kernels. I had to set the dilation info properly in the validate and configure calls. I'm closing the issue now. Thank you!
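For readers hitting the same problem from oneDNN, one thing worth checking when setting the dilation info is the difference in conventions. The thread does not say exactly what was changed, so the following is only a sketch of one plausible piece: oneDNN (and benchdnn's `dh`/`dw` fields) use 0 for an undilated kernel, whereas the `ConvolutionInfo` shown above defaults to (1, 1). `to_acl_dilation` is a hypothetical helper, not part of either library, and the returned pair is assumed to be passed on as (x, y), i.e. (width, height).

```cpp
#include <cassert>
#include <utility>

// Hypothetical helper (not part of oneDNN or ACL): converts a oneDNN-style
// dilation, where 0 means a dense kernel, to the ACL-style convention,
// where 1 means a dense kernel. The result is ordered (x, y).
std::pair<unsigned int, unsigned int> to_acl_dilation(int dnnl_dilation_w, int dnnl_dilation_h)
{
    assert(dnnl_dilation_w >= 0 && dnnl_dilation_h >= 0);
    return { static_cast<unsigned int>(dnnl_dilation_w) + 1U,
             static_cast<unsigned int>(dnnl_dilation_h) + 1U };
}
```

For the shape in this issue (`dh0`, `dw1`), the helper returns (2, 1), which would be the values to place in the `dilation` field for both the validate and configure calls.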