ARM-software / ComputeLibrary

The Compute Library is a set of computer vision and machine learning functions optimised for both Arm CPUs and GPUs using SIMD technologies.

Extend RNNLayer #1014

Closed allnes closed 9 months ago

allnes commented 1 year ago

Hi!

My name is Nesterov Alexander and I am a developer on the OpenVINO Toolkit (contribution part). We use Compute Library (ACL) as a backend for our ARM plugin.

We are trying to map your layers to our operations, and currently we are trying to use the ACL RNNLayer for the OpenVINO RNNCell. We have hit a limitation in RNNLayer related to the clip parameter (see the OpenVINO operation reference).

Do you have a plan to extend this operation?

Cc @alvoron Thank you!

allnes commented 1 year ago

To clarify: the clip parameter, a float value (say C), in the RNNCell operation clips the data to the range [-C, C] before the activation. Unfortunately, in RNNLayer I cannot find any clipping before the activation, and this difference between the operations creates a large error.
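A minimal scalar sketch of those semantics, just for illustration (the helper names and the choice of tanh are mine, and treating clip <= 0 as "no clipping" is an assumption):

```cpp
#include <algorithm>
#include <cmath>

// clip <= 0 is treated here as "no clipping"; clip > 0 clamps to [-clip, clip].
float clip_pre_activation(float x, float clip)
{
    if(clip <= 0.0f)
    {
        return x;
    }
    return std::min(std::max(x, -clip), clip);
}

// The activation (tanh used as an example) is applied AFTER the clamp.
float rnn_cell_output(float pre_activation, float clip)
{
    return std::tanh(clip_pre_activation(pre_activation, clip));
}
```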

morgolock commented 1 year ago

Hi Alexander,

ACL does not support the clip parameter presently: https://github.com/ARM-software/ComputeLibrary/blob/main/arm_compute/runtime/NEON/functions/NERNNLayer.h

How does this clip argument work on your side? Are the values clipped just before the final activation? https://github.com/ARM-software/ComputeLibrary/blob/main/src/runtime/NEON/functions/NERNNLayer.cpp#L103

Can't you use LU_BOUNDED_RELU for the final activation? It does just that: https://github.com/ARM-software/ComputeLibrary/blob/main/src/cpu/kernels/activation/generic/neon/impl.h#L118
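For reference, a sketch of what that suggestion could look like when configuring NERNNLayer (tensor setup is omitted and the function/variable names are illustrative, not from the issue):

```cpp
#include "arm_compute/runtime/NEON/functions/NERNNLayer.h"
#include "arm_compute/runtime/Tensor.h"

using namespace arm_compute;

// LU_BOUNDED_RELU computes min(a, max(b, x)); with a = clip and b = -clip
// the final activation of the layer becomes a clamp to [-clip, clip].
void configure_rnn_with_clamp(Tensor &input, Tensor &weights, Tensor &recurrent_weights,
                              Tensor &bias, Tensor &hidden_state, Tensor &output,
                              NERNNLayer &rnn, float clip)
{
    ActivationLayerInfo act(ActivationLayerInfo::ActivationFunction::LU_BOUNDED_RELU, clip, -clip);
    rnn.configure(&input, &weights, &recurrent_weights, &bias, &hidden_state, &output, act);
}
```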

Hope this helps.

allnes commented 1 year ago

Yes, all values are clipped just before the final activation. We use LU_BOUNDED_RELU for the RELU case, but unfortunately it doesn't work for the TANH and LOGISTIC activations.
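To illustrate the gap on a standalone tensor: what is needed for TANH/LOGISTIC is the composition activation(clamp(x, -C, C)), which with the existing ACL activations takes two passes. A sketch (names are illustrative; this is not a drop-in fix for NERNNLayer, where the clamp would have to happen inside the layer):

```cpp
#include "arm_compute/runtime/NEON/functions/NEActivationLayer.h"
#include "arm_compute/runtime/Tensor.h"

using namespace arm_compute;

// Assumes src, clipped and dst are already initialised and allocated with matching shapes.
void clip_then_tanh(Tensor &src, Tensor &clipped, Tensor &dst, float clip)
{
    NEActivationLayer clamp_f;
    NEActivationLayer tanh_f;

    // Pass 1: clamp to [-clip, clip] via LU_BOUNDED_RELU (min(a, max(b, x))).
    clamp_f.configure(&src, &clipped,
                      ActivationLayerInfo(ActivationLayerInfo::ActivationFunction::LU_BOUNDED_RELU, clip, -clip));
    // Pass 2: apply the real activation on the clamped values (a * tanh(b * x), with a = b = 1).
    tanh_f.configure(&clipped, &dst,
                     ActivationLayerInfo(ActivationLayerInfo::ActivationFunction::TANH, 1.f, 1.f));

    clamp_f.run();
    tanh_f.run();
}
```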

allnes commented 10 months ago

As far as I know, TFLite uses Compute Library with clipping; could you explain how that works?

morgolock commented 9 months ago

Hi @allnes

I'm not familiar with the way tflite implements clipping in RNNLayer.

Since you have the test cases in place, please consider contributing to ACL and uploading a small patch for review implementing this small feature request.

It looks like the clipping can be done by adding a new kernel to NERNNLayer and running the new kernel before the final activation: https://github.com/ARM-software/ComputeLibrary/blob/main/src/runtime/NEON/functions/NERNNLayer.cpp#L127 https://github.com/ARM-software/ComputeLibrary/blob/main/src/runtime/NEON/functions/NERNNLayer.cpp#L144

You'll need to make changes to the interface to specify the min/max values for clipping.
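A very rough sketch of how that could look inside NERNNLayer::configure(), assuming the interface gains a clip value; the member names (_clip_f, _clip_output) are hypothetical additions, and _add_output / _memory_group stand for the layer's existing pre-activation tensor and memory group:

```cpp
// Hypothetical additions to NERNNLayer.h:
//     NEActivationLayer _clip_f;
//     Tensor            _clip_output;
// and an extended interface, e.g.
//     void configure(..., ActivationLayerInfo &info, float clip_value = 0.f);

// In NERNNLayer::configure(), between the add step and the final activation:
if(clip_value > 0.f)
{
    // Same shape/type as the pre-activation sum produced by the add step.
    _clip_output.allocator()->init(TensorInfo(_add_output.info()->tensor_shape(), 1,
                                              _add_output.info()->data_type()));
    _memory_group.manage(&_clip_output);

    // LU_BOUNDED_RELU(a = clip, b = -clip) clamps to [-clip_value, clip_value].
    _clip_f.configure(&_add_output, &_clip_output,
                      ActivationLayerInfo(ActivationLayerInfo::ActivationFunction::LU_BOUNDED_RELU,
                                          clip_value, -clip_value));
}
// The existing final activation would then read from _clip_output (when clipping
// is enabled) instead of the add output, and run() would call _clip_f.run().
```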

Hope this helps.

allnes commented 9 months ago

@morgolock thank you for the answer!