ARM-software / ComputeLibrary

The Compute Library is a set of computer vision and machine learning functions optimised for both Arm CPUs and GPUs using SIMD technologies.
MIT License

The output of tanh_f16 overflow #998

Closed lilh9598 closed 1 year ago

lilh9598 commented 1 year ago

Output of 'strings libarm_compute.so | grep arm_compute_version':
arm_compute_version=v22.05 Build options: {'Werror': '0', 'debug': '0', 'neon': '1', 'opencl': '0', 'embed_kernels': '0', 'os': 'linux', 'arch': 'armv8a', 'build': 'native', 'multi_isa': '1'} Git hash=b'a175e887d64450decf80ea47d4049832c5805565'

Platform: Neoverse-N1

Operating System: debian

Problem description: NEActivationLayer with TANH and F16 returns values above 1 and then NaN as the input grows, which looks like an overflow. The following program reproduces the bug.

#include <cstdint> // int64_t
#include <cstdio>  // printf
#include "arm_compute/runtime/NEON/NEFunctions.h"
using namespace arm_compute;

int main() {
    constexpr int64_t dims = 8;
    Tensor src_tensor, dst_tensor;
    auto shape = TensorShape(dims);
    auto data_info = TensorInfo(shape, Format::F16);
    src_tensor.allocator()->init(data_info);
    dst_tensor.allocator()->init(data_info);
    auto act_info = ActivationLayerInfo(ActivationLayerInfo::ActivationFunction::TANH, 1.f, 1.f);
    auto tanh_layer = NEActivationLayer();
    tanh_layer.configure(&src_tensor, &dst_tensor, act_info);

    __fp16 src[dims] = {4.f, 5.f, 6.f, 7.f, 8.f, 9.f, 10.f, 12.f};
    __fp16 dst[dims] = {0.f};
    src_tensor.allocator()->import_memory(src);
    dst_tensor.allocator()->import_memory(dst);
    tanh_layer.run();
    src_tensor.allocator()->free();
    dst_tensor.allocator()->free();

    for (int i = 0; i < dims; i++) {
        float dst_f32 = (float)dst[i];
        float src_f32 = (float)src[i];
        printf("index: %d src: %f ==> dst: %f\n", i, src_f32, dst_f32);
    }
}

The output of this program on my platform is as follows:

index: 0 src: 4.000000 ==> dst: 0.999512
index: 1 src: 5.000000 ==> dst: 1.000977
index: 2 src: 6.000000 ==> dst: nan
index: 3 src: 7.000000 ==> dst: nan
index: 4 src: 8.000000 ==> dst: nan
index: 5 src: 9.000000 ==> dst: nan
index: 6 src: 10.000000 ==> dst: nan
index: 7 src: 12.000000 ==> dst: nan

After some checking, I think the bug is in tanhq_f16:
https://github.com/ARM-software/ComputeLibrary/blob/aabef6c0584f06f4c0f4b61fb787d80374240619/src/core/NEON/NEMath.inl#L484-L497
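One plausible explanation for the NaNs, assuming tanh is evaluated in half precision as (e^{2x} - 1) / (e^{2x} + 1): the largest finite F16 value is 65504, so e^{2x} overflows to infinity once x exceeds roughly 5.5, and inf / inf gives NaN. That matches the log above, where x = 5 is still finite and x = 6 is not. The sketch below is only a float model of that effect, not the library's code and not necessarily what any fix does. The helper names (saturate_like_half, tanh_naive_half, tanh_clamped_half) and the clamp limit of 5 are hypothetical, and rounding of in-range values is ignored, so it will not reproduce the 1.000977 result.

```cpp
#include <cmath>
#include <limits>

// Largest finite IEEE 754 half-precision value.
constexpr float kHalfMax = 65504.0f;

// Hypothetical helper: crude model of storing a result in half precision.
// Only overflow to infinity is modelled; in-range rounding is ignored.
inline float saturate_like_half(float v)
{
    if (std::fabs(v) > kHalfMax)
    {
        return std::copysign(std::numeric_limits<float>::infinity(), v);
    }
    return v;
}

// Textbook form tanh(x) = (e^{2x} - 1) / (e^{2x} + 1), with every
// intermediate forced through the half-precision range model.
inline float tanh_naive_half(float x)
{
    const float e   = saturate_like_half(std::exp(2.0f * x));
    const float num = saturate_like_half(e - 1.0f);
    const float den = saturate_like_half(e + 1.0f);
    return num / den; // inf / inf is NaN once e has overflowed
}

// Same formula, but with the input clamped first.
// Assumption: a limit of 5 keeps exp(2 * 5) (about 22026) below 65504.
inline float tanh_clamped_half(float x)
{
    const float limit = 5.0f;
    const float xc    = std::fmin(std::fmax(x, -limit), limit);
    return tanh_naive_half(xc);
}
```

In this model tanh_naive_half(6.0f) is NaN while tanh_clamped_half(12.0f) stays within 1e-3 of 1, because tanh is already saturated well before the clamp limit.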

morgolock commented 1 year ago

Hi @lilh9598

The following patch solves the problem: https://review.mlplatform.org/c/ml/ComputeLibrary/+/8326

Hope this helps

lilh9598 commented 1 year ago

Thanks, I get the right result in my test with your code.