ARM-software / ComputeLibrary

The Compute Library is a set of computer vision and machine learning functions optimised for both Arm CPUs and GPUs using SIMD technologies.
MIT License

NEMeanStdDevNormalizationLayer returns nans for f16 tensors #1095

Closed alvoron closed 1 month ago

alvoron commented 4 months ago

NEMeanStdDevNormalizationLayer returns NaNs if the src/dst tensors are F16. The issue was reproduced on ACL 23.08.

How ACL was built: scons neon=1 opencl=0 openmp=0 cppthreads=1 arch=armv8.6-a Werror=false validation_tests=1 --jobs=8 os=macos build=native --silent fixed_format_kernels=1 asserts=1 debug=1

How reproducer was built: clang++ -O2 -g -I./ComputeLibrary -I./ComputeLibrary/include mvn_bug.c -o bug -L./ComputeLibrary/build/ -L./ComputeLibrary/build/tests/ -L./ComputeLibrary/build/tests/framework/ -larm_compute -lAssetsLibrary.o -lRawTensor.o -lExceptions.o -std=c++17

Issue was reproduced on Apple M1

Reproducer:

#include "arm_compute/core/TensorShape.h"

#include "arm_compute/runtime/Tensor.h"
#include "arm_compute/runtime/NEON/functions/NEMeanStdDevNormalizationLayer.h"

#include "tests/Utils.h"
#include "tests/AssetsLibrary.h"
#include "tests/NEON/Accessor.h"

#include <iostream>
#include <random>   // std::random_device, std::uniform_real_distribution
#include <vector>

using namespace arm_compute;
using namespace arm_compute::test;

int main(int argc, char *argv[]) {
  size_t X = 128;
  size_t Y = 64;
  float epsValue_ = 0.00000999999974f;

  TensorInfo srcTensorInfo = TensorInfo(TensorShape(X, Y), 1, DataType::F16, DataLayout::NCHW);
  TensorInfo dstTensorInfo = TensorInfo(TensorShape(X, Y), 1, DataType::F16, DataLayout::NCHW);

  auto status = NEMeanStdDevNormalizationLayer::validate(&srcTensorInfo, &dstTensorInfo, epsValue_);
  if(status.error_code() != ErrorCode::OK) {
    std::cout << "ERROR: " << status.error_description().c_str() << std::endl;
    exit(1);
  }

  std::cout << "PASSED VALIDATION" << std::endl;

  Tensor srcTensor;
  Tensor dstTensor;
  srcTensor.allocator()->init(srcTensorInfo);
  dstTensor.allocator()->init(dstTensorInfo);

  NEMeanStdDevNormalizationLayer mvn;
  mvn.configure(&srcTensor, &dstTensor, epsValue_);
  std::cout << "PASSED CONFIGURATION" << std::endl;

  srcTensor.allocator()->allocate();
  dstTensor.allocator()->allocate();

  AssetsLibrary library(".", std::random_device()());
  std::uniform_real_distribution<> distribution{ -2000.0f, 3000.0f };
  library.fill(Accessor(srcTensor), distribution, 0);

  srcTensor.print(std::cout);
  mvn.run();
  std::cout << "PASSED RUN" << std::endl;
  dstTensor.print(std::cout);

  srcTensor.allocator()->free();
  dstTensor.allocator()->free();

  return 0;
}
morgolock commented 3 months ago

Hi @alvoron

I managed to reproduce this; however, the range of input values in your test, [-2000.f, 3000.f], is not supported for float16_t in the operator NEMeanStdDevNormalizationLayer.

We only test values in the range [-1.f, 1.f]; see https://github.com/ARM-software/ComputeLibrary/blob/main/tests/validation/fixtures/MeanStdDevNormalizationLayerFixture.h#L61

I've also modified the test to use [-1000.f, 1000.f], and I see no NaNs:

int main(int argc, char *argv[]) {
  size_t X = 128;
  size_t Y = 64;
  float epsValue_ = 0.00000999999974f;

  TensorInfo srcTensorInfo = TensorInfo(TensorShape(X, Y), 1, DataType::F16, DataLayout::NCHW);
  TensorInfo dstTensorInfo = TensorInfo(TensorShape(X, Y), 1, DataType::F16, DataLayout::NCHW);

  auto status = NEMeanStdDevNormalizationLayer::validate(&srcTensorInfo, &dstTensorInfo, epsValue_);
  if(status.error_code() != ErrorCode::OK) {
    std::cout << "ERROR: " << status.error_description().c_str() << std::endl;
    exit(1);
  }

  std::cout << "PASSED VALIDATION" << std::endl;

  Tensor srcTensor;
  Tensor dstTensor;
  srcTensor.allocator()->init(srcTensorInfo);
  dstTensor.allocator()->init(dstTensorInfo);

  NEMeanStdDevNormalizationLayer mvn;
  mvn.configure(&srcTensor, &dstTensor, epsValue_);
  std::cout << "PASSED CONFIGURATION" << std::endl;

  srcTensor.allocator()->allocate();
  dstTensor.allocator()->allocate();

  // Fill the source tensor element-wise with values in [-1000, 1000].
  std::mt19937 gen{std::random_device{}()};
  std::uniform_real_distribution<float> distribution(-1000.0f, 1000.0f);
  Window window;
  window.use_tensor_dimensions(srcTensor.info()->tensor_shape());
  execute_window_loop(window,
                      [&](const Coordinates &id)
                      {
                        const auto value = static_cast<float16_t>(distribution(gen));
                        *reinterpret_cast<float16_t *>(srcTensor.ptr_to_element(id)) = value;
                      });

  srcTensor.print(std::cout);
  mvn.run();
  std::cout << "PASSED RUN" << std::endl;
  dstTensor.print(std::cout);

  srcTensor.allocator()->free();
  dstTensor.allocator()->free();

  return 0;
}

What's the use case for the range of values [-2000.0f, 3000.0f]? Is there a model that uses it?

Hope this helps

alvoron commented 3 months ago

The issue was reproduced on a style transfer model; I observed the [-2000, 3000] range there.

I was also able to reproduce the issue with the range [0, 1000]. Could you try that?

morgolock commented 3 months ago

Hi @alvoron

Thank you for sharing the details. The following patch fixes the problem: https://review.mlplatform.org/c/ml/ComputeLibrary/+/11311

The fix will be included in release 24.04.

Hope this helps.