ARM-software / ComputeLibrary

The Compute Library is a set of computer vision and machine learning functions optimised for both Arm CPUs and GPUs using SIMD technologies.
MIT License

NEMeanStdDevNormalizationLayer returns nans for f16 tensors #2 #1114

Open alvoron opened 6 days ago

alvoron commented 6 days ago

NEMeanStdDevNormalizationLayer returns NaNs if the src/dst tensors are F16.

The issue was reproduced while running the OpenVINO notebook sketch-to-image-pix2pix-turbo with ACL 24.06 on an Apple M2. It occurs even with the fix provided in https://github.com/ARM-software/ComputeLibrary/issues/1095, since it was reproduced on ACL 24.06.

Below is a dedicated MVN reproducer.

ACL build command:

scons arch=arm64-v8.2-a neon=1 opencl=0 openmp=0 cppthreads=1 os=macos data_layout_support=all build=native asserts=1 debug=1 --jobs=8 --silent fixed_format_kernels=True validation_tests=1

Reproducer build command:

g++ -O2 -g -I./ComputeLibrary -I./ComputeLibrary/include acl_mvn_reproducer.cpp -o mvn -L./ComputeLibrary/build/ -larm_compute -std=c++17

Reproducer run command (it is better to redirect stdout to a file, since the tensors are large):

DYLD_LIBRARY_PATH=ComputeLibrary/build ./mvn > mvn.log

Reproducer

#include "arm_compute/core/Error.h"
#include "arm_compute/core/TensorShape.h"
#include "arm_compute/core/utils/misc/MMappedFile.h"
#include "arm_compute/runtime/Tensor.h"
#include "arm_compute/runtime/NEON/functions/NEMeanStdDevNormalizationLayer.h"
#include "tests/Utils.h"
#include "tests/NEON/Accessor.h"
#include <cstdlib>
#include <iostream>
#include <vector>

using namespace arm_compute;
using namespace arm_compute::test;

int main(int argc, char *argv[]) {
  // Memory-map the raw F16 input data.
  utils::mmap_io::MMappedFile mmapped_file("src.bin", 0, 0);
  if (!mmapped_file.is_mapped()) {
    std::cout << "Data file is corrupted" << std::endl;
    exit(1);
  }

  unsigned char *srcData = mmapped_file.data();
  size_t X = 1048576;
  size_t Y = 32;
  float epsValue_ = 0.00000999999974f;

  // Describe the F16 source and destination tensors and validate the layer.
  TensorInfo srcTensorInfo = TensorInfo(TensorShape(X, Y), 1, DataType::F16, DataLayout::NCHW);
  TensorInfo dstTensorInfo = TensorInfo(TensorShape(X, Y), 1, DataType::F16, DataLayout::NCHW);
  auto status = NEMeanStdDevNormalizationLayer::validate(&srcTensorInfo, &dstTensorInfo, epsValue_);
  if (status.error_code() != ErrorCode::OK) {
    std::cout << "ERROR: " << status.error_description().c_str() << std::endl;
    exit(1);
  }
  std::cout << "PASSED VALIDATION" << std::endl;

  Tensor srcTensor;
  Tensor dstTensor;
  srcTensor.allocator()->init(srcTensorInfo);
  dstTensor.allocator()->init(dstTensorInfo);
  NEMeanStdDevNormalizationLayer mvn;

  mvn.configure(&srcTensor, &dstTensor, epsValue_);
  std::cout << "PASSED CONFIGURATION" << std::endl;
  srcTensor.allocator()->allocate();
  dstTensor.allocator()->allocate();
  // Point the source tensor at the memory-mapped input data.
  srcTensor.allocator()->import_memory(srcData);
  srcTensor.print(std::cout);

  // Run MVN and print the result; the NaNs show up in dstTensor.
  mvn.run();
  std::cout << "PASSED RUN" << std::endl;
  dstTensor.print(std::cout);
  srcTensor.allocator()->free();
  dstTensor.allocator()->free();
  return 0;
}

Src tensor: src.bin.zip Reproducer log: mvn.log.zip
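
For comparing the output against expected values, a library-independent reference may help. The sketch below is not ACL code: `reference_mvn` and `count_non_finite` are hypothetical helper names, and it assumes the layer computes (x - mean) / sqrt(var + eps) per row with population variance, where each row is the innermost X dimension (so rows = 32 and cols = 1048576 for the tensors above). It takes values already converted from F16 to float and accumulates in double, so the reference itself stays well inside the representable range.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical reference: row-wise mean/std-dev normalization of a row-major
// [rows x cols] buffer. Assumes out = (x - mean) / sqrt(var + eps).
std::vector<float> reference_mvn(const std::vector<float> &src,
                                 std::size_t rows, std::size_t cols, float eps)
{
    assert(src.size() == rows * cols);
    std::vector<float> dst(src.size());
    for (std::size_t r = 0; r < rows; ++r)
    {
        const float *row = src.data() + r * cols;

        // First pass: mean, accumulated in double.
        double sum = 0.0;
        for (std::size_t c = 0; c < cols; ++c)
        {
            sum += row[c];
        }
        const double mean = sum / static_cast<double>(cols);

        // Second pass: population variance around the mean.
        double sq_diff = 0.0;
        for (std::size_t c = 0; c < cols; ++c)
        {
            const double d = row[c] - mean;
            sq_diff += d * d;
        }
        const double var     = sq_diff / static_cast<double>(cols);
        const double inv_std = 1.0 / std::sqrt(var + static_cast<double>(eps));

        for (std::size_t c = 0; c < cols; ++c)
        {
            dst[r * cols + c] = static_cast<float>((row[c] - mean) * inv_std);
        }
    }
    return dst;
}

// Counts NaN and +/-Inf entries, e.g. in a converted copy of the dst tensor.
std::size_t count_non_finite(const std::vector<float> &values)
{
    std::size_t n = 0;
    for (float v : values)
    {
        if (!std::isfinite(v))
        {
            ++n;
        }
    }
    return n;
}
```

Running `count_non_finite` on the converted dst values gives a quick count of affected elements, and `reference_mvn` on the converted src values gives numbers to compare the remaining elements against, within F16 rounding.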