NEMeanStdDevNormalizationLayer returns NaNs for F16 tensors #2

NEMeanStdDevNormalizationLayer returns NaNs when the src/dst tensors are F16.

The issue was reproduced by running the OpenVINO notebook sketch-to-image-pix2pix-turbo with ACL 24.06 on an Apple M2. It persists despite the fix provided for https://github.com/ARM-software/ComputeLibrary/issues/1095: it was reproduced on ACL 24.06, which already includes that fix.
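One possible contributor, which has not been checked against the ACL 24.06 F16 kernel: the reproducer below normalizes rows of 1,048,576 elements, and if a row's sum or sum of squares were accumulated in F16, the format's limits would be exceeded. F16 keeps 11 significant bits and its largest finite value is 65504, so a running sum stops growing once the gap between representable values exceeds the addend, and a sum of squares can overflow to infinity, after which `inf - inf` yields NaN. The sketch below emulates F16 rounding in double precision. `round_to_f16`, `f16_running_sum` and `f16_single_pass_variance` are hypothetical helpers, and the single-pass formula E[x^2] - mean^2 is an assumed formulation, not necessarily the one ACL uses.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>

// Hypothetical helper: round x to the nearest IEEE 754 binary16 (F16) value
// (ties to even) and return it widened to double. Results beyond the F16
// range become +/-infinity; NaN, infinity and zero pass through unchanged.
double round_to_f16(double x) {
    if (std::isnan(x) || std::isinf(x) || x == 0.0) return x;
    int exp = 0;
    std::frexp(x, &exp);  // |x| = m * 2^exp with m in [0.5, 1)
    // F16 keeps 11 significant bits; below 2^-14 (subnormals) the spacing is 2^-24.
    const double quantum = std::ldexp(1.0, std::max(exp - 11, -24));
    const double r = std::nearbyint(x / quantum) * quantum;  // default mode: ties to even
    if (std::fabs(r) > 65504.0) {  // 65504 is the largest finite F16 value
        return std::copysign(std::numeric_limits<double>::infinity(), x);
    }
    return r;
}

// Hypothetical helper: add `value` to an accumulator n times, rounding every
// partial sum to F16, as a half-precision accumulator would.
double f16_running_sum(double value, std::size_t n) {
    const double v = round_to_f16(value);
    double acc = 0.0;
    for (std::size_t i = 0; i < n; ++i) acc = round_to_f16(acc + v);
    return acc;
}

// Hypothetical helper: single-pass variance E[x^2] - mean^2 of a row holding
// n copies of `value`, with every intermediate kept in F16.
// The exact variance of such a row is 0.
double f16_single_pass_variance(double value, std::size_t n) {
    assert(n > 0 && "row must not be empty");
    const double count = static_cast<double>(n);
    const double mean = round_to_f16(f16_running_sum(value, n) / count);
    const double mean_sq = round_to_f16(f16_running_sum(value * value, n) / count);
    return round_to_f16(mean_sq - round_to_f16(mean * mean));
}
```

With these helpers, summing 1,048,576 ones in F16 stalls at 2048, and the single-pass variance of a constant row of 300s comes out as NaN, because 300^2 = 90000 already overflows F16. Whether the F16 kernel accumulates this way would need to be confirmed in the ACL source.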

A dedicated MVN reproducer is provided below.

ACL build command:

```sh
scons arch=arm64-v8.2-a neon=1 opencl=0 openmp=0 cppthreads=1 os=macos data_layout_support=all build=native asserts=1 debug=1 --jobs=8 --silent fixed_format_kernels=True validation_tests=1
```

Reproducer build command:

```sh
g++ -O2 -g -I./ComputeLibrary -I./ComputeLibrary/include acl_mvn_reproducer.cpp -o mvn -L./ComputeLibrary/build/ -larm_compute -std=c++17
```

Reproducer run command (stdout is best redirected to a file, since the tensors are large):

```sh
DYLD_LIBRARY_PATH=ComputeLibrary/build ./mvn > mvn.log
```
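Each tensor holds 1,048,576 × 32 = 33,554,432 F16 values (64 MiB of raw data), so the printed log is hard to inspect by hand. As a lighter-weight check, the sketch below counts NaN and infinity bit patterns in a raw F16 buffer. It assumes a contiguous file of native-endian binary16 values, which is how the reproducer consumes `src.bin` (it imports the mapped bytes directly as F16 tensor memory). `load_f16_file`, `count_non_finite_f16` and `F16Stats` are hypothetical names, and dumping `dstTensor` into such a file is not shown.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Number of non-finite values found in a buffer of binary16 bit patterns.
struct F16Stats {
    std::size_t nans = 0;
    std::size_t infs = 0;
};

// Hypothetical helper: read a raw file of native-endian F16 values
// (for example src.bin) into a vector of 16-bit patterns.
std::vector<std::uint16_t> load_f16_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in) throw std::runtime_error("cannot open " + path);
    const std::streamsize bytes = in.tellg();
    if (bytes < 0 || bytes % 2 != 0) {
        throw std::runtime_error("size is not a whole number of F16 values");
    }
    std::vector<std::uint16_t> bits(static_cast<std::size_t>(bytes) / 2);
    in.seekg(0);
    if (!in.read(reinterpret_cast<char*>(bits.data()), bytes)) {
        throw std::runtime_error("short read from " + path);
    }
    return bits;
}

// Hypothetical helper: an F16 value is non-finite when its 5 exponent bits are
// all ones; a non-zero 10-bit mantissa then means NaN, a zero mantissa +/-inf.
F16Stats count_non_finite_f16(const std::vector<std::uint16_t>& bits) {
    F16Stats stats;
    for (const std::uint16_t b : bits) {
        const unsigned exponent = (b >> 10) & 0x1Fu;
        const unsigned mantissa = b & 0x3FFu;
        if (exponent == 0x1Fu) {
            if (mantissa != 0) ++stats.nans; else ++stats.infs;
        }
    }
    return stats;
}
```

Running it on `src.bin` first shows whether the input already contains non-finite values; NaNs that appear only in the output would point at the layer itself.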

Reproducer (acl_mvn_reproducer.cpp):

```cpp
#include "arm_compute/core/Error.h"
#include "arm_compute/core/TensorInfo.h"
#include "arm_compute/core/TensorShape.h"
#include "arm_compute/core/Types.h"
#include "arm_compute/core/utils/misc/MMappedFile.h"
#include "arm_compute/runtime/NEON/functions/NEMeanStdDevNormalizationLayer.h"
#include "arm_compute/runtime/Tensor.h"

#include <cstddef>
#include <iostream>

using namespace arm_compute;

int main()
{
    // Map the whole input file (size 0, offset 0); it holds raw F16 values.
    utils::mmap_io::MMappedFile mmapped_file("src.bin", 0, 0);
    if (!mmapped_file.is_mapped())
    {
        std::cout << "Failed to map src.bin" << std::endl;
        return 1;
    }
    unsigned char *srcData = mmapped_file.data();

    // 2D tensor: X = 1048576 elements per row, Y = 32 rows.
    const std::size_t X        = 1048576;
    const std::size_t Y        = 32;
    const float       epsValue = 0.00000999999974f; // ~1e-5
    TensorInfo srcTensorInfo(TensorShape(X, Y), 1, DataType::F16, DataLayout::NCHW);
    TensorInfo dstTensorInfo(TensorShape(X, Y), 1, DataType::F16, DataLayout::NCHW);

    const Status status = NEMeanStdDevNormalizationLayer::validate(&srcTensorInfo, &dstTensorInfo, epsValue);
    if (status.error_code() != ErrorCode::OK)
    {
        std::cout << "ERROR: " << status.error_description() << std::endl;
        return 1;
    }
    std::cout << "PASSED VALIDATION" << std::endl;

    Tensor srcTensor;
    Tensor dstTensor;
    srcTensor.allocator()->init(srcTensorInfo);
    dstTensor.allocator()->init(dstTensorInfo);

    NEMeanStdDevNormalizationLayer mvn;
    mvn.configure(&srcTensor, &dstTensor, epsValue);
    std::cout << "PASSED CONFIGURATION" << std::endl;

    // The source tensor wraps the mapped file instead of owning a buffer,
    // so it is not allocated; only the destination needs its own memory.
    const Status import_status = srcTensor.allocator()->import_memory(srcData);
    if (import_status.error_code() != ErrorCode::OK)
    {
        std::cout << "ERROR: " << import_status.error_description() << std::endl;
        return 1;
    }
    dstTensor.allocator()->allocate();
    srcTensor.print(std::cout);

    mvn.run();
    std::cout << "PASSED RUN" << std::endl;
    dstTensor.print(std::cout);

    srcTensor.allocator()->free();
    dstTensor.allocator()->free();
    return 0;
}
```
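To tell a merely imprecise F16 result from a NaN-producing one, the ACL output can be compared against a reference computed in higher precision. The sketch below assumes the layer normalizes each row along the first dimension (the `X` axis of `TensorShape(X, Y)`) as `(x - mean) / sqrt(var + eps)` with the population variance; the exact placement of `eps` in ACL's kernel is an assumption. `reference_mvn` is a hypothetical helper that takes `float` input in row-major `[rows x cols]` order and accumulates in `double` with a two-pass variance.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical reference for mean/variance normalization over each row of a
// row-major [rows x cols] matrix: out = (x - mean) / sqrt(var + eps).
// Sums are accumulated in double; variance uses a two-pass formulation.
std::vector<float> reference_mvn(const std::vector<float>& in,
                                 std::size_t rows, std::size_t cols, float eps) {
    assert(cols > 0 && in.size() == rows * cols);
    std::vector<float> out(in.size());
    for (std::size_t r = 0; r < rows; ++r) {
        const float* row = in.data() + r * cols;

        double sum = 0.0;  // pass 1: mean
        for (std::size_t c = 0; c < cols; ++c) sum += row[c];
        const double mean = sum / static_cast<double>(cols);

        double sq = 0.0;   // pass 2: sum of squared deviations
        for (std::size_t c = 0; c < cols; ++c) {
            const double d = row[c] - mean;
            sq += d * d;
        }
        const double var = sq / static_cast<double>(cols);  // population variance
        const double inv_std = 1.0 / std::sqrt(var + eps);

        for (std::size_t c = 0; c < cols; ++c) {
            out[r * cols + c] = static_cast<float>((row[c] - mean) * inv_std);
        }
    }
    return out;
}
```

For the reproducer this would be called with `rows = 32`, `cols = 1048576` and the same epsilon; converting the F16 bit patterns to `float` is not shown. The two-pass variance avoids the cancellation that a single-pass E[x^2] - mean^2 suffers from.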

Attachments: src tensor (src.bin.zip), reproducer log (mvn.log.zip).

ARM-software/ComputeLibrary issue #1114