Samsung / ONE

On-device Neural Engine

[Compiler FE] Build armv7l target on tizen gbs #8480

Open chunseoklee opened 2 years ago

chunseoklee commented 2 years ago

parent : #8379 (draft: #8407)

Let's try to build and run the Compiler FE on the Tizen OBS build system!

The draft PR is updated at https://github.com/Samsung/ONE/pull/8481

chunseoklee commented 2 years ago

Error on https://github.com/Samsung/ONE/pull/8481/commits/bd2bb66c3fcb81f09ffd3aa1a518bfb667bc7a79

/home/abuild/rpmbuild/BUILD/nnfw-1.20.0/compiler/mio-tflite260/CMakeLists.txt:13 (nnas_find_package)

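mio needs tensorflow-2.6, and the gbs build environment cannot download it (the log reports `Could not resolve host: github.com`). Thus the tf-2.6 source should be included in the packaging folder.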

```
[ 120s] -- !!! FlatBuffers_FOUND
[ 120s] -- Download TENSORFLOW from https://github.com/tensorflow/tensorflow/archive/v2.6.0.tar.gz
[ 120s] -- (Trial Count : 0)
[ 120s] CMake Warning at /home/abuild/rpmbuild/BUILD/nnfw-1.20.0/infra/cmake/modules/ExternalSourceTools.cmake:66 (message):
[ 120s] error: downloading
[ 120s] 'https://github.com/tensorflow/tensorflow/archive/v2.6.0.tar.gz' failed
[ 120s]
[ 120s] status_code: 6
[ 120s] status_string: "Couldn't resolve host name"
[ 120s] log: getaddrinfo(3) failed for github.com:443
[ 120s]
[ 120s] Could not resolve host: github.com
[ 120s]
[ 120s] Closing connection 0
[ 120s]
[ 120s] Call Stack (most recent call first):
[ 120s] /home/abuild/rpmbuild/BUILD/nnfw-1.20.0/infra/cmake/packages/TensorFlowSource-2.6.0/TensorFlowSourceConfig.cmake:12 (ExternalSource_Download)
[ 120s] /home/abuild/rpmbuild/BUILD/nnfw-1.20.0/infra/cmake/packages/TensorFlowSource-2.6.0/TensorFlowSourceConfig.cmake:18 (_TensorFlowSource_import)
[ 120s] CMakeLists.txt:43 (find_package)
[ 120s] /home/abuild/rpmbuild/BUILD/nnfw-1.20.0/compiler/mio-tflite260/CMakeLists.txt:13 (nnas_find_package)
[ 120s]
```
chunseoklee commented 2 years ago

On https://github.com/Samsung/ONE/pull/8481/commits/b38ebe9bd4c33c73e285e217c17dfdfc50a965db f6ffb10 (force-push due to ruy and eigen for 2.6)

Anyway, the build itself now starts on gbs, so at least the configuration stage completes.

chunseoklee commented 2 years ago

On https://github.com/Samsung/ONE/pull/8481/commits/fd645eaf8c5b15769ca1bad7c3882c0c294c9d1b

Need to reduce build time. It takes 840s to reach the following point:

[  840s] [100%] Generating ArgMin_U8_003.gen.tflite
[  840s] [100%] Generating AveragePool2D_000.gen.tflite
[  840s] [100%] Generating AveragePool2D_U8_000.gen.tflite
[  840s] [100%] Generating BatchMatMul_000.gen.tflite
[  840s] [100%] Generating BatchToSpaceND_000.gen.tflite
[  840s] gmake[3]: Leaving directory '/home/abuild/rpmbuild/BUILD/nnfw-1.20.0/build/arm32.debug.host'
[  840s] [100%] Built target tflchef_testfiles
[  840s] gmake[2]: Leaving directory '/home/abuild/rpmbuild/BUILD/nnfw-1.20.0/build/arm32.debug.host'
[  840s] gmake[1]: Leaving directory '/home/abuild/rpmbuild/BUILD/nnfw-1.20.0/build/arm32.debug.host'
[  840s] ROOTFS_DIR= TARGET_ARCH=armv7l \
[  840s] BUILD_HOST_EXEC=/home/abuild/rpmbuild/BUILD/nnfw-1.20.0/build/arm32.debug.host \
[  840s] NNCC_WORKSPACE=build/arm32.debug ./nncc configure \
[  840s]        -DCMAKE_BUILD_TYPE=Debug \
[  840s]        -DBUILD_COMPILER_NNC=OFF \

And the build fails:

~/d/ONE on  nncc_gbsbuild ⬢  8.10.0 2527.262s
➜  grep "error" /home/twoflower/GBS-ROOT-3.0TM1_llvm38/local/repos/tizen/armv7l/logs/fail/nnfw-1.20.0-1/log.txt
[ 2376s] /home/abuild/rpmbuild/BUILD/nnfw-1.20.0/compiler/luci/partition/src/PartitionIR.cpp:67:19: error: moving a local object in a return statement prevents copy elision [-Werror=pessimizing-move]
[ 2377s] cc1plus: all warnings being treated as errors
[ 2510s] collect2: error: ld returned 1 exit status
[ 2510s] error: Bad exit status from /var/tmp/rpm-tmp.0jVvWh (%build)
[ 2510s] RPM build errors:
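
For reference, `-Werror=pessimizing-move` fires when a function returns a local object through `std::move`, which prevents copy elision. The sketch below shows the flagged pattern next to the usual fix. It uses a hypothetical `Tracked` type that counts move constructions; it is not the actual code at `PartitionIR.cpp:67`, which is not shown in this thread.

```cpp
#include <utility>
#include <vector>

// Hypothetical type for illustration only: counts how often it is move-constructed.
struct Tracked {
  static int moves;
  std::vector<int> data;
  Tracked() = default;
  Tracked(const Tracked&) = default;
  Tracked(Tracked&& other) noexcept : data(std::move(other.data)) { ++moves; }
};
int Tracked::moves = 0;

// Flagged form: std::move on a local in a return statement forces a move
// construction and rules out copy elision (this is what the warning reports).
Tracked make_pessimized() {
  Tracked t;
  t.data.push_back(1);
  return std::move(t);
}

// Usual fix: return the local by name so the compiler may elide the copy/move.
Tracked make_plain() {
  Tracked t;
  t.data.push_back(1);
  return t;
}
```

Calling both and comparing `Tracked::moves` shows that the plain form never needs more moves than the `std::move` form, and with elision it typically needs none.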
chunseoklee commented 2 years ago

On 5ec7b6a, the build fails at 1923s:

[ 1923s] /home/abuild/rpmbuild/BUILD/nnfw-1.20.0/externals/TENSORFLOW-2.6.0/tensorflow/lite/kernels/internal/optimized/neon_tensor_utils.h:30: undefined reference to `tflite::tensor_utils::NeonMatrixBatchVectorMultiplyAccumulate(float const*, int, int, float const*, int, float*)'
[ 1923s] /usr/lib/gcc/armv7l-tizen-linux-gnueabi/9.2.0/../../../../armv7l-tizen-linux-gnueabi/bin/ld: ../kernels/libluci_interpreter_linux_pal.a(tensor_utils.cc.o): in function `tflite::tensor_utils::MatrixBatchVectorMultiplyAccumulate(signed char const*, int, int, signed char const*, float const*, int, float*)':
[ 1923s] /home/abuild/rpmbuild/BUILD/nnfw-1.20.0/externals/TENSORFLOW-2.6.0/tensorflow/lite/kernels/internal/optimized/neon_tensor_utils.h:40: undefined reference to `tflite::tensor_utils::NeonMatrixBatchVectorMultiplyAccumulate(signed char const*, int, int, signed char const*, float const*, int, float*)'
[ 1923s] /usr/lib/gcc/armv7l-tizen-linux-gnueabi/9.2.0/../../../../armv7l-tizen-linux-gnueabi/bin/ld: ../kernels/libluci_interpreter_linux_pal.a(tensor_utils.cc.o): in function `tflite::tensor_utils::MatrixBatchVectorMultiplyAccumulate(signed char const*, int, int, signed char const*, float const*, int, int*, float*, tflite::CpuBackendContext*)':
[ 1923s] /home/abuild/rpmbuild/BUILD/nnfw-1.20.0/externals/TENSORFLOW-2.6.0/tensorflow/lite/kernels/internal/optimized/neon_tensor_utils.h:51: undefined reference to `tflite::tensor_utils::NeonMatrixBatchVectorMultiplyAccumulate(signed char const*, int, int, signed char const*, float const*, int, int*, float*, tflite::CpuBackendContext*)'
[ 1923s] /usr/lib/gcc/armv7l-tizen-linux-gnueabi/9.2.0/../../../../armv7l-tizen-linux-gnueabi/bin/ld: ../kernels/libluci_interpreter_linux_pal.a(tensor_utils.cc.o): in function `tflite::tensor_utils::MatrixBatchVectorMultiplyAccumulate(signed char const*, int, int, signed char const*, float const*, int, float*, float const*, int const*, int*, int*, bool*, tflite::CpuBackendContext*)':
[ 1923s] /home/abuild/rpmbuild/BUILD/nnfw-1.20.0/externals/TENSORFLOW-2.6.0/tensorflow/lite/kernels/internal/optimized/neon_tensor_utils.h:61: undefined reference to `tflite::tensor_utils::NeonMatrixBatchVectorMultiplyAccumulate(signed char const*, int, int, signed char const*, float const*, int, float*, float const*, int const*, int*, int*, bool*, tflite::CpuBackendContext*)'
[ 1923s] /usr/lib/gcc/armv7l-tizen-linux-gnueabi/9.2.0/../../../../armv7l-tizen-linux-gnueabi/bin/ld: ../kernels/libluci_interpreter_linux_pal.a(tensor_utils.cc.o): in function `tflite::tensor_utils::SparseMatrixBatchVectorMultiplyAccumulate1x4(float const*, int const*, int const*, int, int, float const*, int, float*)':
[ 1923s] /home/abuild/rpmbuild/BUILD/nnfw-1.20.0/externals/TENSORFLOW-2.6.0/tensorflow/lite/kernels/internal/optimized/neon_tensor_utils.h:70: undefined reference to `tflite::tensor_utils::NeonSparseMatrixBatchVectorMultiplyAccumulate1x4(float const*, int const*, int const*, int, int, float const*, int, float*)'
[ 1923s] /usr/lib/gcc/armv7l-tizen-linux-gnueabi/9.2.0/../../../../armv7l-tizen-linux-gnueabi/bin/ld: ../kernels/libluci_interpreter_linux_pal.a(tensor_utils.cc.o): in function `tflite::tensor_utils::SparseMatrixBatchVectorMultiplyAccumulate(float const*, unsigned char const*, int, int, float const*, int, float*)':
[ 1923s] /home/abuild/rpmbuild/BUILD/nnfw-1.20.0/externals/TENSORFLOW-2.6.0/tensorflow/lite/kernels/internal/optimized/neon_tensor_utils.h:78: undefined reference to `tflite::tensor_utils::NeonSparseMatrixBatchVectorMultiplyAccumulate(float const*, unsigned char const*, int, int, float const*, int, float*)'
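
One plausible reading of these errors, offered as an assumption rather than a confirmed diagnosis: the inline wrappers in `neon_tensor_utils.h` forward to `Neon`-prefixed functions when the target has NEON, so the source file that defines those functions has to be part of the link. The sketch below reproduces that kind of compile-time dispatch with hypothetical names (`DISPATCH_SKETCH`, `PortableDot`, `NeonDot`); it is not TensorFlow Lite's actual code, and both bodies are plain C++ so it builds on any host.

```cpp
#include <cstddef>
#include <vector>

// Portable reference implementation (hypothetical name).
float PortableDot(const float* a, const float* b, std::size_t n) {
  float sum = 0.0f;
  for (std::size_t i = 0; i < n; ++i) sum += a[i] * b[i];
  return sum;
}

// Declaration as a header would carry it. If no linked object file defines
// this symbol, a NEON build fails with "undefined reference to NeonDot".
float NeonDot(const float* a, const float* b, std::size_t n);

// Compile-time selection: NEON targets call Neon<fn>, others call Portable<fn>.
#if defined(__ARM_NEON)
#define DISPATCH_SKETCH(fn, ...) Neon##fn(__VA_ARGS__)
#else
#define DISPATCH_SKETCH(fn, ...) Portable##fn(__VA_ARGS__)
#endif

// Inline wrapper, analogous to the header functions named in the linker output.
// Precondition: a and b have the same length.
inline float Dot(const std::vector<float>& a, const std::vector<float>& b) {
  return DISPATCH_SKETCH(Dot, a.data(), b.data(), a.size());
}

// The definition that must be compiled and linked on NEON targets.
// Here it simply reuses the portable loop instead of real intrinsics.
float NeonDot(const float* a, const float* b, std::size_t n) {
  return PortableDot(a, b, n);
}
```

Deleting the final `NeonDot` definition and building for a NEON target would reproduce the same class of link error, which is why adding the missing source file to the library is the usual remedy.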
seanshpark commented 2 years ago

fail at 1923s in

We may need to revise the luci-interpreter backend...

chunseoklee commented 2 years ago

On fe0f477,

[ 1725s] /home/abuild/rpmbuild/BUILD/nnfw-1.20.0/externals/TENSORFLOW-2.6.0-RUY/ruy/pack_arm.h:601: undefined reference to `ruy::Pack8bitRowMajorForNeon(unsigned char const*, int, int, int, int, int, int, signed char*, int, int, int*, int, int)'
[ 1725s] /usr/lib/gcc/armv7l-tizen-linux-gnueabi/9.2.0/../../../../armv7l-tizen-linux-gnueabi/bin/ld: libluci_interpreter_linux_pal.a(neon_tensor_utils.cc.o): in function `ruy::PackImpl<(ruy::Path)16, ruy::FixedKernelLayout<(ruy::Order)0, 16, 4>, signed char, signed char, int, (ruy::Order)0>::Run(ruy::Tuning, ruy::Mat<signed char> const&, ruy::PMat<signed char>*, int, int)':
[ 1725s] /home/abuild/rpmbuild/BUILD/nnfw-1.20.0/externals/TENSORFLOW-2.6.0-RUY/ruy/pack_arm.h:213: undefined reference to `ruy::Pack8bitColMajorForNeon4Cols(ruy::PackParams8bit const&)'
[ 1725s] /usr/lib/gcc/armv7l-tizen-linux-gnueabi/9.2.0/../../../../armv7l-tizen-linux-gnueabi/bin/ld: libluci_interpreter_linux_pal.a(neon_tensor_utils.cc.o): in function `ruy::PackImpl<(ruy::Path)16, ruy::FixedKernelLayout<(ruy::Order)0, 16, 4>, signed char, signed char, int, (ruy::Order)1>::Run(ruy::Tuning, ruy::Mat<signed char> const&, ruy::PMat<signed char>*, int, int)':
[ 1725s] /home/abuild/rpmbuild/BUILD/nnfw-1.20.0/externals/TENSORFLOW-2.6.0-RUY/ruy/pack_arm.h:601: undefined reference to `ruy::Pack8bitRowMajorForNeon(unsigned char const*, int, int, int, int, int, int, signed char*, int, int, int*, int, int)'
[ 1725s] /usr/lib/gcc/armv7l-tizen-linux-gnueabi/9.2.0/../../../../armv7l-tizen-linux-gnueabi/bin/ld: libluci_interpreter_linux_pal.a(neon_tensor_utils.cc.o): in function `ruy::Kernel<(ruy::Path)16, signed char, signed char, int, int>::Run(ruy::PMat<signed char> const&, ruy::PMat<signed char> const&, ruy::MulParams<int, int> const&, int, int, int, int, ruy::Mat<int>*) const':
[ 1725s] /home/abuild/rpmbuild/BUILD/nnfw-1.20.0/externals/TENSORFLOW-2.6.0-RUY/ruy/kernel_arm.h:101: undefined reference to `ruy::Kernel8bitNeon(ruy::KernelParams8bit<4, 2> const&)'
[ 1725s] /usr/lib/gcc/armv7l-tizen-linux-gnueabi/9.2.0/../../../../armv7l-tizen-linux-gnueabi/bin/ld: /home/abuild/rpmbuild/BUILD/nnfw-1.20.0/externals/TENSORFLOW-2.6.0-RUY/ruy/kernel_arm.h:98: undefined reference to `ruy::Kernel8bitNeon1Col(ruy::KernelParams8bit<4, 2> const&)'
[ 1725s] collect2: error: ld returned 1 exit status
[ 1725s] compiler/luci-interpreter/src/kernels/CMakeFiles/luci_interpreter_kernels_test.dir/build.make:1272: recipe for target 'compiler/luci-interpreter/src/kernels/luci_interpreter_kernels_test' failed
[ 1725s] gmake[3]: *** [compiler/luci-interpreter/src/kernels/luci_interpreter_kernels_test] Error 1
[ 1725s] gmake[3]: Leaving directory '/home/abuild/rpmbuild/BUILD/nnfw-1.20.0/build/arm32.debug'
[ 1725s] CMakeFiles/Makefile2:4941: recipe for target 'compiler/luci-interpreter/src/kernels/CMakeFiles/luci_interpreter_kernels_test.dir/all' failed
[ 1725s] gmake[2]: *** [compiler/luci-interpreter/src/kernels/CMakeFiles/luci_interpreter_kernels_test.dir/all] Error 2
[ 1725s] gmake[2]: Leaving directory '/home/abuild/rpmbuild/BUILD/nnfw-1.20.0/build/arm32.debug'
[ 1725s] Makefile:145: recipe for target 'all' failed
chunseoklee commented 2 years ago

Need to reduce build time. it takes 840s till the following :

Unfortunately, the Tizen build system for gcc is already accelerated!

➜  grep accel /home/twoflower/GBS-ROOT-3.0TM1_llvm38/local/BUILD-ROOTS/scratch.armv7hl.0/tizen.conf
Preinstall: qemu-accel-%{build_hostarch}-armv7l
Runscripts: qemu-accel-%{build_hostarch}-armv7l
Preinstall: qemu-accel-%{build_hostarch}-armv7hl
Runscripts: qemu-accel-%{build_hostarch}-armv7hl
Preinstall: qemu-accel-%{build_hostarch}-aarch64
Runscripts: qemu-accel-%{build_hostarch}-aarch64
Substitute: python-accel-armv7l-cross-arm python-accel-%{build_hostarch}-armv7l
Substitute: python-accel-armv7hl-cross-arm python-accel-%{build_hostarch}-armv7hl
Substitute: python-accel-aarch64-cross-aarch64 python-accel-%{build_hostarch}-aarch64
Substitute: clang-accel-armv7l-cross-arm clang-accel-%{build_hostarch}-armv7l
Substitute: clang-accel-armv7hl-cross-arm clang-accel-%{build_hostarch}-armv7hl
Substitute: clang-accel-aarch64-cross-aarch64 clang-accel-%{build_hostarch}-aarch64
chunseoklee commented 2 years ago
  {
    "directory": "/home/twoflower/dev/ONE/build/arm32.debug/compiler/luci-interpreter/src",
    "command": "/home/twoflower/dev/gcc-linaro-7.2.1-2017.11-x86_64_arm-linux-gnueabihf/bin/arm-linux-gnueabihf-g++
    --sysroot=/home/twoflower/dev/rootfs/arm-ubuntu/arm  -Dluci_interpreter_EXPORTS
    -I/home/twoflower/dev/ONE/compiler/luci-interpreter/include
    -I/home/twoflower/dev/ONE/compiler/luci-interpreter/src -I/home/twoflower/dev/ONE/compiler/luci/lang/include
    -I/home/twoflower/dev/ONE/compiler/loco/include -I/home/twoflower/dev/ONE/compiler/angkor/include
    -I/home/twoflower/dev/ONE/compiler/oops/include -I/home/twoflower/dev/ONE/compiler/pepper-str/include
    -I/home/twoflower/dev/ONE/compiler/luci-interpreter/pal/linux  -g -fPIC   -Werror -Wall -Wextra -Wno-reorder
    -std=gnu++14 -o CMakeFiles/luci_interpreter.dir/BuddyMemoryManager.cpp.o -c
    /home/twoflower/dev/ONE/compiler/luci-interpreter/src/BuddyMemoryManager.cpp",
    "file": "/home/twoflower/dev/ONE/compiler/luci-interpreter/src/BuddyMemoryManager.cpp"
  },
  {
    "directory": "/home/abuild/rpmbuild/BUILD/nnfw-1.20.0/build/arm32.debug/compiler/luci-interpreter/src/kernels",
    "command": "/bin/c++  -isystem
    /home/abuild/rpmbuild/BUILD/nnfw-1.20.0/infra/nncc/../../externals/TENSORFLOW-2.6.0-RUY -isystem
    /home/abuild/rpmbuild/BUILD/nnfw-1.20.0/infra/nncc/../../externals/TENSORFLOW-2.6.0-GEMMLOWP -isystem
    /home/abuild/rpmbuild/BUILD/nnfw-1.20.0/infra/nncc/../../externals/TENSORFLOW-2.6.0-EIGEN -isystem
    /home/abuild/rpmbuild/BUILD/nnfw-1.20.0/infra/nncc/../../externals/TENSORFLOW-2.6.0 -O2 -g2 -pipe -Wall
    -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong -Wformat-security -fmessage-length=0
    -frecord-gcc-switches -Wl,-z,relro,--as-needed --param=ssp-buffer-size=4 -march=armv7-a -mtune=cortex-a8
    -mlittle-endian -mfpu=neon -mfloat-abi=softfp -mthumb -Wp,-D__SOFTFP__ -Wl,-O1 -Wl,--hash-style=gnu
    -Wa,-mimplicit-it=thumb -g -g -fPIC -std=gnu++14 -o
    CMakeFiles/luci_interpreter_linux_pal.dir/home/abuild/rpmbuild/BUILD/nnfw-1.20.0/externals/TENSORFLOW-2.6.0/tensorflow/lite/kernels/internal/optimized/neon_tensor_utils.cc.o -c
    /home/abuild/rpmbuild/BUILD/nnfw-1.20.0/externals/TENSORFLOW-2.6.0/tensorflow/lite/kernels/internal/optimized/neon_tensor_utils.cc",
    "file":
    "/home/abuild/rpmbuild/BUILD/nnfw-1.20.0/externals/TENSORFLOW-2.6.0/tensorflow/lite/kernels/internal/optimized/neon_tensor_utils.cc"
  },
  {
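
The two entries above come from different toolchains: the first uses a Linaro `arm-linux-gnueabihf-g++` cross compiler, the second the gbs `/bin/c++` with `-mfpu=neon -mfloat-abi=softfp -mthumb`. When comparing such builds, a small probe of predefined macros can confirm what each compiler actually targets. This is only a sketch: `describe_arm_build` is a hypothetical helper, and the macros checked are ones assumed to be predefined by GCC and Clang for ARM targets (`__SOFTFP__` is also passed explicitly by the gbs flags shown).

```cpp
#include <string>

// Hypothetical helper: summarizes the target as seen by the preprocessor.
std::string describe_arm_build() {
  std::string out;
#if defined(__arm__)
  out += "arm32";
#elif defined(__aarch64__)
  out += "aarch64";
#else
  out += "non-arm";
#endif
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
  out += " neon";            // NEON instructions enabled (e.g. -mfpu=neon)
#endif
#if defined(__ARM_PCS_VFP)
  out += " hard-float-abi";  // arguments passed in VFP registers (gnueabihf)
#endif
#if defined(__SOFTFP__)
  out += " __SOFTFP__";      // set by the compiler or, as in the gbs flags, by -D
#endif
#if defined(__thumb__)
  out += " thumb";           // Thumb instruction set (e.g. -mthumb)
#endif
  return out;
}
```

Compiling this once with each command line from the compilation database and printing the result would make flag differences such as NEON or float ABI visible at a glance.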
chunseoklee commented 2 years ago

Now the build succeeds on https://github.com/Samsung/ONE/pull/8481/commits/0be8a492a40c5ae92ac3d2a0f3edf708b67163c7. One test failed:

[ 1497s] [       OK ] QuantizedModelVerifierTest.LocalCreateConst (0 ms)
[ 1497s] [ RUN      ] QuantizedModelVerifierTest.InstanceNorm
[ 1497s] QuantizeWithMinMaxPass Start
[ 1497s] QuantizeActivation visit node: input_0
[ 1497s] QuantizeActivation visit node:
[ 1497s] QuantizeActivation visit node:
[ 1497s] QuantizeActivation visit node:
[ 1497s] QuantizeActivation visit node: output_0
[ 1497s] ==============================================================
[ 1497s] PhaseRunner<Saturate>
[ 1497s] Initial graph
[ 1497s] luci_pass_test: /home/abuild/rpmbuild/BUILD/nnfw-1.20.0/compiler/luci/logex/src/FormattedGraph.cpp:1160: bool {anonymous}::summary_node(const locop::SymbolTable*, const luci::CircleInstanceNorm*, locop::NodeSummary&): Assertion `fused != luci::FusedActFunc::UNDEFINED' failed.
[ 1497s]
[ 1497s]       Start 29: luci_profile_test
[ 1497s] 29/42 Test #29: luci_profile_test ................   Passed    0.12 sec
[ 1497s]       Start 30: luci_plan_test
[ 1497s] 30/42 Test #30: luci_plan_test ...................   Passed    0.18 sec
[ 1497s]       Start 31: luci_partition_test
[ 1497s] 31/42 Test #31: luci_partition_test ..............   Passed    0.27 sec
chunseoklee commented 2 years ago

Now all the tests pass on https://github.com/Samsung/ONE/pull/8481/commits/4b12f67c73e3ebd1633da4102da99c73469aa294. However, I am not sure the problem is actually resolved, since the commit only avoids the failed assert by switching to a release build.
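
To illustrate why a release build hides the failure rather than fixing it: `assert` expands to nothing when `NDEBUG` is defined, and CMake's default Release flags for GCC include `-DNDEBUG`. The sketch below uses a hypothetical `FusedActFunc` enum and `summarize` function that only mirror the names in the assertion message; they are not luci's definitions.

```cpp
#include <cassert>

// Hypothetical enum mirroring the name in the assertion message.
enum class FusedActFunc { UNDEFINED, NONE, RELU };

// Reports whether assert() is active in this translation unit.
constexpr bool asserts_enabled() {
#ifdef NDEBUG
  return false;  // built with -DNDEBUG: assert() is compiled out
#else
  return true;   // debug-style build: assert() aborts on failure
#endif
}

// Stand-in for a routine guarded by an assert on its input.
bool summarize(FusedActFunc fused) {
  // Aborts in a debug build when fused is UNDEFINED; does nothing under NDEBUG.
  assert(fused != FusedActFunc::UNDEFINED);
  return fused != FusedActFunc::UNDEFINED;
}
```

In a build with `NDEBUG`, `summarize(FusedActFunc::UNDEFINED)` returns `false` silently, so the underlying `UNDEFINED` value is still there even though the test no longer aborts.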

seanshpark commented 2 years ago
[ 1497s] [ RUN      ] QuantizedModelVerifierTest.InstanceNorm
[ 1497s] QuantizeWithMinMaxPass Start
[ 1497s] QuantizeActivation visit node: input_0
[ 1497s] QuantizeActivation visit node:
[ 1497s] QuantizeActivation visit node:
[ 1497s] QuantizeActivation visit node:
[ 1497s] QuantizeActivation visit node: output_0
[ 1497s] ==============================================================
[ 1497s] PhaseRunner<Saturate>
[ 1497s] Initial graph
[ 1497s] luci_pass_test: /home/abuild/rpmbuild/BUILD/nnfw-1.20.0/compiler/luci/logex/src/FormattedGraph.cpp:1160: bool {anonymous}::summary_node(const locop::SymbolTable*, const luci::CircleInstanceNorm*, locop::NodeSummary&): Assertion `fused != luci::FusedActFunc::UNDEFINED' failed.
[ 1497s]

I'll check this on an x86_64 build.

--> How to reproduce:

cd build/debug/compiler/luci/pass
LUCI_LOG=100 ./luci_pass_test