Open ragmani opened 2 years ago
python 3.10
https://github.com/Samsung/ONE/pull/9429#issuecomment-1184193971 This is not a real issue. The problem was limited to an individual environment.
cmake
policy change when using find_package(Boost ...)
error message
CMake Error at /usr/lib/x86_64-linux-gnu/cmake/Boost-1.74.0/BoostConfig.cmake:240 (if):
if given arguments:
"ALL" "IN_LIST" "Boost_FIND_COMPONENTS"
Unknown arguments specified
Call Stack (most recent call first):
CMakeLists.txt:43 (find_package)
/home/jang/git/ragmani/ONE/compiler/nnc/backends/soft_backend/CMakeLists.txt:1 (nnas_find_package)
$ cmake --help-policy CMP0057
CMP0057
-------
.. versionadded:: 3.3
Support new if() IN_LIST operator.
CMake 3.3 adds support for the new IN_LIST operator.
The OLD behavior for this policy is to ignore the IN_LIST operator.
The NEW behavior is to interpret the IN_LIST operator.
This policy was introduced in CMake version 3.3.
CMake version 3.22.1 warns when the policy is not set and uses
OLD behavior. Use the cmake_policy() command to set
it to OLD or NEW explicitly.
.. note::
The OLD behavior of a policy is deprecated by definition
and may be removed in a future version of CMake.
- solution
Add `cmake_policy(SET CMP0057 NEW)` in `macro(nnas_find_package PREFIX)`
But it requires `cmake_minimum_required(VERSION 3.3)`
IMO, it's better to raise the minimum required CMake version, because CMake 3.1 is an old release (Dec 2014: https://cmake.org/pipermail/cmake/2014-December/059418.html).
cmake version
3.5.1
3.1
3.5.1
3.10.2
3.16.3
3.22.1
3.9.4
3.16.4
3.21.3
ERROR: Could not find a version that satisfies the requirement tensorflow-cpu==2.6.0 (from versions: 2.8.0, 2.8.1, 2.8.2, 2.9.0rc0, 2.9.0rc1, 2.9.0rc2, 2.9.0, 2.9.1)
ERROR: No matching distribution found for tensorflow-cpu==2.6.0
I heard from @seanshpark that the use of tensorflow-cpu 2.6.0 will be removed soon. So, let's wait for it to be removed.
using tensorflow-cpu 2.6.0 will be removed soon
--> #9433 , #9435
ONE/externals/ABSEIL/absl/synchronization/internal/graphcycles.cc:451:26: error: 'numeric_limits' is not a member of 'std'
451 | if (x->version == std::numeric_limits<uint32_t>::max()) {
ONE/externals/ABSEIL/absl/debugging/failure_signal_handler.cc:138:32: error: no matching function for call to 'max(long int, int)'
138 | size_t stack_size = (std::max(SIGSTKSZ, 65536) + page_mask) & ~page_mask;
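Both Abseil errors above look like the usual breakages seen with newer toolchains: the first is what you get when `<limits>` is no longer pulled in transitively, and the second is what you get when `SIGSTKSZ` stops being a plain integer constant and evaluates to a `long`. The sketch below only illustrates the kind of change that typically resolves them. It is not the actual Abseil patch, and `fake_sigstksz()` is a hypothetical stand-in for `SIGSTKSZ` so the snippet builds anywhere.

```cpp
#include <algorithm>  // std::max
#include <cstddef>    // std::size_t
#include <cstdint>    // std::uint32_t
#include <limits>     // std::numeric_limits (the include the first error suggests is missing)

// Hypothetical stand-in for SIGSTKSZ, assumed here to evaluate to a long
// instead of an integer constant.
long fake_sigstksz() { return 8192; }

// Same comparison as in graphcycles.cc; compiles once <limits> is included.
bool version_is_max(std::uint32_t version) {
  return version == std::numeric_limits<std::uint32_t>::max();
}

// std::max(long, int) cannot deduce one common type, so the template
// argument is given explicitly and both operands are converted to size_t.
std::size_t rounded_stack_size(std::size_t page_mask) {
  std::size_t stack_size = std::max<std::size_t>(fake_sigstksz(), 65536);
  return (stack_size + page_mask) & ~page_mask;
}
```

If this diagnosis holds, updating the bundled Abseil to a release that already contains equivalent fixes would be preferable to patching `externals/ABSEIL` locally.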
I found an error where some `onecc` modules could not be found when cross-building `onecc` on my machine.
The patch below solves this error.
@@ -20,38 +20,38 @@ ARM32_INSTALL_FOLDER=$(CURRENT_DIR)/$(BUILDFOLDER)/$(ARM32_FOLDER).$(TYPE_FOLDER
ARM32_INSTALL_HOST=$(CURRENT_DIR)/$(BUILDFOLDER)/$(ARM32_FOLDER).$(TYPE_FOLDER).host.install
# ARM32 build
-ARM32_BUILD_ITEMS:=angkor;cwrap;pepper-str;pepper-strcast;pp
-ARM32_BUILD_ITEMS+=;pepper-csv2vec;crew
-ARM32_BUILD_ITEMS+=;oops;pepper-assert
-ARM32_BUILD_ITEMS+=;hermes;hermes-std
-ARM32_BUILD_ITEMS+=;loco;locop;logo-core;logo
-ARM32_BUILD_ITEMS+=;safemain;mio-circle04;mio-tflite280
-ARM32_BUILD_ITEMS+=;dio-hdf5
-ARM32_BUILD_ITEMS+=;foder;circle-verify;souschef;arser;vconone
-ARM32_BUILD_ITEMS+=;luci
-ARM32_BUILD_ITEMS+=;luci-interpreter
-ARM32_BUILD_ITEMS+=;tflite2circle
-ARM32_BUILD_ITEMS+=;tflchef;circlechef
-ARM32_BUILD_ITEMS+=;circle2circle;record-minmax;circle-quantizer
-ARM32_BUILD_ITEMS+=;luci-eval-driver;luci-value-test
+ARM32_BUILD_ITEMS:=angkor;cwrap;pepper-str;pepper-strcast;pp;
+ARM32_BUILD_ITEMS+=;pepper-csv2vec;crew;
+ARM32_BUILD_ITEMS+=;oops;pepper-assert;
+ARM32_BUILD_ITEMS+=;hermes;hermes-std;
+ARM32_BUILD_ITEMS+=;loco;locop;logo-core;logo;
+ARM32_BUILD_ITEMS+=;safemain;mio-tflite280;mio-circle04;
+ARM32_BUILD_ITEMS+=;dio-hdf5;
+ARM32_BUILD_ITEMS+=;foder;circle-verify;souschef;arser;vconone;
+ARM32_BUILD_ITEMS+=;luci;
+ARM32_BUILD_ITEMS+=;luci-interpreter;
+ARM32_BUILD_ITEMS+=;tflite2circle;
+ARM32_BUILD_ITEMS+=;tflchef;circlechef;
+ARM32_BUILD_ITEMS+=;circle2circle;record-minmax;circle-quantizer;
+ARM32_BUILD_ITEMS+=;luci-eval-driver;luci-value-test;
ARM32_TOOLCHAIN_FILE=cmake/buildtool/cross/toolchain_armv7l-linux.cmake
-ARM32_HOST_ITEMS:=angkor;cwrap;pepper-str;pepper-strcast;pp
-ARM32_HOST_ITEMS+=;pepper-csv2vec
-ARM32_HOST_ITEMS+=;oops
-ARM32_HOST_ITEMS+=;hermes;hermes-std
-ARM32_HOST_ITEMS+=;loco;locop;logo-core;logo
-ARM32_HOST_ITEMS+=;safemain;mio-circle04;mio-tflite280
-ARM32_HOST_ITEMS+=;foder;circle-verify;souschef;arser;vconone
-ARM32_HOST_ITEMS+=;luci
-ARM32_HOST_ITEMS+=;luci-interpreter
-ARM32_HOST_ITEMS+=;tflite2circle
-ARM32_HOST_ITEMS+=;tflchef;circlechef
-ARM32_HOST_ITEMS+=;circle-tensordump
-ARM32_HOST_ITEMS+=;circle2circle
-ARM32_HOST_ITEMS+=;common-artifacts
-ARM32_HOST_ITEMS+=;luci-eval-driver;luci-value-test
+ARM32_HOST_ITEMS:=angkor;cwrap;pepper-str;pepper-strcast;pp;
+ARM32_HOST_ITEMS+=;pepper-csv2vec;
+ARM32_HOST_ITEMS+=;oops;
+ARM32_HOST_ITEMS+=;hermes;hermes-std;
+ARM32_HOST_ITEMS+=;loco;locop;logo-core;logo;
+ARM32_HOST_ITEMS+=;safemain;mio-tflite280;mio-circle04;
+ARM32_HOST_ITEMS+=;foder;circle-verify;souschef;arser;vconone;
+ARM32_HOST_ITEMS+=;luci;
+ARM32_HOST_ITEMS+=;luci-interpreter;
+ARM32_HOST_ITEMS+=;tflite2circle;
+ARM32_HOST_ITEMS+=;tflchef;circlechef;
+ARM32_HOST_ITEMS+=;circle-tensordump;
+ARM32_HOST_ITEMS+=;circle2circle;
+ARM32_HOST_ITEMS+=;common-artifacts;
+ARM32_HOST_ITEMS+=;luci-eval-driver;luci-value-test;
_SPACE_:=
But I'm not sure if this way is correct.
I found an error when cross-building. It's hard for me to solve it.
[ 93%] Building CXX object compiler/luci-interpreter/src/kernels/CMakeFiles/luci_interpreter_linux_pal.dir/home/jang/git/ragmani/ONE/externals/TENSORFLOW-2.6.0-RUY/ruy/pack_arm.cc.o
/home/jang/git/ragmani/ONE/externals/TENSORFLOW-2.6.0-RUY/ruy/pack_arm.cc: In function 'void ruy::Pack8bitColMajorForNeon4Cols(const ruy::PackParams8bit&)':
/home/jang/git/ragmani/ONE/externals/TENSORFLOW-2.6.0-RUY/ruy/pack_arm.cc:264:3: error: 'asm' operand has impossible constraints
264 | asm volatile(
| ^~~
gmake[3]: *** [compiler/luci-interpreter/src/kernels/CMakeFiles/luci_interpreter_linux_pal.dir/build.make:258: compiler/luci-interpreter/src/kernels/CMakeFiles/luci_interpreter_linux_pal.dir/home/jang/git/ragmani/ONE/externals/TENSORFLOW-2.6.0-RUY/ruy/pack_arm.cc.o] Error 1
source code
// No attempt made at making this code efficient on A55-ish cores yet.
void Pack8bitColMajorForNeon4Cols(const PackParams8bit& params) {
CheckOffsetsInPackParams8bit(params);
profiler::ScopeLabel label("Pack (kNeon)");
const void* src_ptr0 = params.src_ptr0;
const void* src_ptr1 = params.src_ptr1;
const void* src_ptr2 = params.src_ptr2;
const void* src_ptr3 = params.src_ptr3;
const int src_inc0 = params.src_inc0;
const int src_inc1 = params.src_inc1;
const int src_inc2 = params.src_inc2;
const int src_inc3 = params.src_inc3;
const std::int8_t* packed_ptr = params.packed_ptr;
asm volatile( <---------- line 264
// clang-format off
"ldr r2, [%[params], #" RUY_STR(RUY_OFFSET_INPUT_XOR) "]\n"
"vdup.8 q11, r2\n"
"mov r1, #0\n"
// Zero-out the accumulators
"vmov.i32 q12, #0\n"
"vmov.i32 q13, #0\n"
"vmov.i32 q14, #0\n"
"vmov.i32 q15, #0\n"
// Round down src_rows to nearest multiple of 16.
"ldr r3, [%[params], #" RUY_STR(RUY_OFFSET_SRC_ROWS) "]\n"
"and r2, r3, #-16\n"
"cmp r1, r2\n"
"beq 3f\n"
"1:\n"
"add r1, r1, #16\n"
/* Load q0 */
"vld1.8 {d0, d1}, [%[src_ptr0]]\n"
"add %[src_ptr0], %[src_ptr0], %[src_inc0]\n"
RUY_PREFETCH_LOAD("pld [%[src_ptr0]]\n")
/* Load q1 */
"vld1.8 {d2, d3}, [%[src_ptr1]]\n"
"add %[src_ptr1], %[src_ptr1], %[src_inc1]\n"
RUY_PREFETCH_LOAD("pld [%[src_ptr1]]\n")
"veor.8 q4, q0, q11\n"
"veor.8 q5, q1, q11\n"
// Pairwise add in to 16b accumulators.
"vpaddl.s8 q8, q4\n"
"vpaddl.s8 q9, q5\n"
"vst1.32 {q4}, [%[packed_ptr]]!\n"
"vst1.32 {q5}, [%[packed_ptr]]!\n"
// Pairwise add in to 16b accumulators.
"vpaddl.s8 q8, q4\n"
"vpaddl.s8 q9, q5\n"
"vst1.32 {q4}, [%[packed_ptr]]!\n"
"vst1.32 {q5}, [%[packed_ptr]]!\n"
// Pairwise add accumulate into 32b accumulators.
// q12 and q13 contain 4x32b accumulators
"vpadal.s16 q12, q8\n"
"vpadal.s16 q13, q9\n"
// Now do the same for src_ptr2 and src_ptr3.
"vld1.8 {d0, d1}, [%[src_ptr2]]\n"
"add %[src_ptr2], %[src_ptr2], %[src_inc2]\n"
RUY_PREFETCH_LOAD("pld [%[src_ptr2]]\n")
"vld1.8 {d2, d3}, [%[src_ptr3]]\n"
"add %[src_ptr3], %[src_ptr3], %[src_inc3]\n"
RUY_PREFETCH_LOAD("pld [%[src_ptr3]]\n")
"veor.8 q4, q0, q11\n"
"veor.8 q5, q1, q11\n"
"vpaddl.s8 q8, q4\n"
"vpaddl.s8 q9, q5\n"
"vst1.32 {q4}, [%[packed_ptr]]!\n"
"vst1.32 {q5}, [%[packed_ptr]]!\n"
// Pairwise add accumulate into 32b accumulators.
// q14 and q15 contain 4x32b accumulators
"vpadal.s16 q14, q8\n"
"vpadal.s16 q15, q9\n"
"cmp r1, r2\n"
"bne 1b\n"
"3:\n"
// Now pack the last (num_rows % 16) rows.
"ldr r3, [%[params], #" RUY_STR(RUY_OFFSET_SRC_ROWS) "]\n"
"ands r2, r3, #15\n"
"beq 4f\n"
"ldr r3, [%[params], #" RUY_STR(RUY_OFFSET_SRC_ZERO_POINT) "]\n"
"vdup.8 q0, r3\n"
"vdup.8 q1, r3\n"
// First, read/accumulate/write for src_ptr0 and src_ptr1.
#define RUY_LOAD_ONE_ROW1(I, R) \
"cmp r2, #" #I "\n" \
"beq 5f\n" \
"vld1.8 { d0[" #R "]}, [%[src_ptr0]]!\n" \
"vld1.8 { d2[" #R "]}, [%[src_ptr1]]!\n"
RUY_LOAD_ONE_ROW1(0, 0)
RUY_LOAD_ONE_ROW1(1, 1)
RUY_LOAD_ONE_ROW1(2, 2)
RUY_LOAD_ONE_ROW1(3, 3)
RUY_LOAD_ONE_ROW1(4, 4)
RUY_LOAD_ONE_ROW1(5, 5)
RUY_LOAD_ONE_ROW1(6, 6)
RUY_LOAD_ONE_ROW1(7, 7)
#define RUY_LOAD_ONE_ROW2(I, R) \
"cmp r2, #" #I "\n" \
"beq 5f\n" \
"vld1.8 { d1[" #R "]}, [%[src_ptr0]]!\n" \
"vld1.8 { d3[" #R "]}, [%[src_ptr1]]!\n"
RUY_LOAD_ONE_ROW2(8, 0)
RUY_LOAD_ONE_ROW2(9, 1)
RUY_LOAD_ONE_ROW2(10, 2)
RUY_LOAD_ONE_ROW2(11, 3)
RUY_LOAD_ONE_ROW2(12, 4)
RUY_LOAD_ONE_ROW2(13, 5)
RUY_LOAD_ONE_ROW2(14, 6)
RUY_LOAD_ONE_ROW2(15, 7)
"5:\n"
"veor.16 q4, q0, q11\n"
"veor.16 q5, q1, q11\n"
"vpaddl.s8 q8, q4\n"
"vpaddl.s8 q9, q5\n"
// Pairwise add accumulate to 4x32b accumulators.
"vpadal.s16 q12, q8\n"
"vpadal.s16 q13, q9\n"
"vst1.32 {q4}, [%[packed_ptr]]!\n"
"vst1.32 {q5}, [%[packed_ptr]]!\n"
// Reset to src_zero for src_ptr2 and src_ptr3.
"vdup.8 q0, r3\n"
"vdup.8 q1, r3\n"
// Next, read/accumulate/write for src_ptr2 and src_ptr3.
#undef RUY_LOAD_ONE_ROW1
#define RUY_LOAD_ONE_ROW1(I, R) \
"cmp r2, #" #I "\n" \
"beq 5f\n" \
"vld1.8 { d0[" #R "]}, [%[src_ptr2]]!\n" \
"vld1.8 { d2[" #R "]}, [%[src_ptr3]]!\n"
RUY_LOAD_ONE_ROW1(0, 0)
RUY_LOAD_ONE_ROW1(1, 1)
RUY_LOAD_ONE_ROW1(2, 2)
RUY_LOAD_ONE_ROW1(3, 3)
RUY_LOAD_ONE_ROW1(4, 4)
RUY_LOAD_ONE_ROW1(5, 5)
RUY_LOAD_ONE_ROW1(6, 6)
RUY_LOAD_ONE_ROW1(7, 7)
#undef RUY_LOAD_ONE_ROW2
#define RUY_LOAD_ONE_ROW2(I, R) \
"cmp r2, #" #I "\n" \
"beq 5f\n" \
"vld1.8 { d1[" #R "]}, [%[src_ptr2]]!\n" \
"vld1.8 { d3[" #R "]}, [%[src_ptr3]]!\n"
RUY_LOAD_ONE_ROW2(8, 0)
RUY_LOAD_ONE_ROW2(9, 1)
RUY_LOAD_ONE_ROW2(10, 2)
RUY_LOAD_ONE_ROW2(11, 3)
RUY_LOAD_ONE_ROW2(12, 4)
RUY_LOAD_ONE_ROW2(13, 5)
RUY_LOAD_ONE_ROW2(14, 6)
RUY_LOAD_ONE_ROW2(15, 7)
"5:\n"
"veor.16 q4, q0, q11\n"
"veor.16 q5, q1, q11\n"
"vpaddl.s8 q8, q4\n"
"vpaddl.s8 q9, q5\n"
// Pairwise add accumulate to 4x32b accumulators.
"vpadal.s16 q14, q8\n"
"vpadal.s16 q15, q9\n"
"vst1.32 {q4}, [%[packed_ptr]]!\n"
"vst1.32 {q5}, [%[packed_ptr]]!\n"
"4:\n"
// Pairwise add 32-bit accumulators
"vpadd.i32 d24, d24, d25\n"
"vpadd.i32 d26, d26, d27\n"
"vpadd.i32 d28, d28, d29\n"
"vpadd.i32 d30, d30, d31\n"
// Final 32-bit values per row
"vpadd.i32 d25, d24, d26\n"
"vpadd.i32 d27, d28, d30\n"
"ldr r3, [%[params], #" RUY_STR(RUY_OFFSET_SUMS_PTR) "]\n"
"cmp r3, #0\n"
"beq 6f\n"
"vst1.32 {d25}, [r3]!\n"
"vst1.32 {d27}, [r3]!\n"
"6:\n"
// clang-format on
: [ src_ptr0 ] "+r"(src_ptr0), [ src_ptr1 ] "+r"(src_ptr1),
[ src_ptr2 ] "+r"(src_ptr2), [ src_ptr3 ] "+r"(src_ptr3)
: [ src_inc0 ] "r"(src_inc0), [ src_inc1 ] "r"(src_inc1),
[ src_inc2 ] "r"(src_inc2), [ src_inc3 ] "r"(src_inc3),
[ packed_ptr ] "r"(packed_ptr), [ params ] "r"(&params)
: "cc", "memory", "r1", "r2", "r3", "q0", "q1", "q2", "q3",
"q4", "q5", "q6", "q7", "q8", "q9", "q10", "q11", "q12", "q13");
}
@ragmani, I can't tell from the diff: which module has changed?
I haven't changed any modules. I just added `;` at the end of each line.
FYI, AFAIR, when we tried to enable the Tizen build, a target without a trailing ';' was not added to the build targets.
Sorry, I didn't understand what you said. What do you mean by "target" and "build target"?
@jinevening You can find discussion about cmake version under https://github.com/Samsung/ONE/issues/9432#issuecomment-1184412692
What
Let's make the ONE compiler support Ubuntu 22.04.
Why
Ubuntu 22.04 has been released, and the number of users on Ubuntu 22.04 will gradually increase. So, let's prepare to support it in advance! It may be a little early, but there is nothing wrong with preparing ahead.
Environment of ubuntu 22.04
default version
To do
Build Target Architectures
Build for x86_64
Build for arm32