Samsung / ONE

On-device Neural Engine

[onert-micro] plan for quantized kernel #13207

Open chunseoklee opened 2 months ago

chunseoklee commented 2 months ago

Let's make a plan (by Sep) for the quantized (s8/s16) kernels.

AFAIU, the master branch supports:

All of these operations are accelerated by the CMSIS_NN library (https://github.com/ARM-software/CMSIS-NN/tree/v4.1.0?tab=readme-ov-file).
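
For readers unfamiliar with what an S8 kernel computes, here is a minimal reference sketch of an int8 fully connected layer with per-tensor affine quantization. It is an illustration only, not the onert-micro or CMSIS_NN implementation: `QuantParams`, `fullyConnectedS8`, the row-major weight layout, and the float rescale are all assumptions made for this example. Optimized kernels typically replace the float rescale with a fixed-point multiplier and shift.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical per-tensor affine quantization: real = scale * (q - zero_point).
struct QuantParams
{
  float scale;
  int32_t zero_point;
};

// Reference S8 fully connected layer (illustrative names, not a library API).
// weights is assumed row-major with shape [out_features][in_features].
// bias is assumed to be int32 with scale = in_q.scale * w_q.scale and zero point 0.
std::vector<int8_t> fullyConnectedS8(const std::vector<int8_t> &input, const QuantParams &in_q,
                                     const std::vector<int8_t> &weights, const QuantParams &w_q,
                                     const std::vector<int32_t> &bias, const QuantParams &out_q,
                                     int out_features)
{
  const int in_features = static_cast<int>(input.size());
  // Converts the int32 accumulator scale into the output scale.
  const float rescale = in_q.scale * w_q.scale / out_q.scale;

  std::vector<int8_t> output(out_features);
  for (int o = 0; o < out_features; ++o)
  {
    // Accumulate in int32 so that products of int8 values cannot overflow.
    int32_t acc = bias[o];
    for (int i = 0; i < in_features; ++i)
    {
      acc += (static_cast<int32_t>(input[i]) - in_q.zero_point) *
             (static_cast<int32_t>(weights[o * in_features + i]) - w_q.zero_point);
    }
    // Requantize, add the output zero point, then saturate to the int8 range.
    const int32_t q = static_cast<int32_t>(std::lround(acc * rescale)) + out_q.zero_point;
    output[o] = static_cast<int8_t>(std::min<int32_t>(127, std::max<int32_t>(-128, q)));
  }
  return output;
}
```

An S16 variant would follow the same pattern with `int16_t` data and a wider accumulator. A float-free version of the rescale step is where a library such as CMSIS_NN would normally come in.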

chunseoklee commented 2 months ago

IMHO, let's first support all the operations that are supported by CMSIS_NN. That is, based on CMSIS_NN 4.1 (https://github.com/ARM-software/CMSIS-NN/tree/v4.1.0?tab=readme-ov-file), the final goal by Sep is to accelerate 10 S8 kernels. After that, enable S8 kernels for several operations (TBD, ~10 operations) that are not supported by CMSIS_NN.

@BalyshevArtem Please share any opinion about this

BalyshevArtem commented 2 months ago

> @BalyshevArtem Please share any opinion about this

Yes, sure. We are currently working on this task. Thank you for detailing it :)

chunseoklee commented 2 months ago

Then, our final goal by Sep is:

chunseoklee commented 2 months ago

gtest log on x86 for the quantized kernels: quantized_test_xml_log.zip

Note: Google Test filter = *S8*:*S16*
[==========] Running 5 tests from 4 test suites.
[----------] Global test environment set-up.
[----------] 1 test from AveragePool2DTest
[ RUN      ] AveragePool2DTest.S8_P
[       OK ] AveragePool2DTest.S8_P (0 ms)
[----------] 1 test from AveragePool2DTest (0 ms total)

[----------] 2 tests from FullyConnectedTest
[ RUN      ] FullyConnectedTest.S8_P
[       OK ] FullyConnectedTest.S8_P (0 ms)
[ RUN      ] FullyConnectedTest.S16_P
[       OK ] FullyConnectedTest.S16_P (0 ms)
[----------] 2 tests from FullyConnectedTest (0 ms total)

[----------] 1 test from Conv2DTest
[ RUN      ] Conv2DTest.S8_P
[       OK ] Conv2DTest.S8_P (0 ms)
[----------] 1 test from Conv2DTest (0 ms total)

[----------] 1 test from MaxPool2DTest
[ RUN      ] MaxPool2DTest.S8_P
[       OK ] MaxPool2DTest.S8_P (0 ms)
[----------] 1 test from MaxPool2DTest (0 ms total)

[----------] Global test environment tear-down
[==========] 5 tests from 4 test suites ran. (0 ms total)
[  PASSED  ] 5 tests.
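
For reference, the filter `*S8*:*S16*` in the log above selects every test whose full name (`Suite.Test`) matches at least one of the `:`-separated glob patterns. The sketch below reimplements that positive-pattern matching to show why exactly those names are picked up. It is not Google Test's own code: `globMatch` and `matchesFilter` are hypothetical helpers, and negative patterns (the part after `-`) are deliberately left out.

```cpp
#include <cstddef>
#include <string>

// Glob match supporting '*' (any sequence, including empty) and '?' (any single character).
bool globMatch(const char *pattern, const char *name)
{
  if (*pattern == '\0')
    return *name == '\0';
  if (*pattern == '*')
  {
    // Either '*' matches nothing, or it consumes one character of name and we retry.
    return globMatch(pattern + 1, name) || (*name != '\0' && globMatch(pattern, name + 1));
  }
  if (*name == '\0')
    return false;
  return (*pattern == '?' || *pattern == *name) && globMatch(pattern + 1, name + 1);
}

// Returns true if full_name matches any ':'-separated positive pattern in filter.
bool matchesFilter(const std::string &filter, const std::string &full_name)
{
  std::size_t start = 0;
  while (start <= filter.size())
  {
    std::size_t end = filter.find(':', start);
    if (end == std::string::npos)
      end = filter.size();
    const std::string pattern = filter.substr(start, end - start);
    if (globMatch(pattern.c_str(), full_name.c_str()))
      return true;
    start = end + 1;
  }
  return false;
}
```

With this sketch, `matchesFilter("*S8*:*S16*", "FullyConnectedTest.S16_P")` is true, while a name containing neither `S8` nor `S16` (for example a hypothetical `FullyConnectedTest.Float_P`) is not selected.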

BalyshevArtem commented 2 months ago

Log from our target board for the quantized kernel tests:

START TESTING
-----------------
[ START TEST: Conv2DTest.INT8 ]
[ TEST TIME = (20.000000) us ]
[ TEST Conv2DTest.INT8 RESULT: OK ]
-----------------
[ START TEST: FullyConnectedTest.S8 ]
[ TEST TIME = (10.000000) us ]
[ TEST FullyConnectedTest.S8 RESULT: OK ]
-----------------
[ START TEST: FullyConnectedTest.S16 ]
[ TEST TIME = (20.000000) us ]
[ TEST FullyConnectedTest.S16 RESULT: OK ]
-----------------
[ START TEST: AveragePool2DTest.S8 ]
[ TEST TIME = (10.000000) us ]
[ TEST AveragePool2DTest.S8 RESULT: OK ]
-----------------
[ START TEST: MaxPool2DTest.S8 ]
[ TEST TIME = (10.000000) us ]
[ TEST MaxPool2DTest.S8 RESULT: OK ]
-----------------
END TESTING