Closed neoamos closed 3 years ago
Try running fusions -a expression_matcher. Be aware that this feature is still in development and will be updated in the next nntool version. There are issues with it in the currently released version.
I'm not sure exactly what you mean by that. Is it an option for the tflite converter or for nntool? I noticed that the add operator is actually only present if the model is trained, and not present if it's not trained. I previously said it was not there with a larger input, but that's because I wasn't training the model when the input was large. Maybe the converter optimizes it out if it's all zeroes. Here is the tflite file also: model.tflite.gz
If you have modified one of our sample projects you will find the nntool script in the model directory. Execute all the commands manually after having opened the graph. Add a fusions -a expression_matcher before saving the state or generating. The add, mul, add sequence should be fused into a single operation that will have a kernel compiled for it.
Oh I understand now, I'm using an example from the AI deck example repo and it has a nntool script with 'fusions --scale8'. What is the expression_matcher supposed to be? I can't find any documentation about it. I tried 'fusions -a scale8_match_group' and it outputs the same result as before.
If you open nntool and type help or help fusions you will get an explanation about that. fusions -l lists all available fusions. fusions -a expression_matcher attempts to fuse piecewise operations into a single kernel. It should be run after the existing script before quantization if you are not importing a quantized graph.
When you said to do 'fusions -a expression_matcher' I thought expression_matcher was just a placeholder, but I now see it's an actual fusion you can apply. When I did that, it fused the add and multiply into one operation. It produces this graph:
+------+---------------------------+-------------------------+------------+-------------+--------+--------+--------+---------+--------------------------+--------------------------+
| Step | Step name | Operation | Input Dims | Output Dims | Inputs | Active | Params | Ops | Params | Hints |
| | | | (hxwxc) | (hxwxc) | | size | size | | | |
+======+===========================+=========================+============+=============+========+========+========+=========+==========================+==========================+
| 0 | input_1 | input | 28x28x1 | 28x28x1 | | 784 | 0 | | I 28x28x1 FIXED_ORDER=0 | in: hxwxc out: none |
+------+---------------------------+-------------------------+------------+-------------+--------+--------+--------+---------+--------------------------+--------------------------+
| 1 | input_1_formatter | image_format | 28x28x1 | 28x28x1 | 0/0 | 1568 | 0 | | FORMAT_CHANGE Fmt: BW8 | in: none out: none |
| | | | | | | | | | Norm: OFFSET_INT8 | |
+------+---------------------------+-------------------------+------------+-------------+--------+--------+--------+---------+--------------------------+--------------------------+
| 2 | DEPTHWISE_CONV_2D_0_0_r_c | reshape | 28x28x1 | 1x28x28 | 1/0 | 1568 | 0 | | SHAPE 1x28x28 | in: none out: none |
| | hw | | | | | | | | | |
+------+---------------------------+-------------------------+------------+-------------+--------+--------+--------+---------+--------------------------+--------------------------+
| 5 | DEPTHWISE_CONV_2D_0_0_fus | conv_fusion_conv_pool | 1x28x28 | 32x6x6 | 2/0 | 2768 | 832 | 167.17K | F 32x1x5x5 S 2x2 D 1x1 G | in: hxwxc,out_cxin_cxhxw |
| | ion | | 32x1x5x5 | | 3/0 | | | | 1 M 1 P 1x2x1x2 zero, T | ,out_c out: cxhxw |
| | | | 32 | | 4/0 | | | | max F 3x3 S 2x2 P | |
| | | | | | | | | | 0x0x0x0 zero | |
+------+---------------------------+-------------------------+------------+-------------+--------+--------+--------+---------+--------------------------+--------------------------+
| 8 | CONV_2D_0_2_fusion | conv_fusion_conv_active | 32x6x6 | 32x3x3 | 5/0 | 10688 | 9248 | 82.94K | F 32x32x3x3 S 2x2 D 1x1 | in: cxhxw,out_cxin_cxhxw |
| | | | 32x32x3x3 | | 6/0 | | | | G 1 M 1 P 0x1x0x1 zero, | ,out_c out: cxhxw |
| | | | 32 | | 7/0 | | | | Activation relu | |
+------+---------------------------+-------------------------+------------+-------------+--------+--------+--------+---------+--------------------------+--------------------------+
| 11 | CONV_2D_0_3 | conv2d | 32x6x6 | 32x3x3 | 5/0 | 2784 | 1056 | 9.22K | F 32x32x1x1 S 2x2 D 1x1 | in: cxhxw,out_cxin_cxhxw |
| | | | 32x32x1x1 | | 9/0 | | | | G 1 M 1 P 0x0x0x0 zero | ,out_c out: cxhxw |
| | | | 32 | | 10/0 | | | | | |
+------+---------------------------+-------------------------+------------+-------------+--------+--------+--------+---------+--------------------------+--------------------------+
| 18 | CONV_2D_0_4 | conv2d | 32x3x3 | 32x3x3 | 8/0 | 13066 | 9248 | 82.94K | F 32x32x3x3 S 1x1 D 1x1 | in: cxhxw,out_cxin_cxhxw |
| | | | 32x32x3x3 | | 12/0 | | | | G 1 M 1 P 1x1x1x1 zero | ,out_c out: cxhxw |
| | | | 32 | | 17/0 | | | | | |
+------+---------------------------+-------------------------+------------+-------------+--------+--------+--------+---------+--------------------------+--------------------------+
| 19 | expr_0 | expression | 32x1x1 | 32x3x3 | 14/0 | 3818 | 0 | | add: 2, mul: 1 | in: none out: none |
| | | | 32x1x1 | | 13/0 | | | | | |
| | | | 32x3x3 | | 11/0 | | | | | |
| | | | 32x3x3 | | 18/0 | | | | | |
+------+---------------------------+-------------------------+------------+-------------+--------+--------+--------+---------+--------------------------+--------------------------+
| 20 | FULLY_CONNECTED_0_8 | linear | 32x3x3 | 10 | 19/0 | 3188 | 2890 | 2.88K | F 10x32x3x3 | in: |
| | | | 10x288 | | 15/0 | | | | | cx0x1,out_cxin_c,out_c |
| | | | 10 | | 16/0 | | | | | out: c |
+------+---------------------------+-------------------------+------------+-------------+--------+--------+--------+---------+--------------------------+--------------------------+
| 21 | SOFTMAX_0_9 | softmax | 10 | 10 | 20/0 | 20 | 0 | 20 | Beta 0.0 Axis 0 | in: none out: none |
+------+---------------------------+-------------------------+------------+-------------+--------+--------+--------+---------+--------------------------+--------------------------+
| 22 | output_1 | output | 10 | 10 | 21/0 | 10 | 0 | | O 10 FIXED_ORDER=0 | in: none out: none |
+------+---------------------------+-------------------------+------------+-------------+--------+--------+--------+---------+--------------------------+--------------------------+
| | Totals (#) | | | | | 13066 | 46612 | 345.17K | | |
| | Max active/Total params | | | | | | | | | |
+------+---------------------------+-------------------------+------------+-------------+--------+--------+--------+---------+--------------------------+--------------------------+
| | Totals (#) | | | | | | 59678 | 345.17K | | |
| | Max mem usage | | | | | | | | | |
+------+---------------------------+-------------------------+------------+-------------+--------+--------+--------+---------+--------------------------+--------------------------+
However, it then throws an error when compiling the autotiler model:
BUILD_MODEL_SQ8BIT/modelModel.c:4:25: warning: implicit declaration of function 'gap_ncore' [-Wimplicit-function-declaration]
static int ActiveCore = gap_ncore();
^~~~~~~~~
BUILD_MODEL_SQ8BIT/modelModel.c:4:25: error: initializer element is not constant
BUILD_MODEL_SQ8BIT/modelModel.c: In function 'ChunkSize':
BUILD_MODEL_SQ8BIT/modelModel.c:14:13: warning: implicit declaration of function 'gap_fl1' [-Wimplicit-function-declaration]
Log2Core = gap_fl1(NCore);
^~~~~~~
BUILD_MODEL_SQ8BIT/modelModel.c: At top level:
BUILD_MODEL_SQ8BIT/modelModel.c:25:17: error: unknown type name 's19_kernel_args_t'
void s19_kernel(s19_kernel_args_t *Args) {
^~~~~~~~~~~~~~~~~
model_rules.mk:78: recipe for target 'BUILD_MODEL_SQ8BIT/GenTile' failed
The autotiler model doesn't seem to have been generated correctly:
#include "modelModel.c"
static int CoreCountDynamic = 1;
static int ActiveCore = gap_ncore();
static inline unsigned int __attribute__((always_inline)) ChunkSize(unsigned int X)
{
unsigned int NCore;
unsigned int Log2Core;
unsigned int Chunk;
if (CoreCountDynamic) NCore = ActiveCore; else NCore = gap_ncore();
Log2Core = gap_fl1(NCore);
Chunk = (X>>Log2Core) + ((X&(NCore-1))!=0);
return Chunk;
}
#ifndef AT_NORM
#define AT_NORM(x, n) gap_roundnorm_reg((x), (n))
#endif
#define ATLShift(x, n) ((x) << (n))
// Output iteration space reduced to 2 iteration spaces
void s19_kernel(s19_kernel_args_t *Args) {
unsigned int H = Args->H;
unsigned int W = Args->W;
signed char * expr_0_in_3 = Args->expr_0_in_3;
signed char * expr_0_in_2 = Args->expr_0_in_2;
signed char * expr_0_in_0 = Args->expr_0_in_0;
signed char * expr_0_in_1 = Args->expr_0_in_1;
signed char * expr_0_out_0 = Args->expr_0_out_0;
unsigned int CoreId = gap_coreid();
unsigned int Chunk = ChunkSize(H);
unsigned int First = Chunk*CoreId;
unsigned int Last = gap_min(First+Chunk, H);
for (int d0=First; d0<Last; d0++) {
for (int d1_d2=0; d1_d2<W; d1_d2++) {
expr_0_out_0[d0*9+d1_d2*1] = ((signed char)gap_clip((gap_roundnorm_reg(((gap_roundnorm_reg((gap_roundnorm_reg((gap_roundnorm_reg(((gap_roundnorm_reg((((int)expr_0_in_1[d0*9+d1_d2*1])*25642), 16)+((int)expr_0_in_2[d0*9+d1_d2*1]))*31768), 7)*((int)expr_0_in_3[d0*1])), 7)*27142), 22)+((int)expr_0_in_0[d0*1]))*32079), 16)), (7)));
}
}
gap_waitbarrier(0);
}
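For anyone trying to read the generated expression above, here is a host-side Python sketch of the same arithmetic for a single element. It assumes that gap_roundnorm_reg(x, n) is a round-to-nearest right shift and that gap_clip(x, 7) saturates to the signed 8-bit range. The helper names roundnorm, clip and expr_0 are made up for illustration and are not part of the SDK. Python integers also do not wrap at 32 bits the way a C int does, so treat this as a way to follow the scaling steps, not as a bit-exact model.

```python
def roundnorm(x, n):
    """Right shift by n with rounding to nearest (assumed gap_roundnorm_reg behaviour)."""
    return (x + (1 << (n - 1))) >> n


def clip(x, bits):
    """Saturate x to [-(1 << bits), (1 << bits) - 1] (assumed gap_clip behaviour)."""
    return max(-(1 << bits), min((1 << bits) - 1, x))


def expr_0(in0, in1, in2, in3):
    """One output element of the fused add/mul/add kernel, using the constants from s19_kernel."""
    t = roundnorm(in1 * 25642, 16) + in2   # rescale in1, then first add
    t = roundnorm(t * 31768, 7) * in3      # rescale the sum, then multiply
    t = roundnorm(t, 7) * 27142            # renormalise the product
    t = roundnorm(t, 22) + in0             # bring to the scale of in0, then second add
    return clip(roundnorm(t * 32079, 16), 7)  # final rescale and int8 saturation
```

In the generated C, in0 and in3 are indexed per channel (d0) while in1 and in2 are indexed per element, which matches the 32x1x1 and 32x3x3 inputs listed for expr_0 in the graph.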
I get undefined references for every mul-add fusion kernel generated by the expression_matcher:
[...]/BUILD/GAP8_V2/GCC_RISCV_PULPOS/BUILD_MODEL_SQ8BIT/modelKernels.o: In function `hal_spr_read_then_clr':
/home/rik/gap_sdk_490/install/GAP8_V2/include/hal/dma/mchan_v6.h:272: undefined reference to `s241_kernel'
Add MODEL_EXPRESSIONS = $(MODEL_BUILD)/Expression_Kernels.c to common.mk. It's done automatically by gen_project. We need to make that more automatic.
Thank you.
I've been working with the SDK for quite a while now and have a lot of legacy code in my project. I decided to go for a clean start with gen_project and that seems to have fixed it for me. Nice feature!
I ran into what looks like the same problem after adding fusions -a expression_matcher
to the nntool script and MODEL_EXPRESSIONS = $(MODEL_BUILD)/Expression_Kernels.c
to model_decl.mk. My SDK version is 4.8.0.
Here is the error output:
/home/taozhi/tf2/BUILD/GAP8_V2/GCC_RISCV_PULPOS/BUILD_MODEL_SQ8BIT/modelKernels.o: In function `eu_bar_setup_mask':
/home/taozhi/gap/gap_sdk/install/GAP8_V2/include/pmsis/implem/dma.h:268: undefined reference to `s2_kernel'
/home/taozhi/gap/gap_sdk/install/GAP8_V2/include/pmsis/implem/dma.h:268: undefined reference to `s2_kernel'
/home/taozhi/tf2/BUILD/GAP8_V2/GCC_RISCV_PULPOS/BUILD_MODEL_SQ8BIT/modelKernels.o: In function `rt_team_fork':
/home/taozhi/gap/gap_sdk/install/GAP8_V2/include/pmsis/implem/dma.h:268: undefined reference to `s2_kernel'
collect2: error: ld returned 1 exit status
make: *** [/home/taozhi/gap/gap_sdk/utils/rules/pulp_rules.mk:227:/home/taozhi/tf2/BUILD/GAP8_V2/GCC_RISCV_PULPOS/application] error 1
If anyone gets the same issue, I solved a similar error by adding:
MODEL_EXPRESSIONS = $(MODEL_BUILD)/Expression_Kernels.c
in model_decl.mk.
and then making sure MODEL_EXPRESSIONS
is used in these two lines, which were already there:
MODEL_GEN_C = $(addsuffix .c, $(MODEL_GEN)) $(MODEL_EXPRESSIONS)
MODEL_GEN_CLEAN = $(MODEL_GEN_C) $(addsuffix .h, $(MODEL_GEN)) $(MODEL_EXPRESSIONS)
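A quick way to confirm that a makefile fragment contains all three pieces is sketched below in Python. The function check_model_decl is a hypothetical helper, not part of the SDK. It only looks for the variable names quoted in this thread and does not evaluate make syntax beyond simple assignments.

```python
import re


def check_model_decl(text):
    """Return a list of problems found in the text of a model_decl.mk-style file."""
    assigns = {}
    for line in text.splitlines():
        # Match simple "NAME = value" style assignments (=, :=, ?=, +=).
        m = re.match(r"\s*([A-Za-z_][A-Za-z0-9_]*)\s*[:?+]?=\s*(.*)", line)
        if m:
            assigns[m.group(1)] = m.group(2)

    problems = []
    if "MODEL_EXPRESSIONS" not in assigns:
        problems.append("MODEL_EXPRESSIONS is not defined")
    # Defining the variable is not enough: the generated-sources lists must use it.
    for var in ("MODEL_GEN_C", "MODEL_GEN_CLEAN"):
        if "$(MODEL_EXPRESSIONS)" not in assigns.get(var, ""):
            problems.append(var + " does not reference $(MODEL_EXPRESSIONS)")
    return problems
```

Calling check_model_decl(open("model_decl.mk").read()) returns an empty list when the definition and both references are present. The file name is taken from the comments above, so adjust it if your project keeps these variables in common.mk.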
If you have a batchnorm directly after a convolution layer, it gets folded into the convolution by the tflite converter. If it's not after a convolution layer, the converter adds a multiply and an add operator instead, but this causes the following nntool error:
The definition of the network:
And a visualization of the tflite network:
This is with gap_sdk 3.9.1. If the input to the batchnorm is larger, tflite converts it to just a multiply operation with no add, and nntool doesn't throw an error. I don't know exactly what algorithm tflite is using, but it would be nice if the add were supported so that batchnorm works in all cases.
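As background on where the multiply and add come from: at inference time a batchnorm is an affine function of its input, so it can be rewritten as one per-channel scale and one per-channel offset. The Python sketch below shows that rewrite. The function name fold_batchnorm is made up for illustration, and the default values used (gamma 1, beta 0, moving mean 0, moving variance 1, epsilon 1e-3) are the Keras BatchNormalization defaults, assumed here because the model definition is not shown. With those untrained defaults the offset is exactly zero, which would be consistent with the add only appearing for trained models, although whether the converter actually drops a zero add is a guess.

```python
import math


def fold_batchnorm(gamma, beta, mean, var, eps=1e-3):
    """Rewrite y = gamma * (x - mean) / sqrt(var + eps) + beta as y = scale * x + offset."""
    scale = gamma / math.sqrt(var + eps)
    offset = beta - mean * scale
    return scale, offset


# Untrained layer with the assumed Keras defaults: only the multiply is non-trivial.
untrained_scale, untrained_offset = fold_batchnorm(gamma=1.0, beta=0.0, mean=0.0, var=1.0)

# A layer with non-default statistics: both the multiply and the add are needed.
trained_scale, trained_offset = fold_batchnorm(gamma=1.5, beta=0.3, mean=0.4, var=2.0)
```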