Tiramisu-Compiler / tiramisu

A polyhedral compiler for expressing fast and portable data parallel algorithms
http://tiramisu-compiler.org
MIT License

[Bug] CPU convolution sample in benchmark fails at run time when setting BATCH_SIZE=1 #344

Open SubjectNoi opened 3 years ago

SubjectNoi commented 3 years ago

Env: Ubuntu 18.04
Tiramisu commit: 6512d11a79393ceebf666c34c8d4eb4bf817e7f4

I'm trying the example that generates the CPU convolution kernel in benchmark/DNN/layers/convolution/direct/cpu/conv_layer_generator_tiramisu.cpp. However, when I set the BATCH_SIZE macro to 1, it triggers the following runtime error:

./isl_list_templ.c:143: index out of bounds
main: /home/zhliu/workspace/tiramisu/src/tiramisu_core.cpp:5535: int tiramisu::compute_recursively_max_AST_depth(isl_ast_node*): Assertion `node != NULL' failed.
Aborted (core dumped)

I traced the error to line 110, shown below. I can't work out how the batch size relates to the output-channel blocking, or why changing the batch size would make the output-channel blocking fail. Maybe it's because the total loop depth decreases by 1 when the batch size is set to 1?

109:    // Vectorize and unroll
110:    reg_load.vectorize(ffout, FOUT_BLOCKING);    // This line!
111:    conv.vectorize(ffout, FOUT_BLOCKING);
112:    reg_store.vectorize(ffout, FOUT_BLOCKING);

rbaghdadi commented 3 years ago

When you set the size of a loop to 1, that loop disappears, so you need to adjust the other optimizations to take this into account. In this case you should use the low-level Tiramisu API, which identifies loops by number, instead of the API that uses loop names (the loop names are no longer valid in this special case). When you use loop numbers, you need to take into account that the batch loop has disappeared. For example, instead of vectorizing loop 4 (this is just an example loop number), you should now vectorize loop 3, because the old loop 4 is now loop 3.
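
The renumbering described above can be sketched as a small standalone helper. This is only an illustration under stated assumptions: `remap_loop_level` is a hypothetical function, not part of Tiramisu, and the loop extents in the usage note are made-up values rather than the ones from the benchmark. It assumes that every loop of extent 1 is dropped from the nest and that the remaining loops keep their relative order.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a Tiramisu API): `extents` holds the size of each
// loop in the original nest, outermost first. Returns the level that the loop
// at `old_level` occupies after all extent-1 loops have been dropped, or -1 if
// that loop itself has extent 1 and therefore disappears.
int remap_loop_level(const std::vector<int>& extents, std::size_t old_level) {
    assert(old_level < extents.size());
    if (extents[old_level] == 1) {
        return -1;  // the requested loop no longer exists
    }
    int new_level = 0;
    // Count only the enclosing loops that survive (extent != 1).
    for (std::size_t i = 0; i < old_level; ++i) {
        if (extents[i] != 1) {
            ++new_level;
        }
    }
    return new_level;
}
```

For example, with assumed extents `{1, 16, 32, 32, 8}` (batch loop of size 1 outermost), `remap_loop_level(extents, 4)` yields 3, matching the "old loop 4 is now loop 3" case; with a batch extent greater than 1 the level stays at 4. The result would then be passed to whichever loop-number-based scheduling call is used.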