dmsgnn / master-thesis

MLIR-based FPGA toolchain for Graph Neural Network acceleration using High-Level Synthesis. Developed for the Master of Science research thesis.

Create `.ll` file, starting from `pygcn.mlir` to run Bambu #3

Closed dmsgnn closed 1 year ago

dmsgnn commented 1 year ago

description

The aim of this issue is to create the .ll file, starting from the pygcn.mlir file produced via torch-mlir by the train.py script.

dmsgnn commented 1 year ago

problems encountered

  1. soda-opt ml_program dialect not supported

    output/01searched-edited.mlir:6:3: error: Dialect `ml_program' not found for custom op 'ml_program.global' 
      ml_program.global private mutable @global_seed(dense<0> : tensor<i64>) : tensor<i64>
      ^
    output/01searched-edited.mlir:6:3: note: Registered dialects: affine, arith, builtin, cf, func, linalg, llvm, math, memref, pdl, scf, snn, soda, transform, vector ; for more info on dialect registration see https://mlir.llvm.org/getting_started/Faq/#registered-loaded-dependent-whats-up-with-dialects-management
  2. soda-opt tensor dialect not supported

    output/01searched-edited.mlir:15:10: error: Dialect `tensor' not found for custom op 'tensor.empty' 
    %0 = tensor.empty() : tensor<2708x16xf32> 
             ^ 
    output/01searched-edited.mlir:15:10: note: Registered dialects: affine, arith, builtin, cf, func, linalg, llvm, math, memref, pdl, scf, snn, soda, transform, vector ; for more info on dialect registration see https://mlir.llvm.org/getting_started/Faq/#registered-loaded-dependent-whats-up-with-dialects-management
  3. terminator of soda.launch not found

    output/01searched-edited.mlir:11:5: error: block with no terminator, has
    "soda.launch"()
  4. .mlir file not saved correctly, presence of dense_resource<__elided__>

    ElementsAttr does not provide iteration facilities for type `mlir::Attribute`, see attribute: dense_resource<__elided__> : tensor<16x7xf32>
    invalid `T` for ElementsAttr::getValues
    UNREACHABLE executed at /working_dir/llvm-project/mlir/include/mlir/IR/BuiltinAttributeInterfaces.h:307!
    PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
    Stack dump:
    0.    Program arguments: mlir-translate -opaque-pointers=0 --mlir-to-llvmir output/04baseline.mlir -o output/05baseline.ll
    #0 0x00000000006a7647 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/opt/llvm-project/bin/mlir-translate+0x6a7647)
    #1 0x00000000006a558e llvm::sys::RunSignalHandlers() (/opt/llvm-project/bin/mlir-translate+0x6a558e)
    #2 0x00000000006a7f7f SignalHandler(int) Signals.cpp:0:0
    #3 0x00007fb92dc49420 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x14420)
    #4 0x00007fb92d6dc00b raise (/lib/x86_64-linux-gnu/libc.so.6+0x4300b)
    #5 0x00007fb92d6bb859 abort (/lib/x86_64-linux-gnu/libc.so.6+0x22859)
    #6 0x000000000065ce51 (/opt/llvm-project/bin/mlir-translate+0x65ce51)
    #7 0x000000000079aa3c (/opt/llvm-project/bin/mlir-translate+0x79aa3c)
    #8 0x000000000079a1f1 std::enable_if<std::is_same<mlir::Attribute, mlir::Attribute>::value || !std::is_base_of<mlir::Attribute, mlir::Attribute>::value, mlir::detail::ElementsAttrRange<mlir::detail::ElementsAttrIterator<mlir::Attribute>>>::type mlir::ElementsAttr::getValues<mlir::Attribute>() const TensorOps.cpp:0:0
    #9 0x0000000000a24680 mlir::LLVM::detail::getLLVMConstant(llvm::Type*, mlir::Attribute, mlir::Location, mlir::LLVM::ModuleTranslation const&) (/opt/llvm-project/bin/mlir-translate+0xa24680)
    #10 0x0000000000a27a50 mlir::LLVM::ModuleTranslation::convertGlobals() (/opt/llvm-project/bin/mlir-translate+0xa27a50)
    #11 0x0000000000a2b973 mlir::translateModuleToLLVMIR(mlir::Operation*, llvm::LLVMContext&, llvm::StringRef) (/opt/llvm-project/bin/mlir-translate+0xa2b973)
    #12 0x0000000000a1b3c6 std::_Function_handler<mlir::LogicalResult (mlir::Operation*, llvm::raw_ostream&), mlir::registerToLLVMIRTranslation()::$_0>::_M_invoke(std::_Any_data const&, mlir::Operation*&&, llvm::raw_ostream&) ConvertToLLVMIR.cpp:0:0
    #13 0x0000000000d5b5b8 std::_Function_handler<mlir::LogicalResult (std::shared_ptr<llvm::SourceMgr> const&, llvm::raw_ostream&, mlir::MLIRContext*), mlir::TranslateFromMLIRRegistration::TranslateFromMLIRRegistration(llvm::StringRef, llvm::StringRef, std::function<mlir::LogicalResult (mlir::Operation*, llvm::raw_ostream&)> const&, std::function<void (mlir::DialectRegistry&)> const&)::$_2>::_M_invoke(std::_Any_data const&, std::shared_ptr<llvm::SourceMgr> const&, llvm::raw_ostream&, mlir::MLIRContext*&&) Translation.cpp:0:0
    #14 0x0000000000d5a1a9 mlir::LogicalResult llvm::function_ref<mlir::LogicalResult (std::unique_ptr<llvm::MemoryBuffer, std::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&)>::callback_fn<mlir::mlirTranslateMain(int, char**, llvm::StringRef)::$_0>(long, std::unique_ptr<llvm::MemoryBuffer, std::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&) MlirTranslateMain.cpp:0:0
    #15 0x0000000000d61de8 mlir::splitAndProcessBuffer(std::unique_ptr<llvm::MemoryBuffer, std::default_delete<llvm::MemoryBuffer>>, llvm::function_ref<mlir::LogicalResult (std::unique_ptr<llvm::MemoryBuffer, std::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&)>, llvm::raw_ostream&, bool, bool) (/opt/llvm-project/bin/mlir-translate+0xd61de8)
    #16 0x0000000000d58c73 mlir::mlirTranslateMain(int, char**, llvm::StringRef) (/opt/llvm-project/bin/mlir-translate+0xd58c73)
    #17 0x000000000065b815 main (/opt/llvm-project/bin/mlir-translate+0x65b815)
    #18 0x00007fb92d6bd083 __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x24083)
    #19 0x000000000065b71e _start (/opt/llvm-project/bin/mlir-translate+0x65b71e)
dmsgnn commented 1 year ago

Problem 4 happened because the .mlir file was not exported correctly from torch_mlir. Fixed in commit 739d17b.

Specifically, the saving of the model has been changed from this

https://github.com/dmsgnn/master-thesis/blob/5a5e2a71c0892b5e4a7418f0e3289766b51ebfcf/pygcn/train.py#L112-L116

to this

https://github.com/dmsgnn/master-thesis/blob/739d17b1f588df1c54e05e9fe96d8f094cf7d008/pygcn/train.py#L112-L115
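
Since a file containing `dense_resource<__elided__>` placeholders only crashes mlir-translate much later in the pipeline, it can be worth failing fast right after the export. The following is a small, hypothetical sanity check (not part of the repository) that scans the exported file for elided constants before it is fed to the downstream tools:

```python
def has_elided_constants(mlir_text: str) -> bool:
    """Return True if the MLIR text still contains elided constants.

    When MLIR is printed with large elements elided, weight tensors appear
    as `dense_resource<__elided__>`; such a file cannot be translated to
    LLVM IR, because mlir-translate needs the actual values.
    """
    return "dense_resource<__elided__>" in mlir_text

# Example: the kind of line that made mlir-translate crash in this issue.
bad = "%cst = dense_resource<__elided__> : tensor<16x7xf32>"
good = "%cst = arith.constant dense<0.0> : tensor<16x7xf32>"
print(has_elided_constants(bad))   # True
print(has_elided_constants(good))  # False
```

Running this on output/pygcn.mlir right after train.py would have pointed at the export bug immediately, instead of at the mlir-translate stack trace.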

dmsgnn commented 1 year ago

how to do it

After running the pygcn model using the train.py file, and having correctly saved the pygcn.mlir file, the following steps are required:

  1. Create a folder called output, then cd into its parent folder

  2. Use the following command to remove the tensor.empty() operations (this solves problem 2)

    docker run -u $(id -u) -v $(pwd):/working_dir --rm agostini01/soda \
                                            mlir-opt \
                                            --canonicalize \
                                            -convert-tensor-to-linalg \
                                            --empty-tensor-to-alloc-tensor \
                                            --eliminate-empty-tensors \
                                            -linalg-bufferize -arith-bufferize \
                                            -tensor-bufferize -func-bufferize \
                                          -finalizing-bufferize -buffer-deallocation \
                                          --buffer-results-to-out-params \
                                          --canonicalize -cse output/pygcn.mlir \
                                      2>&1 | cat > output/01searched-edited.mlir
  3. Modify the newly created file "01searched-edited.mlir" by inserting soda.launch and soda.terminator around the region to outline (see the later comments for examples of the placement)

  4. Then, this modified version is ready to be used with soda-opt. Run the following command
    docker run -u $(id -u) -v $(pwd):/working_dir --rm agostini01/soda \
                         soda-opt \
                           -soda-outline-bambu-code \
                           -soda-extract-arguments-to-xml=using-bare-ptr \
                           -soda-generate-bambu-accelcode=no-aa \
                           -lower-all-to-llvm=use-bare-ptr-memref-call-conv \
                           -mlir-print-ir-after-all \
                           output/01searched-edited.mlir \
                           -o output/04baseline.mlir \
                           2>&1 | cat > output/05intermediate-baseline.mlir
  5. Run the following command to obtain the .ll file
    docker run -u $(id -u) -v $(pwd):/working_dir --rm agostini01/soda \
                              mlir-translate -opaque-pointers=0 \
                                 --mlir-to-llvmir \
                                 output/04baseline.mlir \
                                 -o output/05baseline.ll
  6. The .ll file can be found in the output directory
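
The repeated `docker run … agostini01/soda` prefix in the steps above can be factored into a small helper. This is a hypothetical convenience wrapper (not part of the repository); `DRY_RUN=1` prints the command instead of executing it, which is handy for checking the mount paths before launching the container:

```shell
# Hypothetical wrapper around the docker invocation used in the steps above.
soda_docker() {
  # Run the given tool inside the agostini01/soda image, mounting the
  # current directory (which must contain output/) as /working_dir.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo docker run -u "$(id -u)" -v "$(pwd):/working_dir" --rm agostini01/soda "$@"
  else
    docker run -u "$(id -u)" -v "$(pwd):/working_dir" --rm agostini01/soda "$@"
  fi
}

# Example: step 5 becomes a one-liner (DRY_RUN=1 only prints the command here).
DRY_RUN=1 soda_docker mlir-translate -opaque-pointers=0 --mlir-to-llvmir \
  output/04baseline.mlir -o output/05baseline.ll
```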
dmsgnn commented 1 year ago

how to run Bambu

once the .ll file has been created, we can run Bambu. To do so, cd into the soda directory (the parent folder of output) and execute the run-bambu.sh script using sh run-bambu.sh baseline.

dmsgnn commented 1 year ago

Bambu error 1

however, once the script run-bambu.sh is executed, the following error appears:

/working_dir/input.ll:953:57: error: unterminated attribute group
attributes #0 = { nocallback nofree nounwind willreturn memory (argmem: readwrite) }

1 error generated.
Error in compilation
/working_dir/input.ll:953:57: error: unterminated attribute group
attributes #0 = { nocallback nofree nounwind willreturn memory (argmem: readwrite) }

1 error generated.
error - Front-end compiler returns an error during compilation 2

this error is probably due to the front-end compiler used by the script, which is not up to date

dmsgnn commented 1 year ago

Bambu error 2

once the correct compiler is used, Bambu error 1 gets solved. The new error is the following

Reading of vector values from input file completed. Simulation started.
Simulation not completed into 200000000 cycles
Start reading vector 1's values from input file.
dmsgnn commented 1 year ago

outlining of add part

in this test, the file 01searched-edited.mlir has been modified in order to try to synthesize only some add operations of the GNN instead of the whole forward function. The soda.launch and soda.terminator have been put before and after the following code block

https://github.com/dmsgnn/master-thesis/blob/052c9a0092695612874724f1ffe061807924f0f7/pygcn/soda/output/01searched-edited.mlir#L30-L34

This experiment has been successfully completed. Notably, Bambu error 1 did not appear: I tried running this test from the Docker container and it worked, even if with some warnings.

The final recap of the execution log is the following

Simulation completed with success

- /working_dir/HLS_output//simulation/testbench_forward_kernel_tb.v:685: Verilog $finish
File "/working_dir/results.txt" opened
1. Simulation completed with SUCCESS; Execution time 303298 cycles;
  Total cycles             : 303298 cycles
  Number of executions     : 1
  Average execution        : 303298 cycles

bambu-log.txt

dmsgnn commented 1 year ago

outlining of convolutional layer 1

After the previous test, I tried to outline the first convolutional layer, which should be represented by the following code block

https://github.com/dmsgnn/master-thesis/blob/31ab3f7a3719ca1841a9f232a6c46ae2147958e9/pygcn/soda/output/01searched-edited.mlir#L23-L42

When trying to run soda-opt on this code block, the following error appeared

output/01searched-edited.mlir:45:20: error: use of undeclared SSA value name
    memref.dealloc %alloc_3 : memref<2708x16xf32>
                   ^
output/01searched-edited.mlir:50:23: error: use of undeclared SSA value name
    linalg.matmul ins(%alloc_4, %1 : memref<2708x16xf32>, memref<16x7xf32>) outs(%alloc_6 : memref<2708x7xf32>)
                      ^

I tried to solve this problem by running the following command after mlir-opt and before setting soda.launch and soda.terminator in the how to do it; its forwarding passes should move the alloc and dealloc operations outside the outlined scope

docker run -u $(id -u) -v $(pwd):/working_dir --rm agostini01/soda \
                                     soda-opt \
                                       -forward-memref-allocations \
                                       -forward-linalg-fill \
                                       -forward-memref-copy \
                                       -forward-memref-allocations \
                                       output/01searched-edited.mlir \
                                       2>&1 | cat > output/01.mlir

This made things a little bit better, but some dealloc operations were still inside the block, and trying to run it led to the following Bambu error

Total number of flip-flops in function forward_kernel: 6128
.:: Creating Generic Bash Backend Flow ::.
  Parameter P0 (494937) (testvector 0) allocated at 1073741824 : reserved_mem_size = 15522256
  Parameter P1 (494938) (testvector 0) allocated at 1089264096 : reserved_mem_size = 91712
  Parameter P2 (494939) (testvector 0) allocated at 1089355808 : reserved_mem_size = 173312
  Parameter P3 (494940) (testvector 0) allocated at 1089529120 : reserved_mem_size = 173312
  Parameter P4 (494941) (testvector 0) allocated at 1089702432 : reserved_mem_size = 29333056
  Parameter P5 (494942) (testvector 0) allocated at 1119035488 : reserved_mem_size = 173312
  Parameter P6 (494943) (testvector 0) allocated at 1119208800 : reserved_mem_size = 64
  Parameter P7 (494944) (testvector 0) allocated at 1119208864 : reserved_mem_size = 173312
  Parameter P8 (494945) (testvector 0) allocated at 1119382176 : reserved_mem_size = 173312
  C-based testbench generation for function forward_kernel: /working_dir/HLS_output//simulation/values.c
  Prepared testbench
clang: warning: optimization flag '-ffloat-store' is not supported [-Wignored-optimization-argument]
clang: warning: optimization flag '-ffloat-store' is not supported [-Wignored-optimization-argument]
warning: overriding the module target triple with i386-pc-linux-gnu [-Woverride-module]
1 warning generated.
free(): invalid pointer
clang: warning: optimization flag '-ffloat-store' is not supported [-Wignored-optimization-argument]
clang: warning: optimization flag '-ffloat-store' is not supported [-Wignored-optimization-argument]
warning: overriding the module target triple with i386-pc-linux-gnu [-Woverride-module]
1 warning generated.
Please report bugs to <panda-info@polimi.it>

error -> Error in generating the expected test results

The temporary solution has been to delete the dealloc lines in order to run Bambu. This actually worked, but Bambu has not been able to synthesize the design due to the large number of cycles required, showing the already encountered error

Reading of vector values from input file completed. Simulation started.
Simulation not completed into 200000000 cycles
File "/working_dir/results.txt" opened
error -> Expected a number of cycles different from zero. Something wrong happened during the simulation!
dmsgnn commented 1 year ago

number of cycles requested by convolutional layer 1

A rough calculation of how many cycles should be needed to synthesize the first convolutional layer has been made.

The two matrix multiplications done in the forward of the layer

https://github.com/dmsgnn/master-thesis/blob/31ab3f7a3719ca1841a9f232a6c46ae2147958e9/pygcn/layers.py#L32-L33

are between matrices of the following sizes

  1. $(2708 \times 1433) \times (1433 \times 16)$
  2. $(2708 \times 2708) \times (2708 \times 16)$

Assumption: 5 cycles needed for an add, 2 cycles needed for a mul

mm1:

- mul: $(2708 \times 1433 \times 16) \times 2 = 124{,}178{,}048$ cycles
- add: $(2708 \times 1432 \times 16) \times 5 = 310{,}228{,}480$ cycles
- total for mm1: $434{,}406{,}528$ cycles

mm2:

- mul: $(2708 \times 2708 \times 16) \times 2 = 234{,}664{,}448$ cycles
- add: $(2708 \times 2707 \times 16) \times 5 = 586{,}444{,}480$ cycles
- total for mm2: $821{,}108{,}928$ cycles

mm1 + mm2: a total of $1{,}255{,}515{,}456$ cycles (about 1.2 billion cycles)
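
The estimate above can be reproduced with a short script. The 5-cycle add and 2-cycle mul costs are the stated assumptions of this comment, not measured Bambu latencies:

```python
def matmul_cycles(n, k, m, mul_cycles=2, add_cycles=5):
    """Estimate cycles for an (n x k) @ (k x m) matrix multiplication.

    Each of the n*m output elements needs k multiplications and k-1
    additions; per-operation costs follow the assumption above.
    """
    muls = n * k * m * mul_cycles
    adds = n * (k - 1) * m * add_cycles
    return muls + adds

mm1 = matmul_cycles(2708, 1433, 16)   # features @ weights
mm2 = matmul_cycles(2708, 2708, 16)   # adjacency @ support
print(mm1)        # 434406528
print(mm2)        # 821108928
print(mm1 + mm2)  # 1255515456
```

Both totals are well above the 200000000-cycle simulation limit reported in Bambu error 2, which is consistent with the simulation timing out.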

dmsgnn commented 1 year ago

remove mlir deallocation with soda-opt

In order to remove the MLIR deallocations, it is necessary to run the following command after mlir-opt and before setting soda.launch and soda.terminator in the how to do it

docker run -u $(id -u) -v $(pwd):/working_dir --rm agostini01/soda \
                                     soda-opt \
                                       --erase-buffer-deallocation \
                                       output/01searched-edited.mlir \
                                       2>&1 | cat > output/01.mlir