Closed dmsgnn closed 1 year ago
Problem 1: `soda-opt` does not support the `ml_program` dialect
output/01searched-edited.mlir:6:3: error: Dialect `ml_program' not found for custom op 'ml_program.global'
ml_program.global private mutable @global_seed(dense<0> : tensor<i64>) : tensor<i64>
^
output/01searched-edited.mlir:6:3: note: Registered dialects: affine, arith, builtin, cf, func, linalg, llvm, math, memref, pdl, scf, snn, soda, transform, vector ; for more info on dialect registration see https://mlir.llvm.org/getting_started/Faq/#registered-loaded-dependent-whats-up-with-dialects-management
Problem 2: `soda-opt` does not support the `tensor` dialect
output/01searched-edited.mlir:15:10: error: Dialect `tensor' not found for custom op 'tensor.empty'
%0 = tensor.empty() : tensor<2708x16xf32>
^
output/01searched-edited.mlir:15:10: note: Registered dialects: affine, arith, builtin, cf, func, linalg, llvm, math, memref, pdl, scf, snn, soda, transform, vector ; for more info on dialect registration see https://mlir.llvm.org/getting_started/Faq/#registered-loaded-dependent-whats-up-with-dialects-management
Problem 3: terminator of `soda.launch` not found
output/01searched-edited.mlir:11:5: error: block with no terminator, has
"soda.launch"()
Problem 4: `.mlir` file not saved correctly, presence of `dense_resource<__elided__>`
ElementsAttr does not provide iteration facilities for type `mlir::Attribute`, see attribute: dense_resource<__elided__> : tensor<16x7xf32>
invalid `T` for ElementsAttr::getValues
UNREACHABLE executed at /working_dir/llvm-project/mlir/include/mlir/IR/BuiltinAttributeInterfaces.h:307!
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0. Program arguments: mlir-translate -opaque-pointers=0 --mlir-to-llvmir output/04baseline.mlir -o output/05baseline.ll
#0 0x00000000006a7647 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/opt/llvm-project/bin/mlir-translate+0x6a7647)
#1 0x00000000006a558e llvm::sys::RunSignalHandlers() (/opt/llvm-project/bin/mlir-translate+0x6a558e)
#2 0x00000000006a7f7f SignalHandler(int) Signals.cpp:0:0
#3 0x00007fb92dc49420 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x14420)
#4 0x00007fb92d6dc00b raise (/lib/x86_64-linux-gnu/libc.so.6+0x4300b)
#5 0x00007fb92d6bb859 abort (/lib/x86_64-linux-gnu/libc.so.6+0x22859)
#6 0x000000000065ce51 (/opt/llvm-project/bin/mlir-translate+0x65ce51)
#7 0x000000000079aa3c (/opt/llvm-project/bin/mlir-translate+0x79aa3c)
#8 0x000000000079a1f1 std::enable_if<std::is_same<mlir::Attribute, mlir::Attribute>::value || !std::is_base_of<mlir::Attribute, mlir::Attribute>::value, mlir::detail::ElementsAttrRange<mlir::detail::ElementsAttrIterator<mlir::Attribute>>>::type mlir::ElementsAttr::getValues<mlir::Attribute>() const TensorOps.cpp:0:0
#9 0x0000000000a24680 mlir::LLVM::detail::getLLVMConstant(llvm::Type*, mlir::Attribute, mlir::Location, mlir::LLVM::ModuleTranslation const&) (/opt/llvm-project/bin/mlir-translate+0xa24680)
#10 0x0000000000a27a50 mlir::LLVM::ModuleTranslation::convertGlobals() (/opt/llvm-project/bin/mlir-translate+0xa27a50)
#11 0x0000000000a2b973 mlir::translateModuleToLLVMIR(mlir::Operation*, llvm::LLVMContext&, llvm::StringRef) (/opt/llvm-project/bin/mlir-translate+0xa2b973)
#12 0x0000000000a1b3c6 std::_Function_handler<mlir::LogicalResult (mlir::Operation*, llvm::raw_ostream&), mlir::registerToLLVMIRTranslation()::$_0>::_M_invoke(std::_Any_data const&, mlir::Operation*&&, llvm::raw_ostream&) ConvertToLLVMIR.cpp:0:0
#13 0x0000000000d5b5b8 std::_Function_handler<mlir::LogicalResult (std::shared_ptr<llvm::SourceMgr> const&, llvm::raw_ostream&, mlir::MLIRContext*), mlir::TranslateFromMLIRRegistration::TranslateFromMLIRRegistration(llvm::StringRef, llvm::StringRef, std::function<mlir::LogicalResult (mlir::Operation*, llvm::raw_ostream&)> const&, std::function<void (mlir::DialectRegistry&)> const&)::$_2>::_M_invoke(std::_Any_data const&, std::shared_ptr<llvm::SourceMgr> const&, llvm::raw_ostream&, mlir::MLIRContext*&&) Translation.cpp:0:0
#14 0x0000000000d5a1a9 mlir::LogicalResult llvm::function_ref<mlir::LogicalResult (std::unique_ptr<llvm::MemoryBuffer, std::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&)>::callback_fn<mlir::mlirTranslateMain(int, char**, llvm::StringRef)::$_0>(long, std::unique_ptr<llvm::MemoryBuffer, std::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&) MlirTranslateMain.cpp:0:0
#15 0x0000000000d61de8 mlir::splitAndProcessBuffer(std::unique_ptr<llvm::MemoryBuffer, std::default_delete<llvm::MemoryBuffer>>, llvm::function_ref<mlir::LogicalResult (std::unique_ptr<llvm::MemoryBuffer, std::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&)>, llvm::raw_ostream&, bool, bool) (/opt/llvm-project/bin/mlir-translate+0xd61de8)
#16 0x0000000000d58c73 mlir::mlirTranslateMain(int, char**, llvm::StringRef) (/opt/llvm-project/bin/mlir-translate+0xd58c73)
#17 0x000000000065b815 main (/opt/llvm-project/bin/mlir-translate+0x65b815)
#18 0x00007fb92d6bd083 __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x24083)
#19 0x000000000065b71e _start (/opt/llvm-project/bin/mlir-translate+0x65b71e)
Problem 4 happened because the `.mlir` file was not exported correctly from torch_mlir. Fixed in commit 739d17b.
Actually, the saving of the model has been changed from this:
to this:
After having run the `pygcn` model using the `train.py` file, and having correctly saved the `pygcn.mlir` file, the following steps are required:
1. Create a folder called `output`.
2. `cd` into its parent folder.
3. Use the following command to remove the `tensor.empty()` operations (this solves problem 2.):
docker run -u $(id -u) -v $(pwd):/working_dir --rm agostini01/soda \
mlir-opt \
--canonicalize \
-convert-tensor-to-linalg \
--empty-tensor-to-alloc-tensor \
--eliminate-empty-tensors \
-linalg-bufferize -arith-bufferize \
-tensor-bufferize -func-bufferize \
-finalizing-bufferize -buffer-deallocation \
--buffer-results-to-out-params \
--canonicalize -cse output/pygcn.mlir \
2>&1 | cat > output/01searched-edited.mlir
4. Modify the just-created file `01searched-edited.mlir` in the following way: change
ml_program.global private mutable @global_seed(dense<0> : tensor<i64>) : tensor<i64>
to
memref.global "private" @global_seed : memref<i64> = dense<0>
(this solves problem 1.), and wrap the code to be outlined between
soda.launch {
and
soda.terminator }
(this solves problem 3.)
5. Run `soda-opt` to outline the code for Bambu:
docker run -u $(id -u) -v $(pwd):/working_dir --rm agostini01/soda \
soda-opt \
-soda-outline-bambu-code \
-soda-extract-arguments-to-xml=using-bare-ptr \
-soda-generate-bambu-accelcode=no-aa \
-lower-all-to-llvm=use-bare-ptr-memref-call-conv \
-mlir-print-ir-after-all \
output/01searched-edited.mlir \
-o output/04baseline.mlir \
2>&1 | cat > output/05intermediate-baseline.mlir
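For reference, the manual `soda.launch` wrapping described above looks roughly like the sketch below. The function signature and the `linalg.matmul` body are only hypothetical placeholders (reusing the memref shapes that appear elsewhere in this issue), not the actual pygcn kernel:

```mlir
func.func @forward(%A: memref<2708x16xf32>, %B: memref<16x7xf32>, %C: memref<2708x7xf32>) {
  soda.launch {
    // the region selected for outlining goes here
    linalg.matmul ins(%A, %B : memref<2708x16xf32>, memref<16x7xf32>)
                  outs(%C : memref<2708x7xf32>)
    soda.terminator
  }
  return
}
```

Everything inside the `soda.launch` region (terminated by `soda.terminator`) is what `-soda-outline-bambu-code` extracts into the accelerator kernel.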
6. Create the `.ll` file with the following command:
docker run -u $(id -u) -v $(pwd):/working_dir --rm agostini01/soda \
mlir-translate -opaque-pointers=0 \
--mlir-to-llvmir \
output/04baseline.mlir \
-o output/05baseline.ll
The `.ll` file can be found in the `output` directory. Once the `.ll` file has been created, we can run Bambu. To do so, `cd` into the soda directory (the parent folder of `output`) and execute the `run-bambu.sh` script using `sh run-bambu.sh baseline`.
Actually, once the `run-bambu.sh` script is executed, the following error appears:
/working_dir/input.ll:953:57: error: unterminated attribute group
attributes #0 = { nocallback nofree nounwind willreturn memory (argmem: readwrite) }
1 error generated.
Error in compilation
/working_dir/input.ll:953:57: error: unterminated attribute group
attributes #0 = { nocallback nofree nounwind willreturn memory (argmem: readwrite) }
1 error generated.
error - Front-end compiler returns an error during compilation 2
This error is probably due to the compiler used by the script, which is not up to date.
Once the correct compiler is used, Bambu error 1 is solved. The new error is the following:
Reading of vector values from input file completed. Simulation started.
Simulation not completed into 200000000 cycles
Start reading vector 1's values from input file.
In this test, the file `01searched-edited.mlir` has been modified in order to try to synthesize only some add operations of the GNN instead of the whole forward function. The `soda.launch` and `soda.terminator` have been put before and after the following code block.
This experiment has been completed successfully. Notably, not even Bambu error 1 appeared; I tried running this test from the Docker container and it worked, albeit with some warnings.
The final recap of the execution log is the following:
Simulation completed with success
- /working_dir/HLS_output//simulation/testbench_forward_kernel_tb.v:685: Verilog $finish
File "/working_dir/results.txt" opened
1. Simulation completed with SUCCESS; Execution time 303298 cycles;
Total cycles : 303298 cycles
Number of executions : 1
Average execution : 303298 cycles
After the previous test, I tried to outline the first convolutional layer, which should be represented by the following code block
When trying to run `soda-opt` on this code block, the following error appeared:
output/01searched-edited.mlir:45:20: error: use of undeclared SSA value name
memref.dealloc %alloc_3 : memref<2708x16xf32>
^
output/01searched-edited.mlir:50:23: error: use of undeclared SSA value name
linalg.matmul ins(%alloc_4, %1 : memref<2708x16xf32>, memref<16x7xf32>) outs(%alloc_6 : memref<2708x7xf32>)
^
I tried to solve this problem by running, before setting `soda.launch` and `soda.terminator` and after having run mlir-opt (so between steps 3 and 4 of the how-to above), the following command, which should have moved the alloc and dealloc operations outside the outlining scope:
docker run -u $(id -u) -v $(pwd):/working_dir --rm agostini01/soda \
soda-opt \
-forward-memref-allocations \
-forward-linalg-fill \
-forward-memref-copy \
-forward-memref-allocations \
output/01searched-edited.mlir \
2>&1 | cat > output/01.mlir
This made things a little better, but some dealloc operations were still inside the block, and trying to run it led to the following Bambu error:
Total number of flip-flops in function forward_kernel: 6128
.:: Creating Generic Bash Backend Flow ::.
Parameter P0 (494937) (testvector 0) allocated at 1073741824 : reserved_mem_size = 15522256
Parameter P1 (494938) (testvector 0) allocated at 1089264096 : reserved_mem_size = 91712
Parameter P2 (494939) (testvector 0) allocated at 1089355808 : reserved_mem_size = 173312
Parameter P3 (494940) (testvector 0) allocated at 1089529120 : reserved_mem_size = 173312
Parameter P4 (494941) (testvector 0) allocated at 1089702432 : reserved_mem_size = 29333056
Parameter P5 (494942) (testvector 0) allocated at 1119035488 : reserved_mem_size = 173312
Parameter P6 (494943) (testvector 0) allocated at 1119208800 : reserved_mem_size = 64
Parameter P7 (494944) (testvector 0) allocated at 1119208864 : reserved_mem_size = 173312
Parameter P8 (494945) (testvector 0) allocated at 1119382176 : reserved_mem_size = 173312
C-based testbench generation for function forward_kernel: /working_dir/HLS_output//simulation/values.c
Prepared testbench
clang: warning: optimization flag '-ffloat-store' is not supported [-Wignored-optimization-argument]
clang: warning: optimization flag '-ffloat-store' is not supported [-Wignored-optimization-argument]
warning: overriding the module target triple with i386-pc-linux-gnu [-Woverride-module]
1 warning generated.
free(): invalid pointer
clang: warning: optimization flag '-ffloat-store' is not supported [-Wignored-optimization-argument]
clang: warning: optimization flag '-ffloat-store' is not supported [-Wignored-optimization-argument]
warning: overriding the module target triple with i386-pc-linux-gnu [-Woverride-module]
1 warning generated.
Please report bugs to <panda-info@polimi.it>
error -> Error in generating the expected test results
The temporary solution has been to delete the dealloc lines in order to run Bambu. It actually worked, but Bambu was not able to synthesize the design due to the large number of cycles required, showing the already encountered error:
Reading of vector values from input file completed. Simulation started.
Simulation not completed into 200000000 cycles
File "/working_dir/results.txt" opened
error -> Expected a number of cycles different from zero. Something wrong happened during the simulation!
A rough calculation of how many cycles should be needed to synthesize the first convolutional layer has been made.
The two matrix multiplications done in the forward pass of the layer are between matrices of the following sizes.
Assumption: 5 cycles needed for an add, 2 cycles needed for a mul.
mm1: mul -> $(2708 \times 1433 \times 16) \times 2 = 124.178.048$ cycles; add -> $(2708 \times 1432 \times 16) \times 5 = 310.228.480$ cycles; for a total of $434.406.528$ cycles for mm1
mm2: mul -> $(2708 \times 2708 \times 16) \times 2 = 234.664.448$ cycles; add -> $(2708 \times 2707 \times 16) \times 5 = 586.444.480$ cycles; for a total of $821.108.928$ cycles for mm2
mm1 + mm2: a total of $1.255.515.456$ cycles (about 1.2 billion)
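The estimate above can be reproduced with a short script. The per-operation cycle counts (5 per add, 2 per mul) are the assumptions stated above, and the matrix dimensions are the ones used in the calculation:

```python
# Rough cycle estimate for the two matmuls in the first GCN layer.
# Assumptions (stated above): 2 cycles per multiply, 5 cycles per add.
MUL_CYCLES = 2
ADD_CYCLES = 5

def matmul_cycles(m, k, n):
    """Cycles for an (m x k) by (k x n) matmul:
    m*k*n multiplies and m*(k-1)*n adds."""
    muls = m * k * n * MUL_CYCLES
    adds = m * (k - 1) * n * ADD_CYCLES
    return muls + adds

mm1 = matmul_cycles(2708, 1433, 16)  # (2708 x 1433) . (1433 x 16)
mm2 = matmul_cycles(2708, 2708, 16)  # (2708 x 2708) . (2708 x 16)
print(mm1, mm2, mm1 + mm2)  # prints: 434406528 821108928 1255515456
```

This matches the totals above and makes it easy to see why the simulation blows past the 200,000,000-cycle limit.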
In order to remove the MLIR deallocations, it is necessary to run, before setting `soda.launch` and `soda.terminator` and after having run mlir-opt (so between steps 3 and 4 of the how-to above), the following command:
docker run -u $(id -u) -v $(pwd):/working_dir --rm agostini01/soda \
soda-opt \
--erase-buffer-deallocation \
output/01searched-edited.mlir \
2>&1 | cat > output/01.mlir
Description
The aim of this issue is to create the `.ll` file, starting from the `pygcn.mlir` file created using the torch-mlir script `train.py`.