UIUC-ChenLab / scalehls

A scalable High-Level Synthesis framework on MLIR

CPP emitted causes Vitis to segfault #47

Open makslevental opened 2 years ago

makslevental commented 2 years ago

Description

I tried to run a simple model through ScaleHLS and Vitis HLS using the instructions in the README, namely:

python export_braggnn_mlir.py > braggnn.mlir

torch-mlir-opt braggnn.mlir \
    -torchscript-module-to-torch-backend-pipeline="optimize=true" \
    -torch-backend-to-tosa-backend-pipeline="optimize=true" > braggnn.tosa.mlir

scalehls-opt braggnn.tosa.mlir \
    -scalehls-pytorch-pipeline-v1="top-func=forward loop-tile-size=4 loop-unroll-factor=2" \
    | scalehls-translate -emit-hlscpp > braggnn.cpp

and got this on Vitis v2021.2:

INFO: [XFORM 203-712] Applying dataflow to function 'forward' (braggnn.cpp:2233), detected/extracted 21 process function(s): 
     'entry_proc484'
     'forward_node0'
     'forward_node1'
     'forward_node2'
     'forward_node3'
     'forward_node4'
     'forward_node5'
     'forward_node6'
     'forward_node7'
     'forward_node8'
     'forward_node9'
     'forward_node10'
     'forward_node11'
     'forward_node12'
     'forward_node13'
     'forward_node14'
     'forward_node15'
     'forward_node16'
     'forward_node17'
     'forward_node18'
     'forward_node19'.
Stack dump:
0.  Running pass 'Dead Global Elimination' on module '/home/mlevental/dev_projects/scalehls/samples/pytorch/braggnn/proj/solution1/.autopilot/db/a.o.1.bc'.
Abnormal program termination (11)

with stack dump:

Stack:
/lib/x86_64-linux-gnu/libc.so.6(+0x430c0) [0x7f8f8187e0c0]
/home/mlevental/dev_projects/Xilinx/Vitis_HLS/2021.2/lib/lnx64.o/libLLVM-3.1.so(llvm::TargetData::getTypeSizeInBits(llvm::Type*) const+0) [0x7f8f6b394210]
/home/mlevental/dev_projects/Xilinx/Vitis_HLS/2021.2/lib/lnx64.o/libLLVM-3.1.so(llvm::AliasAnalysis::getTypeStoreSize(llvm::Type*)+0x12) [0x7f8f6b708312]
...

Dropping #pragma HLS dataflow from both forward and forward_node19 doesn't fix the crash but produces a different segfault:

WARNING: [HLS 200-1888] The stable scalar argument 'v1012' is written in a dataflow region ((braggnn.cpp:144:1)). This is not supported and may lead to incorrect RTL code.
Stack dump:
0.      Running pass 'Check subprocesses communication behavior in dataflow region' on module '/home/mlevental/dev_projects/scalehls/samples/pytorch/braggnn/proj/solution1/.autopilot/db/a.o.1.bc'.
Abnormal program termination (11)

with different stack dump:

Stack:
/lib/x86_64-linux-gnu/libc.so.6(+0x430c0) [0x7f31dba080c0]
/home/mlevental/dev_projects/Xilinx/Vitis_HLS/2021.2/lib/lnx64.o/libhls_hwsyn.so(llvm::PtrUserTree::visitTree(std::set<llvm::Argument*, std::less<llvm::Argument*>, std::allocator<llvm::Argument*> >&, llvm::SetVector<llvm::Instruction*, std::vector<llvm::Instruction*, std::allocator<llvm::Instruction*> >, llvm::SmallSet<llvm::Instruction*, 16u, std::less<llvm::Instruction*> > >&, bool) const+0xfb) [0x7f31c356f4ab]
/home/mlevental/dev_projects/Xilinx/Vitis_HLS/2021.2/lib/lnx64.o/libhls_hwsyn.so(pass::DataflowDepGraph::analyzeAccessBehavior(llvm::CallSite, llvm::Value*)+0x12b) [0x7f31c358876b]
/home/mlevental/dev_projects/Xilinx/Vitis_HLS/2021.2/lib/lnx64.o/libhls_hwsyn.so(pass::DataflowDepGraph::initializeSubprocesses(std::map<llvm::Function*, std::vector<llvm::PointerIntPair<llvm::GlobalVariable*, 2u, pass::DataflowDepGraph::AccessType, llvm::PointerLikeTypeTraits<llvm::GlobalVariable*> >, std::allocator<llvm::PointerIntPair<llvm::GlobalVariable*, 2u, pass::DataflowDepGraph::AccessType, llvm::PointerLikeTypeTraits<llvm::GlobalVariable*> > > >, std::less<llvm::Function*>, std::allocator<std::pair<llvm::Function* const, std::vector<llvm::PointerIntPair<llvm::GlobalVariable*, 2u, pass::DataflowDepGraph::AccessType, llvm::PointerLikeTypeTraits<llvm::GlobalVariable*> >, std::allocator<llvm::PointerIntPair<llvm::GlobalVariable*, 2u, pass::DataflowDepGraph::AccessType, llvm::PointerLikeTypeTraits<llvm::GlobalVariable*> > > > > > > const&)+0x108) [0x7f31c3588948]
/home/mlevental/dev_projects/Xilinx/Vitis_HLS/2021.2/lib/lnx64.o/libhls_hwsyn.so(pass::CheckDFChannels::runOnModule(llvm::Module&)+0xa16) [0x7f31c3563eb6]
/home/mlevental/dev_projects/Xilinx/Vitis_HLS/2021.2/lib/lnx64.o/libLLVM-3.1.so(llvm::MPPassManager::runOnModule(llvm::Module&)+0x182) [0x7f31c9393aa2]

Let me know if there's anything I can do to help debug.

Artifacts

  1. PyTorch model
  2. export script
  3. torch-mlir IR
  4. tosa IR
  5. ScaleHLS emitted CPP
  6. tcl script
  7. autopilot log
  8. first stack dump
  9. second stack dump

makslevental commented 2 years ago

FWIW rewinding to https://github.com/hanchenye/scalehls/commit/d6ffcd0c5b2fa67c624a500365a3cc95942d91c0 and running with -scalehls-pytorch-pipeline="top-func=forward dataflow-gran=0 opt-level=2" (i.e., completely disabling dataflow) does fix the issue and the synthesis goes all the way through.

hanchenye commented 2 years ago

This is an error I've never seen before. But it seems Vivado has recognized all the dataflow stages, which is a good sign :) Thanks for providing the artifacts; I'll give them a try.

stephenneuendorffer commented 2 years ago

It would be good to capture this for the Vitis HLS folks. Do you have the source code and TCL that fail?

makslevental commented 2 years ago

> It would be good to capture this for the Vitis HLS folks. Do you have the source code and TCL that fail?

@stephenneuendorffer I think all of the artifacts should be enough for repro but I can provide whatever else is needed.

SerenaC94 commented 2 years ago

Does this mean scaleHLS is guaranteed to work with Vivado, but not with Vitis?

chhzh123 commented 2 years ago

> Does this mean scaleHLS is guaranteed to work with Vivado, but not with Vitis?

Same question. I found the test_gemm_dse.cpp in the README also could not pass the vitis_hls compilation, since it tried to partition the input array with interface specification. My vitis_hls version is v2019.2.1.

hanchenye commented 2 years ago

> I found the test_gemm_dse.cpp in the README also could not pass the vitis_hls compilation, since it tried to partition the input array with interface specification. My vitis_hls version is v2019.2.1.

I have also observed this issue. Starting from some version, Vitis HLS renamed the "resource" directive to the "bind_op" and "bind_storage" directives. In addition, "bind_storage" can no longer be applied to interface arrays. Instead, the "storage_impl" and "storage_type" options have been merged into the "interface" directive.

A temporary workaround for Vitis HLS is to update the emission logic here: https://github.com/hanchenye/scalehls/blob/4acb8795839dd2ba291733a521f2646db756edd2/lib/Translation/EmitHLSCpp.cpp#L1751-L1754

Ultimately, I think the C++ emitter should take a target triple that specifies the vendor tool and version to emit for.
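
To make the idea concrete, here is a minimal sketch of how an emitter could choose the storage pragma based on the target. This is not the actual ScaleHLS code: `DirectiveDialect` and `emitStoragePragma` are hypothetical names, the RAM_2P/BRAM choice is only an example, and the exact pragma option spellings are assumptions that should be checked against the documentation for the tool version in use.

```cpp
#include <string>

// Hypothetical switch for which directive family the target tool accepts.
enum class DirectiveDialect {
  LegacyResource, // tools that still accept the "resource" directive
  BindStorage     // tools that use "bind_op" / "bind_storage" instead
};

// Returns the pragma text for placing array `var` in a dual-port BRAM.
// `isInterfacePort` marks arrays that are arguments of the top function.
std::string emitStoragePragma(DirectiveDialect dialect, const std::string &var,
                              bool isInterfacePort) {
  if (dialect == DirectiveDialect::LegacyResource)
    // Legacy form: one directive for both internal and interface arrays.
    return "#pragma HLS resource variable=" + var + " core=RAM_2P_BRAM";

  if (isInterfacePort)
    // bind_storage is rejected on interface arrays, so the storage option
    // is attached to the interface directive instead (assumed spelling).
    return "#pragma HLS interface ap_memory port=" + var +
           " storage_type=ram_2p";

  // Internal arrays use bind_storage (assumed spelling).
  return "#pragma HLS bind_storage variable=" + var + " type=ram_2p impl=bram";
}
```

For example, `emitStoragePragma(DirectiveDialect::BindStorage, "v0", true)` yields an interface pragma rather than a bind_storage one, which is the case that trips up interface arrays. A real target description would presumably also carry the tool version, since the point at which the directives changed is not stated in this thread.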

makslevental commented 2 years ago

> Ultimately, I think the C++ emitter should take a target triple that specifies the vendor tool and version to emit for.

probably ultimately @stephenneuendorffer and Xilinx should just buy ScaleHLS 😉