CHIP-SPV / chipStar

chipStar is a tool for compiling and running HIP/CUDA on SPIR-V via OpenCL or Level Zero APIs.

[llvm-spirv] Failed compilation when running benchmarks #510

Closed jjennychen closed 1 year ago

jjennychen commented 1 year ago

When building a benchmark from the HeCBench benchmark set (https://github.com/zjin-lcf/HeCBench.git), compilation failed with the following error, which appears to come from llvm-spirv:

llvm-spirv: /gpfs/jlse-fs0/users/bertoni/chip-spv_source-20230531-release/SPIRV-LLVM-Translator/lib/SPIRV/libSPIRV/SPIRVModule.cpp:172: virtual void SPIRV::SPIRVModuleImpl::setSPIRVVersion(SPIRV::SPIRVWord): Assertion `this->isAllowedToUseVersion(static_cast<VersionNumber>(Ver))' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.  Program arguments: /soft/compilers/clang-chip-spv/16.0-20230531-release/bin/llvm-spirv --spirv-max-version=1.1 --spirv-ext=+all /tmp/main-generic-lower-43a1cf.bc -o /tmp/main-118076/main-generic.out
 #0 0x0000000000b85498 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/soft/compilers/clang-chip-spv/16.0-20230531-release/bin/llvm-spirv+0xb85498)
 #1 0x0000000000b8326c SignalHandler(int) Signals.cpp:0:0
 #2 0x00007fab6034f8c0 __restore_rt (/lib64/libpthread.so.0+0x168c0)
 #3 0x00007fab5ead6cbb raise (/lib64/libc.so.6+0x4acbb)
 #4 0x00007fab5ead8355 abort (/lib64/libc.so.6+0x4c355)
 #5 0x00007fab5eacecba __assert_fail_base (/lib64/libc.so.6+0x42cba)
 #6 0x00007fab5eaced42 (/lib64/libc.so.6+0x42d42)
 #7 0x00000000006bec69 SPIRV::SPIRVModuleImpl::setSPIRVVersion(unsigned int) (/soft/compilers/clang-chip-spv/16.0-20230531-release/bin/llvm-spirv+0x6bec69)
 #8 0x00000000005bd09c SPIRV::SPIRVModule::setMinSPIRVVersion(SPIRV::VersionNumber) (/soft/compilers/clang-chip-spv/16.0-20230531-release/bin/llvm-spirv+0x5bd09c)
 #9 0x0000000000669b0f SPIRV::SPIRVEntry::updateModuleVersion() const (/soft/compilers/clang-chip-spv/16.0-20230531-release/bin/llvm-spirv+0x669b0f)
#10 0x000000000066b236 SPIRV::SPIRVCapability::SPIRVCapability(SPIRV::SPIRVModule*, spv::Capability) (/soft/compilers/clang-chip-spv/16.0-20230531-release/bin/llvm-spirv+0x66b236)
#11 0x00000000006bf5f7 SPIRV::SPIRVModuleImpl::addCapability(spv::Capability) (/soft/compilers/clang-chip-spv/16.0-20230531-release/bin/llvm-spirv+0x6bf5f7)
#12 0x00000000006bfe4c SPIRV::SPIRVModuleImpl::addEntry(SPIRV::SPIRVEntry*) (/soft/compilers/clang-chip-spv/16.0-20230531-release/bin/llvm-spirv+0x6bfe4c)
#13 0x00000000006ac1a6 SPIRV::SPIRVInstruction* SPIRV::SPIRVModule::add<SPIRV::SPIRVInstruction>(SPIRV::SPIRVInstruction*) (/soft/compilers/clang-chip-spv/16.0-20230531-release/bin/llvm-spirv+0x6ac1a6)
#14 0x0000000000798014 SPIRV::SPIRVBasicBlock::addInstruction(SPIRV::SPIRVInstruction*, SPIRV::SPIRVInstruction const*) (/soft/compilers/clang-chip-spv/16.0-20230531-release/bin/llvm-spirv+0x798014)
#15 0x00000000006c72f2 SPIRV::SPIRVModuleImpl::addInstTemplate(SPIRV::SPIRVInstTemplateBase*, std::vector<unsigned int, std::allocator<unsigned int>> const&, SPIRV::SPIRVBasicBlock*, SPIRV::SPIRVType*) (/soft/compilers/clang-chip-spv/16.0-20230531-release/bin/llvm-spirv+0x6c72f2)
#16 0x00000000005e92bb SPIRV::LLVMToSPIRVBase::transBuiltinToInstWithoutDecoration(spv::Op, llvm::CallInst*, SPIRV::SPIRVBasicBlock*) (/soft/compilers/clang-chip-spv/16.0-20230531-release/bin/llvm-spirv+0x5e92bb)
#17 0x00000000005e57a1 SPIRV::LLVMToSPIRVBase::transBuiltinToInst(llvm::StringRef, llvm::CallInst*, SPIRV::SPIRVBasicBlock*) (/soft/compilers/clang-chip-spv/16.0-20230531-release/bin/llvm-spirv+0x5e57a1)
#18 0x00000000005e1b64 SPIRV::LLVMToSPIRVBase::transDirectCallInst(llvm::CallInst*, SPIRV::SPIRVBasicBlock*) (/soft/compilers/clang-chip-spv/16.0-20230531-release/bin/llvm-spirv+0x5e1b64)
#19 0x00000000005e1970 SPIRV::LLVMToSPIRVBase::transCallInst(llvm::CallInst*, SPIRV::SPIRVBasicBlock*) (/soft/compilers/clang-chip-spv/16.0-20230531-release/bin/llvm-spirv+0x5e1970)
#20 0x00000000005d5349 SPIRV::LLVMToSPIRVBase::transValueWithoutDecoration(llvm::Value*, SPIRV::SPIRVBasicBlock*, bool, SPIRV::LLVMToSPIRVBase::FuncTransMode) (/soft/compilers/clang-chip-spv/16.0-20230531-release/bin/llvm-spirv+0x5d5349)
#21 0x00000000005ce370 SPIRV::LLVMToSPIRVBase::transValue(llvm::Value*, SPIRV::SPIRVBasicBlock*, bool, SPIRV::LLVMToSPIRVBase::FuncTransMode) (/soft/compilers/clang-chip-spv/16.0-20230531-release/bin/llvm-spirv+0x5ce370)
#22 0x00000000005e4770 SPIRV::LLVMToSPIRVBase::transFunction(llvm::Function*) (/soft/compilers/clang-chip-spv/16.0-20230531-release/bin/llvm-spirv+0x5e4770)
#23 0x00000000005e4fb5 SPIRV::LLVMToSPIRVBase::translate() (/soft/compilers/clang-chip-spv/16.0-20230531-release/bin/llvm-spirv+0x5e4fb5)
#24 0x00000000005c69f4 SPIRV::LLVMToSPIRVBase::runLLVMToSPIRV(llvm::Module&) (/soft/compilers/clang-chip-spv/16.0-20230531-release/bin/llvm-spirv+0x5c69f4)
#25 0x00000000005c5402 SPIRV::LLVMToSPIRVPass::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/soft/compilers/clang-chip-spv/16.0-20230531-release/bin/llvm-spirv+0x5c5402)
#26 0x000000000064bae0 llvm::detail::PassModel<llvm::Module, SPIRV::LLVMToSPIRVPass, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/soft/compilers/clang-chip-spv/16.0-20230531-release/bin/llvm-spirv+0x64bae0)
#27 0x0000000000a994d1 llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/soft/compilers/clang-chip-spv/16.0-20230531-release/bin/llvm-spirv+0xa994d1)
#28 0x00000000005ea470 (anonymous namespace)::runSpirvWriterPasses(llvm::Module*, std::ostream*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>&, SPIRV::TranslatorOpts const&) SPIRVWriter.cpp:0:0
#29 0x00000000005ea593 llvm::writeSpirv(llvm::Module*, SPIRV::TranslatorOpts const&, std::ostream&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>&) (/soft/compilers/clang-chip-spv/16.0-20230531-release/bin/llvm-spirv+0x5ea593)
#30 0x0000000000481354 convertLLVMToSPIRV(SPIRV::TranslatorOpts const&) llvm-spirv.cpp:0:0
#31 0x00000000004850f1 main (/soft/compilers/clang-chip-spv/16.0-20230531-release/bin/llvm-spirv+0x4850f1)
#32 0x00007fab5eac129d __libc_start_main (/lib64/libc.so.6+0x3529d)
#33 0x000000000047e1ca _start /home/abuild/rpmbuild/BUILD/glibc-2.31/csu/../sysdeps/x86_64/start.S:122:0
clang-16: error: unable to execute command: Aborted
clang-16: error: hipspv-link command failed due to signal (use -v to see invocation)
clang version 16.0.0 (https://github.com/llvm/llvm-project 08d094a0e457360ad8b94b017d2dc277e697ca76)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /soft/compilers/clang-chip-spv/16.0-20230531-release/bin
clang-16: note: diagnostic msg: Error generating preprocessed source(s).

failed to execute:/soft/compilers/clang-chip-spv/16.0-20230531-release/bin/clang++ -x hip main.cu -D__HIP_PLATFORM_SPIRV__= --offload=spirv64 -nohipwrapperinc --hip-path=/soft/compilers/chip-spv/20230531-release --target=x86_64-unknown-linux-gnu -include /soft/compilers/chip-spv/20230531-release/include/hip/spirv_fixups.h -I//soft/compilers/chip-spv/20230531-release/include  -std=c++14 -Wall -g -DDEBUG -O3 -c  -o main.o
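
Reading the trace, this looks like a version-cap failure: llvm-spirv was invoked with --spirv-max-version=1.1, and the assertion fires in setSPIRVVersion while a capability is being added (frames #7 to #11). The sketch below illustrates that kind of check in plain C++. The names ModuleSketch and require_min_version are hypothetical and are not the translator's actual API; the version constants follow the SPIR-V version-word encoding.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// SPIR-V version words: 0x00 | major | minor | 0x00.
enum class Version : std::uint32_t {
    V1_0 = 0x00010000,
    V1_1 = 0x00010100,
    V1_2 = 0x00010200,
    V1_3 = 0x00010300,
};

// Hypothetical stand-in for a module with a user-imposed version cap
// (the role --spirv-max-version plays on the command line).
class ModuleSketch {
public:
    explicit ModuleSketch(Version max_allowed) : max_allowed_(max_allowed) {}

    // Raise the module version to at least `required`.
    // Throws if `required` exceeds the cap; the real tool asserts instead.
    void require_min_version(Version required) {
        if (required > max_allowed_) {
            throw std::runtime_error("required SPIR-V version exceeds the allowed maximum");
        }
        if (required > current_) {
            current_ = required;
        }
        assert(current_ <= max_allowed_);  // invariant: never above the cap
    }

    Version current() const { return current_; }

private:
    Version max_allowed_;
    Version current_ = Version::V1_0;
};
```

With a cap of Version::V1_1, a call such as require_min_version(Version::V1_3) throws. That would be consistent with an instruction that needs a newer SPIR-V version than the cap allows; for example, the group non-uniform vote instructions are missing before SPIR-V 1.3.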

[To Reproduce the Issue]

  1. Log into JLSE
  2. Clone the benchmark: git clone https://github.com/zjin-lcf/HeCBench.git
  3. Get an iris node (Intel Gen9): qsub -I -n 1 -q iris -t 360
  4. Load modules:
    module use /soft/modulefiles
    module purge
    module load intel_compute_runtime
    module load chip-spv
  5. cd into the benchmark directory ("bh-hip" can be replaced with another benchmark): cd HeCBench/bh-hip
  6. Build: make VERIFY=yes DEBUG=yes

After that make command, the error about llvm-spirv should show up.

(Extra note: to run the benchmark after compiling, do make run)

jjennychen commented 1 year ago

There are also some other failed compilations for other benchmarks. Would adding them as a list under this issue or creating a separate issue for each error be more convenient/desirable? Thank you!!

linehill commented 1 year ago

The benchmark has a warp vote function:

...
          tmp = dx*dx + (dy*dy + (dz*dz + epssqd));  // compute distance squared (plus softening)
          if ((n < nbodiesd) || __all(tmp >= dq[depth])) {  
...

For warp vote functions you'll need a patched clang and llvm-spirv to compile the application.
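
For readers unfamiliar with warp votes: __all(pred) evaluates the predicate on every active lane of a warp and returns non-zero to all of them only if it was non-zero on each lane. A minimal host-side model in plain C++ follows. It is an illustration only, not device code; the function name warp_all and the vector-of-lanes representation are assumptions made for this sketch.

```cpp
#include <cassert>
#include <vector>

// Hypothetical host-side model of a warp vote.
// `lane_pred` holds one predicate value per active lane.
// Returns true only when every lane's predicate is non-zero,
// which is the value __all(pred) would hand back to each lane.
bool warp_all(const std::vector<int>& lane_pred) {
    assert(!lane_pred.empty());  // a warp has at least one active lane
    for (int p : lane_pred) {
        if (p == 0) {
            return false;  // a single dissenting lane decides the vote
        }
    }
    return true;
}
```

In the benchmark snippet above, each lane contributes tmp >= dq[depth] as its predicate, so the branch is taken for a lane when n < nbodiesd holds on that lane or when every lane in the warp passes the distance test.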

> Would adding them as a list under this issue or creating a separate issue for each error be more convenient/desirable?

If the cases fail for other reasons than the warp functions, it might be better to put them into separate issues. If you are unsure, you may list them here.

zjin-lcf commented 1 year ago

I'd like to add a note. Running "bh-hip" on an AMD GPU (e.g. one with a wavefront size of 32) hangs. I'm not sure whether this is reproducible. I don't think the CUDA program "bh-cuda" hangs on modern NVIDIA GPUs.

The issue was also reported here: https://github.com/oneapi-src/SYCLomatic/issues/791

jjennychen commented 1 year ago

Thank you for the note! Yeah the "bh-hip" hung when I ran it on Intel Gen9 too (with ChipStar/CHIP-SPV)... But it did compile successfully with the patched clang and llvm-spirv! Thank you again!!

zjin-lcf commented 1 year ago

I hope the chipStar install guide installs the patched clang and llvm-spirv by default.

pjaaskel commented 1 year ago

Yep, there's a mention at the beginning of README.md. I suppose this can now be closed?