chapel-lang / chapel

a Productive Parallel Programming Language
https://chapel-lang.org

Internal error when building Chapel code with HIP module loaded #23791

Closed: Guillaume-Helbecque closed this issue 3 weeks ago

Guillaume-Helbecque commented 1 year ago

I cannot compile any Chapel program (even a sequential "hello world") in a shell where the HIP module (hip/5.2.0_gcc-10.4.0) is loaded. The compiler reports:

internal error: UTI-MIS-1041 chpl version 1.32.0

Internal errors indicate a bug in the Chapel compiler,
and we're sorry for the hassle.  We would appreciate your reporting this bug --
please see https://chapel-lang.org/bugs.html for instructions.

I verified that running module unload hip/... makes things work again.

Compile command: chpl foo.chpl

Configuration Information

e-kayrakli commented 1 year ago

Thanks for the bug report @Guillaume-Helbecque!

This could be caused by the build not finding the correct paths and failing to report it. Could you also give us the output from $CHPL_HOME/util/printchplenv --anonymize --internal --all?

Guillaume-Helbecque commented 1 year ago

Here it is:

CHPL_HOST_PLATFORM: linux64 *
CHPL_HOST_COMPILER: gnu *
  CHPL_HOST_CC: gcc
  CHPL_HOST_CXX: g++
  CHPL_HOST_BUNDLED_COMPILE_ARGS: -I/home/ghelbecque/chapel-1.32.0MCG_array_on_device/third-party/llvm/install/linux64-x86_64/include -std=c++14 -fno-exceptions -fno-rtti -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Wno-comment -DHAVE_LLVM -I/home/ghelbecque/chapel-1.32.0MCG_array_on_device/third-party/jemalloc/install/host/linux64-x86_64-gnu/include
  CHPL_HOST_SYSTEM_COMPILE_ARGS: 
  CHPL_HOST_BUNDLED_LINK_ARGS: -L/home/ghelbecque/chapel-1.32.0MCG_array_on_device/third-party/llvm/install/linux64-x86_64/lib -Wl,-rpath,/home/ghelbecque/chapel-1.32.0MCG_array_on_device/third-party/llvm/install/linux64-x86_64/lib -lclangFrontend -lclangSerialization -lclangDriver -lclangCodeGen -lclangParse -lclangSema -lclangAnalysis -lclangEdit -lclangASTMatchers -lclangAST -lclangLex -lclangBasic -lclangSupport -L/home/ghelbecque/chapel-1.32.0MCG_array_on_device/third-party/llvm/install/linux64-x86_64/lib -lLLVMWindowsDriver -lLLVMLTO -lLLVMExtensions -lLLVMCoverage -lLLVMAMDGPUTargetMCA -lLLVMAMDGPUDisassembler -lLLVMAMDGPUAsmParser -lLLVMAMDGPUCodeGen -lLLVMMIRParser -lLLVMAMDGPUDesc -lLLVMAMDGPUUtils -lLLVMAMDGPUInfo -lLLVMPasses -lLLVMCoroutines -lLLVMNVPTXCodeGen -lLLVMNVPTXDesc -lLLVMNVPTXInfo -lLLVMAArch64Disassembler -lLLVMAArch64AsmParser -lLLVMAArch64CodeGen -lLLVMAArch64Desc -lLLVMAArch64Utils -lLLVMAArch64Info -lLLVMX86TargetMCA -lLLVMMCA -lLLVMX86Disassembler -lLLVMX86AsmParser -lLLVMX86CodeGen -lLLVMCFGuard -lLLVMGlobalISel -lLLVMX86Desc -lLLVMX86Info -lLLVMMCDisassembler -lLLVMSelectionDAG -lLLVMAsmPrinter -lLLVMCodeGen -lLLVMTarget -lLLVMObjCARCOpts -lLLVMOption -lLLVMipo -lLLVMInstrumentation -lLLVMVectorize -lLLVMLinker -lLLVMIRReader -lLLVMAsmParser -lLLVMFrontendOpenMP -lLLVMScalarOpts -lLLVMInstCombine -lLLVMAggressiveInstCombine -lLLVMTransformUtils -lLLVMBitWriter -lLLVMAnalysis -lLLVMProfileData -lLLVMSymbolize -lLLVMDebugInfoPDB -lLLVMDebugInfoMSF -lLLVMDebugInfoDWARF -lLLVMObject -lLLVMTextAPI -lLLVMMCParser -lLLVMMC -lLLVMDebugInfoCodeView -lLLVMBitReader -lLLVMCore -lLLVMRemarks -lLLVMBinaryFormat -lLLVMBitstreamReader -lLLVMSupport -lLLVMDemangle -L/home/ghelbecque/chapel-1.32.0MCG_array_on_device/third-party/jemalloc/install/host/linux64-x86_64-gnu/lib -ljemalloc
  CHPL_HOST_SYSTEM_LINK_ARGS: -lrt -ldl -lz -ltinfo -lm -lpthread
CHPL_HOST_ARCH: x86_64
CHPL_HOST_CPU: none
CHPL_TARGET_PLATFORM: linux64
CHPL_TARGET_COMPILER: llvm
  CHPL_TARGET_CC: /home/ghelbecque/chapel-1.32.0MCG_array_on_device/third-party/llvm/install/linux64-x86_64/bin/clang --gcc-toolchain=/grid5000/spack/v1/opt/spack/linux-debian11-x86_64_v2/gcc-10.4.0/gcc-10.4.0-oo2korcm4gxzmlowrmwl4fp2rtq4dsv5
  CHPL_TARGET_CXX: /home/ghelbecque/chapel-1.32.0MCG_array_on_device/third-party/llvm/install/linux64-x86_64/bin/clang++ --gcc-toolchain=/grid5000/spack/v1/opt/spack/linux-debian11-x86_64_v2/gcc-10.4.0/gcc-10.4.0-oo2korcm4gxzmlowrmwl4fp2rtq4dsv5
  CHPL_TARGET_COMPILER_PRGENV: none
  CHPL_TARGET_BUNDLED_COMPILE_ARGS: -I/home/ghelbecque/chapel-1.32.0MCG_array_on_device/runtime/include/localeModels/gpu -I/home/ghelbecque/chapel-1.32.0MCG_array_on_device/runtime/include/localeModels -I/home/ghelbecque/chapel-1.32.0MCG_array_on_device/runtime/include/comm/none -I/home/ghelbecque/chapel-1.32.0MCG_array_on_device/runtime/include/comm -I/home/ghelbecque/chapel-1.32.0MCG_array_on_device/runtime/include/tasks/qthreads -I/home/ghelbecque/chapel-1.32.0MCG_array_on_device/runtime/include -I/home/ghelbecque/chapel-1.32.0MCG_array_on_device/runtime/include/qio -I/home/ghelbecque/chapel-1.32.0MCG_array_on_device/runtime/include/atomics/cstdlib -I/home/ghelbecque/chapel-1.32.0MCG_array_on_device/runtime/include/mem/jemalloc -I/home/ghelbecque/chapel-1.32.0MCG_array_on_device/third-party/utf8-decoder -DHAS_GPU_LOCALE -I/home/ghelbecque/chapel-1.32.0MCG_array_on_device/runtime/include/gpu/amd -DCHPL_JEMALLOC_PREFIX=chpl_je_ -I/home/ghelbecque/chapel-1.32.0MCG_array_on_device/third-party/gmp/install/linux64-x86_64-native-llvm-none/include -I/home/ghelbecque/chapel-1.32.0MCG_array_on_device/third-party/hwloc/install/linux64-x86_64-native-llvm-none-gpu/include -I/home/ghelbecque/chapel-1.32.0MCG_array_on_device/third-party/qthread/install/linux64-x86_64-native-llvm-none-gpu-jemalloc-bundled/include -I/home/ghelbecque/chapel-1.32.0MCG_array_on_device/third-party/jemalloc/install/target/linux64-x86_64-native-llvm-none/include -I/home/ghelbecque/chapel-1.32.0MCG_array_on_device/third-party/re2/install/linux64-x86_64-native-llvm-none/include
  CHPL_TARGET_SYSTEM_COMPILE_ARGS: -isystem/grid5000/spack/v1/opt/spack/linux-debian11-x86_64_v2/gcc-10.4.0/hip-5.2.0-ctcbkptsxpka27keqds4u3ubsool77sl/bin/hipcc/hip/include -isystem/grid5000/spack/v1/opt/spack/linux-debian11-x86_64_v2/gcc-10.4.0/hip-5.2.0-ctcbkptsxpka27keqds4u3ubsool77sl/bin/hipcc/hsa/include
  CHPL_TARGET_LD: /home/ghelbecque/chapel-1.32.0MCG_array_on_device/third-party/llvm/install/linux64-x86_64/bin/clang++ --gcc-toolchain=/grid5000/spack/v1/opt/spack/linux-debian11-x86_64_v2/gcc-10.4.0/gcc-10.4.0-oo2korcm4gxzmlowrmwl4fp2rtq4dsv5
  CHPL_TARGET_BUNDLED_LINK_ARGS: -L/home/ghelbecque/chapel-1.32.0MCG_array_on_device/lib/linux64/llvm/x86_64/cpu-native/loc-gpu/gpu-amd/gpu_mem-array_on_device/comm-none/tasks-qthreads/tmr-generic/unwind-none/mem-jemalloc/atomics-cstdlib/hwloc-bundled/re2-bundled/fs-none/lib_pic-none/san-none -lchpl -L/home/ghelbecque/chapel-1.32.0MCG_array_on_device/third-party/gmp/install/linux64-x86_64-native-llvm-none/lib -lgmp -Wl,-rpath,/home/ghelbecque/chapel-1.32.0MCG_array_on_device/third-party/gmp/install/linux64-x86_64-native-llvm-none/lib -L/home/ghelbecque/chapel-1.32.0MCG_array_on_device/third-party/hwloc/install/linux64-x86_64-native-llvm-none-gpu/lib -lhwloc -Wl,-rpath,/home/ghelbecque/chapel-1.32.0MCG_array_on_device/third-party/hwloc/install/linux64-x86_64-native-llvm-none-gpu/lib -L/home/ghelbecque/chapel-1.32.0MCG_array_on_device/third-party/qthread/install/linux64-x86_64-native-llvm-none-gpu-jemalloc-bundled/lib -Wl,-rpath,/home/ghelbecque/chapel-1.32.0MCG_array_on_device/third-party/qthread/install/linux64-x86_64-native-llvm-none-gpu-jemalloc-bundled/lib -lqthread -L/home/ghelbecque/chapel-1.32.0MCG_array_on_device/third-party/hwloc/install/linux64-x86_64-native-llvm-none-gpu/lib -lchpl -L/home/ghelbecque/chapel-1.32.0MCG_array_on_device/third-party/jemalloc/install/target/linux64-x86_64-native-llvm-none/lib -ljemalloc -L/home/ghelbecque/chapel-1.32.0MCG_array_on_device/third-party/re2/install/linux64-x86_64-native-llvm-none/lib -lre2 -Wl,-rpath,/home/ghelbecque/chapel-1.32.0MCG_array_on_device/third-party/re2/install/linux64-x86_64-native-llvm-none/lib
  CHPL_TARGET_SYSTEM_LINK_ARGS: -L/grid5000/spack/v1/opt/spack/linux-debian11-x86_64_v2/gcc-10.4.0/hip-5.2.0-ctcbkptsxpka27keqds4u3ubsool77sl/bin/hipcc/lib -Wl,-rpath,/grid5000/spack/v1/opt/spack/linux-debian11-x86_64_v2/gcc-10.4.0/hip-5.2.0-ctcbkptsxpka27keqds4u3ubsool77sl/bin/hipcc/lib -lamdhip64 -lhsa-runtime64 -lm -lpthread
CHPL_TARGET_ARCH: x86_64
CHPL_TARGET_CPU: native
CHPL_RUNTIME_CPU: native
CHPL_TARGET_CPU_FLAG: arch
CHPL_TARGET_BACKEND_CPU: native
CHPL_LOCALE_MODEL: gpu *
  CHPL_GPU: amd *
  CHPL_GPU_ARCH: gfx906 *
  CHPL_GPU_MEM_STRATEGY: array_on_device *
  CHPL_ROCM_PATH: /grid5000/spack/v1/opt/spack/linux-debian11-x86_64_v2/gcc-10.4.0/hip-5.2.0-ctcbkptsxpka27keqds4u3ubsool77sl/bin/hipcc/
CHPL_COMM: none
CHPL_TASKS: qthreads
CHPL_LAUNCHER: none
CHPL_TIMERS: generic
CHPL_UNWIND: none
CHPL_HOST_MEM: jemalloc
  CHPL_HOST_JEMALLOC: bundled
CHPL_MEM: jemalloc
CHPL_TARGET_MEM: jemalloc
  CHPL_TARGET_JEMALLOC: bundled
CHPL_MAKE: gmake
CHPL_ATOMICS: cstdlib
CHPL_GMP: bundled
  CHPL_GMP_IS_OVERRIDDEN: False
CHPL_HWLOC: bundled
CHPL_RE2: bundled
  CHPL_RE2_IS_OVERRIDDEN: False
CHPL_LLVM: bundled *
  CHPL_LLVM_SUPPORT: bundled
  CHPL_LLVM_CONFIG: /home/ghelbecque/chapel-1.32.0MCG_array_on_device/third-party/llvm/install/linux64-x86_64/bin/llvm-config
  CHPL_LLVM_VERSION: 15
  CHPL_LLVM_CLANG_C: /home/ghelbecque/chapel-1.32.0MCG_array_on_device/third-party/llvm/install/linux64-x86_64/bin/clang --gcc-toolchain=/grid5000/spack/v1/opt/spack/linux-debian11-x86_64_v2/gcc-10.4.0/gcc-10.4.0-oo2korcm4gxzmlowrmwl4fp2rtq4dsv5
  CHPL_LLVM_CLANG_CXX: /home/ghelbecque/chapel-1.32.0MCG_array_on_device/third-party/llvm/install/linux64-x86_64/bin/clang++ --gcc-toolchain=/grid5000/spack/v1/opt/spack/linux-debian11-x86_64_v2/gcc-10.4.0/gcc-10.4.0-oo2korcm4gxzmlowrmwl4fp2rtq4dsv5
  CHPL_LLVM_STATIC_DYNAMIC: static
  CHPL_LLVM_TARGET_CPU: native
CHPL_AUX_FILESYS: none
CHPL_LIB_PIC: none
CHPL_SANITIZE: none
CHPL_SANITIZE_EXE: none
CHPL_RUNTIME_SUBDIR: linux64/llvm/x86_64/cpu-native/loc-gpu/gpu-amd/gpu_mem-array_on_device/comm-none/tasks-qthreads/tmr-generic/unwind-none/mem-jemalloc/atomics-cstdlib/hwloc-bundled/re2-bundled/fs-none/lib_pic-none/san-none
CHPL_LAUNCHER_SUBDIR: linux64/gnu/x86_64/loc-gpu/comm-none/tasks-qthreads/launch-none/tmr-generic/unwind-none/mem-jemalloc/atomics-cstdlib/lib_pic-none/san-none
CHPL_COMPILER_SUBDIR: linux64/gnu/x86_64/hostmem-jemalloc/llvm-bundled/15/san-none
CHPL_HOST_BIN_SUBDIR: linux64-x86_64
CHPL_TARGET_BIN_SUBDIR: linux64-x86_64-native
CHPL_SYS_MODULES_SUBDIR: linux64-x86_64-llvm
  CHPL_LLVM_UNIQ_CFG_PATH: linux64-x86_64
  CHPL_GASNET_UNIQ_CFG_PATH: linux64-x86_64-native-llvm-none/substrate-none/seg-none
  CHPL_GMP_UNIQ_CFG_PATH: linux64-x86_64-native-llvm-none
  CHPL_HWLOC_UNIQ_CFG_PATH: linux64-x86_64-native-llvm-none-gpu
  CHPL_HOST_JEMALLOC_UNIQ_CFG_PATH: host/linux64-x86_64-gnu
  CHPL_TARGET_JEMALLOC_UNIQ_CFG_PATH: target/linux64-x86_64-native-llvm-none
  CHPL_LIBFABRIC_UNIQ_CFG_PATH: linux64-x86_64-native-llvm-none
  CHPL_LIBUNWIND_UNIQ_CFG_PATH: linux64-x86_64-native-llvm-none
  CHPL_QTHREAD_UNIQ_CFG_PATH: linux64-x86_64-native-llvm-none-gpu-jemalloc-bundled
  CHPL_RE2_UNIQ_CFG_PATH: linux64-x86_64-native-llvm-none
  CHPL_PE_CHPL_PKGCONFIG_LIBS:

Guillaume-Helbecque commented 1 year ago

Note that the output is not fully anonymized, as reported in #23603.

e-kayrakli commented 1 year ago

Could you try with:

export CHPL_ROCM_PATH=/grid5000/spack/v1/opt/spack/linux-debian11-x86_64_v2/gcc-10.4.0/hip-5.2.0-ctcbkptsxpka27keqds4u3ubsool77sl

CHPL_ROCM_PATH seems wrong: it should have the trailing bin/hipcc part peeled off. I'll take a look at why that wasn't the case.

For more context: we have adjusted our heuristics for finding the ROCm installation path quite a bit recently. I don't think there's a good, portable, and reliable way of doing that. I think what we should do next is give a configuration-time warning when the deduced CHPL_ROCM_PATH/CHPL_CUDA_PATH is wrong.

e-kayrakli commented 1 year ago

It looks like spack doesn't have a full ROCm package, only separate packages such as hip, hipblas, and hipcub. So far, we have assumed that the system has a full ROCm installation. In the near future, we'll start using libraries like hipCUB (we already use CUDA's CUB) and hipBLAS.

The homework for us here is to understand the spack packages around AMD (and NVIDIA) software to make sure that we can use them as needed and/or document the dependencies clearly in our technotes.

The nuts-and-bolts reason for the failure you see is that our current heuristic runs which hipcc and walks up the resulting path until it finds a directory whose name starts with rocm. We could modify it to look for hip, too. But I'm worried that we don't know what a hip installation looks like, nor how we would cover things like hipCUB in the future. So, if setting CHPL_ROCM_PATH as above works for you, I'll probably plan on us installing a spack version of hip on a system and experimenting more with it before changing our heuristics further.
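For illustration, here is a minimal Python sketch of the heuristic described above, including the proposed hip extension. This is not Chapel's actual implementation: the function names (find_sdk_root, guess_rocm_path) and the DEFAULT_PREFIXES constant are hypothetical, and the sketch only assumes that hipcc is on PATH and that the SDK root is the nearest ancestor directory whose name starts with one of the given prefixes.

```python
import os
import shutil
from pathlib import Path
from typing import Optional, Tuple

# Hypothetical default: "rocm" is the existing rule, "hip" is the proposed extension.
DEFAULT_PREFIXES: Tuple[str, ...] = ("rocm", "hip")


def find_sdk_root(hipcc_path: str,
                  prefixes: Tuple[str, ...] = DEFAULT_PREFIXES) -> Optional[str]:
    """Walk up from hipcc_path and return the nearest ancestor directory
    whose name starts with one of `prefixes`, or None if there is none."""
    for parent in Path(hipcc_path).parents:
        if parent.name.startswith(prefixes):
            return str(parent)
    return None


def guess_rocm_path(prefixes: Tuple[str, ...] = DEFAULT_PREFIXES) -> Optional[str]:
    """Rough equivalent of `which hipcc` followed by the upward walk."""
    hipcc = shutil.which("hipcc")
    if hipcc is None:
        return None
    # Resolve symlinks, e.g. /usr/bin/hipcc -> /opt/rocm-x.y.z/bin/hipcc
    return find_sdk_root(os.path.realpath(hipcc), prefixes)
```

With both prefixes, the spack path from this issue resolves to the hip-5.2.0-... directory suggested above, while a rocm-only search returns None. A bare prefix match on hip can also hit unrelated ancestor directories whose names happen to start with hip, which is one reason to be cautious about broadening the heuristic.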

Guillaume-Helbecque commented 1 year ago

> Could you try with:
>
> export CHPL_ROCM_PATH=/grid5000/spack/v1/opt/spack/linux-debian11-x86_64_v2/gcc-10.4.0/hip-5.2.0-ctcbkptsxpka27keqds4u3ubsool77sl

I still get the same error.

I don't know if it helps, but I note that (un)loading the HIP module automatically changes CHPL_ROCM_PATH:

>>> $CHPL_HOME/util/printchplenv --anonymize --internal --all
...
CHPL_ROCM_PATH: /opt/rocm-4.5.0
...
>>> module load hip/5.2.0_gcc-10.4.0
>>> $CHPL_HOME/util/printchplenv --anonymize --internal --all
...
CHPL_ROCM_PATH: /grid5000/spack/v1/opt/spack/linux-debian11-x86_64_v2/gcc-10.4.0/hip-5.2.0-ctcbkptsxpka27keqds4u3ubsool77sl/bin/hipcc/
...
>>> module unload hip/5.2.0_gcc-10.4.0
>>> $CHPL_HOME/util/printchplenv --anonymize --internal --all
...
CHPL_ROCM_PATH: /opt/rocm-4.5.0
...

e-kayrakli commented 1 year ago

To report back from a quick call with Guillaume: the spack-based installation of hip doesn't come with the bitcode libraries, which throws the compiler off. Our most immediate steps are to provide better error messages and, ideally, to validate the SDK installation at build time. Then we should explore the spack packages, see whether there's a spack-based solution for users, and document the limitations in our technote.
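As a rough idea of what such build-time validation could look like, here is a hypothetical Python check (not part of Chapel; validate_rocm_path and BITCODE_SUBDIR are illustrative names). It assumes that the ROCm device-library bitcode (*.bc files) lives under <root>/amdgcn/bitcode, as in typical ROCm 5.x installs, and that hipcc lives in <root>/bin; other layouts would need other paths.

```python
from pathlib import Path
from typing import List

# Assumed layout (typical ROCm 5.x): device-library bitcode in <root>/amdgcn/bitcode.
BITCODE_SUBDIR = Path("amdgcn") / "bitcode"


def validate_rocm_path(root: str) -> List[str]:
    """Hypothetical build-time check of a ROCm/HIP SDK root.

    Returns a list of human-readable problems; an empty list means the
    checks passed.
    """
    base = Path(root)
    if not base.is_dir():
        return [f"CHPL_ROCM_PATH={root} is not a directory"]

    problems: List[str] = []
    if not (base / "bin" / "hipcc").exists():
        problems.append(f"no bin/hipcc found under {root}")

    bitcode_dir = base / BITCODE_SUBDIR
    # any() over the glob is False when the directory contains no *.bc files
    if not bitcode_dir.is_dir() or not any(bitcode_dir.glob("*.bc")):
        problems.append(f"no device bitcode libraries (*.bc) found in {bitcode_dir}")
    return problems
```

A configuration step could print each returned string as a warning instead of letting the compiler fail later with an internal error. The CHPL_ROCM_PATH reported above (ending in bin/hipcc/) would most likely be flagged by the first two checks, and a spack hip install without bitcode libraries by the last one.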

lydia-duncan commented 2 weeks ago

In case it went by too fast, I wanted to point out that this got resolved by @jabraham17's #26072. Thanks for fixing this, Jade, and thanks again for reporting @Guillaume-Helbecque!