Closed wjhorne closed 1 year ago
@wjhorne -- thanks for filing this bug report. It looks like something in AMD binary generation is going wrong, probably because something in your system's installation is not quite what our scripts expect to see. But I can't pinpoint it without further information. Could you:

1. Run `printchplenv --all --internal` and paste the result.
2. Run `which hipcc` and paste the result.
3. Compile with `--devel` and paste the error message you get from that.
4. Run `ls $CHPL_ROCM_PATH/amdgcn/bitcode/*bc` and paste the result. Note that `$CHPL_ROCM_PATH` will not be set in the environment, but you'll see it in the output of (1).

I am also tagging @stonea as he knows AMD quirks much better than myself.
Output of `printchplenv --all --internal`:

```
machine info: Linux chameleon 6.4.3-arch1-1 #1 SMP PREEMPT_DYNAMIC Tue, 11 Jul 2023 05:13:39 +0000 x86_64
CHPL_HOME: /home/nix/Code/languages/chapel
script location: /home/nix/Code/languages/chapel/util/chplenv
CHPL_HOST_PLATFORM: linux64
CHPL_HOST_COMPILER: gnu
CHPL_HOST_CC: gcc
CHPL_HOST_CXX: g++
CHPL_HOST_BUNDLED_COMPILE_ARGS: -I/home/nix/Code/languages/chapel/third-party/llvm/install/linux64-x86_64/include -std=c++14 -fno-exceptions -fno-rtti -D_GNU_SOURCE -DSTDC_CONSTANT_MACROS -DSTDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Wno-comment -DHAVE_LLVM -I/home/nix/Code/languages/chapel/third-party/jemalloc/install/host/linux64-x86_64-gnu/include
CHPL_HOST_SYSTEM_COMPILE_ARGS:
CHPL_HOST_BUNDLED_LINK_ARGS: -L/home/nix/Code/languages/chapel/third-party/llvm/install/linux64-x86_64/lib -Wl,-rpath,/home/nix/Code/languages/chapel/third-party/llvm/install/linux64-x86_64/lib -lclangFrontend -lclangSerialization -lclangDriver -lclangCodeGen -lclangParse -lclangSema -lclangAnalysis -lclangEdit -lclangASTMatchers -lclangAST -lclangLex -lclangBasic -lclangSupport -L/home/nix/Code/languages/chapel/third-party/llvm/install/linux64-x86_64/lib -lLLVM-15 -L/home/nix/Code/languages/chapel/third-party/jemalloc/install/host/linux64-x86_64-gnu/lib -ljemalloc
CHPL_HOST_SYSTEM_LINK_ARGS: -lm -lpthread
CHPL_HOST_ARCH: x86_64
CHPL_HOST_CPU: none
CHPL_TARGET_PLATFORM: linux64
CHPL_TARGET_COMPILER: llvm +
CHPL_TARGET_CC: /home/nix/Code/languages/chapel/third-party/llvm/install/linux64-x86_64/bin/clang
CHPL_TARGET_CXX: /home/nix/Code/languages/chapel/third-party/llvm/install/linux64-x86_64/bin/clang++
CHPL_TARGET_COMPILER_PRGENV: none
CHPL_TARGET_BUNDLED_COMPILE_ARGS: -I/home/nix/Code/languages/chapel/runtime/include/localeModels/gpu -I/home/nix/Code/languages/chapel/runtime/include/localeModels -I/home/nix/Code/languages/chapel/runtime/include/comm/none -I/home/nix/Code/languages/chapel/runtime/include/comm -I/home/nix/Code/languages/chapel/runtime/include/tasks/qthreads -I/home/nix/Code/languages/chapel/runtime/include -I/home/nix/Code/languages/chapel/runtime/include/qio -I/home/nix/Code/languages/chapel/runtime/include/atomics/cstdlib -I/home/nix/Code/languages/chapel/runtime/include/mem/jemalloc -I/home/nix/Code/languages/chapel/third-party/utf8-decoder -DHAS_GPU_LOCALE -I/home/nix/Code/languages/chapel/runtime/include/gpu/amd -DCHPL_JEMALLOC_PREFIX=chplje -I/home/nix/Code/languages/chapel/third-party/gmp/install/linux64-x86_64-native-llvm-none/include -I/home/nix/Code/languages/chapel/third-party/hwloc/install/linux64-x86_64-native-llvm-none-gpu/include -I/home/nix/Code/languages/chapel/third-party/qthread/install/linux64-x86_64-native-llvm-none-gpu-jemalloc-bundled/include -I/home/nix/Code/languages/chapel/third-party/jemalloc/install/target/linux64-x86_64-native-llvm-none/include -I/home/nix/Code/languages/chapel/third-party/re2/install/linux64-x86_64-native-llvm-none/include
CHPL_TARGET_SYSTEM_COMPILE_ARGS: -isystem/opt/hip/include -isystem/opt/hsa/include
CHPL_TARGET_LD: /home/nix/Code/languages/chapel/third-party/llvm/install/linux64-x86_64/bin/clang++
CHPL_TARGET_BUNDLED_LINK_ARGS: -L/home/nix/Code/languages/chapel/lib/linux64/llvm/x86_64/cpu-native/loc-gpu/gpu-amd/gpu_mem-unified_memory/comm-none/tasks-qthreads/tmr-generic/unwind-none/mem-jemalloc/atomics-cstdlib/hwloc-bundled/re2-bundled/fs-none/lib_pic-none/san-none -lchpl -L/home/nix/Code/languages/chapel/third-party/gmp/install/linux64-x86_64-native-llvm-none/lib -lgmp -Wl,-rpath,/home/nix/Code/languages/chapel/third-party/gmp/install/linux64-x86_64-native-llvm-none/lib -L/home/nix/Code/languages/chapel/third-party/hwloc/install/linux64-x86_64-native-llvm-none-gpu/lib -lhwloc -Wl,-rpath,/home/nix/Code/languages/chapel/third-party/hwloc/install/linux64-x86_64-native-llvm-none-gpu/lib -L/home/nix/Code/languages/chapel/third-party/qthread/install/linux64-x86_64-native-llvm-none-gpu-jemalloc-bundled/lib -Wl,-rpath,/home/nix/Code/languages/chapel/third-party/qthread/install/linux64-x86_64-native-llvm-none-gpu-jemalloc-bundled/lib -lqthread -lchpl -L/home/nix/Code/languages/chapel/third-party/jemalloc/install/target/linux64-x86_64-native-llvm-none/lib -ljemalloc -L/home/nix/Code/languages/chapel/third-party/re2/install/linux64-x86_64-native-llvm-none/lib -lre2 -Wl,-rpath,/home/nix/Code/languages/chapel/third-party/re2/install/linux64-x86_64-native-llvm-none/lib
CHPL_TARGET_SYSTEM_LINK_ARGS: -L/opt/lib -Wl,-rpath,/opt/lib -lamdhip64 -lhsa-runtime64 -lnuma -lm -lpthread
CHPL_TARGET_ARCH: x86_64
CHPL_TARGET_CPU: native +
CHPL_RUNTIME_CPU: native
CHPL_TARGET_CPU_FLAG: arch
CHPL_TARGET_BACKEND_CPU: native
CHPL_LOCALE_MODEL: gpu +
CHPL_GPU: amd +
CHPL_GPU_ARCH: gfx1035
CHPL_GPU_MEM_STRATEGY: unified_memory
CHPL_ROCM_PATH: /opt
CHPL_COMM: none
CHPL_TASKS: qthreads
CHPL_LAUNCHER: none
CHPL_TIMERS: generic
CHPL_UNWIND: none
CHPL_HOST_MEM: jemalloc
CHPL_HOST_JEMALLOC: bundled
CHPL_MEM: jemalloc
CHPL_TARGET_MEM: jemalloc
CHPL_TARGET_JEMALLOC: bundled
CHPL_MAKE: make
CHPL_ATOMICS: cstdlib
CHPL_GMP: bundled
CHPL_GMP_IS_OVERRIDDEN: False
CHPL_HWLOC: bundled
CHPL_RE2: bundled
CHPL_RE2_IS_OVERRIDDEN: False
CHPL_LLVM: bundled +
CHPL_LLVM_SUPPORT: bundled
CHPL_LLVM_CONFIG: /home/nix/Code/languages/chapel/third-party/llvm/install/linux64-x86_64/bin/llvm-config
CHPL_LLVM_VERSION: 15
CHPL_LLVM_CLANG_C: /home/nix/Code/languages/chapel/third-party/llvm/install/linux64-x86_64/bin/clang
CHPL_LLVM_CLANG_CXX: /home/nix/Code/languages/chapel/third-party/llvm/install/linux64-x86_64/bin/clang++
CHPL_LLVM_STATIC_DYNAMIC: static
CHPL_LLVM_TARGET_CPU: native
CHPL_AUX_FILESYS: none
CHPL_LIB_PIC: none
CHPL_SANITIZE: none
CHPL_SANITIZE_EXE: none
CHPL_RUNTIME_SUBDIR: linux64/llvm/x86_64/cpu-native/loc-gpu/gpu-amd/gpu_mem-unified_memory/comm-none/tasks-qthreads/tmr-generic/unwind-none/mem-jemalloc/atomics-cstdlib/hwloc-bundled/re2-bundled/fs-none/lib_pic-none/san-none
CHPL_LAUNCHER_SUBDIR: linux64/gnu/x86_64/loc-gpu/comm-none/tasks-qthreads/launch-none/tmr-generic/unwind-none/mem-jemalloc/atomics-cstdlib/lib_pic-none/san-none
CHPL_COMPILER_SUBDIR: linux64/gnu/x86_64/hostmem-jemalloc/llvm-bundled/15/san-none
CHPL_HOST_BIN_SUBDIR: linux64-x86_64
CHPL_TARGET_BIN_SUBDIR: linux64-x86_64-native
CHPL_SYS_MODULES_SUBDIR: linux64-x86_64-llvm
CHPL_LLVM_UNIQ_CFG_PATH: linux64-x86_64
CHPL_GASNET_UNIQ_CFG_PATH: linux64-x86_64-native-llvm-none/substrate-none/seg-none
CHPL_GMP_UNIQ_CFG_PATH: linux64-x86_64-native-llvm-none
CHPL_HWLOC_UNIQ_CFG_PATH: linux64-x86_64-native-llvm-none-gpu
CHPL_HOST_JEMALLOC_UNIQ_CFG_PATH: host/linux64-x86_64-gnu
CHPL_TARGET_JEMALLOC_UNIQ_CFG_PATH: target/linux64-x86_64-native-llvm-none
CHPL_LIBFABRIC_UNIQ_CFG_PATH: linux64-x86_64-native-llvm-none
CHPL_LIBUNWIND_UNIQ_CFG_PATH: linux64-x86_64-native-llvm-none
CHPL_QTHREAD_UNIQ_CFG_PATH: linux64-x86_64-native-llvm-none-gpu-jemalloc-bundled
CHPL_RE2_UNIQ_CFG_PATH: linux64-x86_64-native-llvm-none
CHPL_PE_CHPL_PKGCONFIG_LIBS:
```
Output of `which hipcc`:

```
/opt/rocm/bin/hipcc
```

With the `--devel` flag:

```
internal error: seg fault [util/misc.cpp:1041]
```

rocm path: CHPL_ROCM_PATH = /opt <---- This seems wrong, I think it should be /opt/rocm
Output of `ls /opt/rocm/amdgcn/bitcode/*bc`:

```
/opt/rocm/amdgcn/bitcode/asanrtl.bc /opt/rocm/amdgcn/bitcode/hip.bc /opt/rocm/amdgcn/bitcode/ockl.bc /opt/rocm/amdgcn/bitcode/oclc_abi_version_400.bc /opt/rocm/amdgcn/bitcode/oclc_abi_version_500.bc /opt/rocm/amdgcn/bitcode/oclc_correctly_rounded_sqrt_off.bc /opt/rocm/amdgcn/bitcode/oclc_correctly_rounded_sqrt_on.bc /opt/rocm/amdgcn/bitcode/oclc_daz_opt_off.bc /opt/rocm/amdgcn/bitcode/oclc_daz_opt_on.bc /opt/rocm/amdgcn/bitcode/oclc_finite_only_off.bc /opt/rocm/amdgcn/bitcode/oclc_finite_only_on.bc /opt/rocm/amdgcn/bitcode/oclc_isa_version_1010.bc /opt/rocm/amdgcn/bitcode/oclc_isa_version_1011.bc /opt/rocm/amdgcn/bitcode/oclc_isa_version_1012.bc /opt/rocm/amdgcn/bitcode/oclc_isa_version_1013.bc /opt/rocm/amdgcn/bitcode/oclc_isa_version_1030.bc /opt/rocm/amdgcn/bitcode/oclc_isa_version_1031.bc /opt/rocm/amdgcn/bitcode/oclc_isa_version_1032.bc /opt/rocm/amdgcn/bitcode/oclc_isa_version_1033.bc /opt/rocm/amdgcn/bitcode/oclc_isa_version_1034.bc /opt/rocm/amdgcn/bitcode/oclc_isa_version_1035.bc /opt/rocm/amdgcn/bitcode/oclc_isa_version_1036.bc /opt/rocm/amdgcn/bitcode/oclc_isa_version_1100.bc /opt/rocm/amdgcn/bitcode/oclc_isa_version_1101.bc /opt/rocm/amdgcn/bitcode/oclc_isa_version_1102.bc /opt/rocm/amdgcn/bitcode/oclc_isa_version_1103.bc /opt/rocm/amdgcn/bitcode/oclc_isa_version_600.bc /opt/rocm/amdgcn/bitcode/oclc_isa_version_601.bc /opt/rocm/amdgcn/bitcode/oclc_isa_version_602.bc /opt/rocm/amdgcn/bitcode/oclc_isa_version_700.bc /opt/rocm/amdgcn/bitcode/oclc_isa_version_701.bc /opt/rocm/amdgcn/bitcode/oclc_isa_version_702.bc /opt/rocm/amdgcn/bitcode/oclc_isa_version_703.bc /opt/rocm/amdgcn/bitcode/oclc_isa_version_704.bc /opt/rocm/amdgcn/bitcode/oclc_isa_version_705.bc /opt/rocm/amdgcn/bitcode/oclc_isa_version_801.bc /opt/rocm/amdgcn/bitcode/oclc_isa_version_802.bc /opt/rocm/amdgcn/bitcode/oclc_isa_version_803.bc /opt/rocm/amdgcn/bitcode/oclc_isa_version_805.bc /opt/rocm/amdgcn/bitcode/oclc_isa_version_810.bc /opt/rocm/amdgcn/bitcode/oclc_isa_version_900.bc /opt/rocm/amdgcn/bitcode/oclc_isa_version_902.bc /opt/rocm/amdgcn/bitcode/oclc_isa_version_904.bc /opt/rocm/amdgcn/bitcode/oclc_isa_version_906.bc /opt/rocm/amdgcn/bitcode/oclc_isa_version_908.bc /opt/rocm/amdgcn/bitcode/oclc_isa_version_909.bc /opt/rocm/amdgcn/bitcode/oclc_isa_version_90a.bc /opt/rocm/amdgcn/bitcode/oclc_isa_version_90c.bc /opt/rocm/amdgcn/bitcode/oclc_isa_version_940.bc /opt/rocm/amdgcn/bitcode/oclc_unsafe_math_off.bc /opt/rocm/amdgcn/bitcode/oclc_unsafe_math_on.bc /opt/rocm/amdgcn/bitcode/oclc_wavefrontsize64_off.bc /opt/rocm/amdgcn/bitcode/oclc_wavefrontsize64_on.bc /opt/rocm/amdgcn/bitcode/ocml.bc /opt/rocm/amdgcn/bitcode/opencl.bc
```
OK, this is unfortunate and your assessment is correct -- `CHPL_ROCM_PATH` is clearly wrong there.

Luckily, you should be able to set it manually via `export CHPL_ROCM_PATH=/opt/rocm`, and you should be good to go. This is documented under https://chapel-lang.org/docs/main/technotes/gpu.html#vendor-portability, but if you have any suggestions for improving that, we'd appreciate them.
My theory about the problem is that the path you get from `which hipcc` is a symlink that eventually points at something more complicated. `realpath $(which hipcc)` works on my system and follows symlinks to the end. If `realpath` doesn't work, you can use `ls -l` to follow the links manually. I'd be interested in seeing that path.

Our current heuristic is to peel off 3 components from the real path of `hipcc` to find the ROCm root. Maybe your hipcc is not that deep for some reason? This heuristic has worked on the several machines we've tested so far, but apparently it is not very universal.
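To make that heuristic concrete, here's a small shell sketch (the sample paths are assumptions for illustration, not verified output from either of our systems):

```shell
# Resolve hipcc (e.g. via `realpath "$(which hipcc)"`), then peel off three
# path components to guess the ROCm root -- the heuristic described above.
peel3() {
  p="$1"
  for _ in 1 2 3; do
    p="${p%/*}"   # drop the last path component
  done
  echo "$p"
}

peel3 /opt/rocm/hip/bin/hipcc   # prints /opt/rocm
peel3 /opt/rocm/bin/hipcc       # prints /opt  (too shallow!)
```

Note that if the resolved path is as shallow as `/opt/rocm/bin/hipcc`, peeling three components lands on `/opt`, which would match the bad `CHPL_ROCM_PATH` seen earlier.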
One of the recent discussions we've been having was about including `CHPL_ROCM_PATH`/`CHPL_CUDA_PATH` in `printchplenv` output without requiring any flags. That might make this issue more visible. We could probably improve our heuristics to find the ROCm path correctly in your case, but there will always be another system with a unique installation. It is to our benefit to expose these knobs to users.
I think setting the path is simple enough once I knew that I had to export it when compiling, not just when building. Unfortunately, it looks like that helped me get past one error and into another. I am now getting:

```
lld: error: undefined symbol: __oclc_ABI_version
referenced by /tmp/chpl-nix.deleteme-t9hXMj/chplgpu.o:(ockl_hostcall_preview)
referenced by /tmp/chpl-nix.deleteme-t9hXMj/chplgpu.o:(ockl_hostcall_preview)
```

It looks like it might be an issue with clang/hipcc itself. I found a comment in a different repo recommending adding `-Xclang -mlink-bitcode-file -Xclang /rocm/install/path/amdgcn/bitcode/oclc_abi_version_400.bc` to the clang calls. Is there a way I can test this out while compiling with Chapel?
That's interesting -- we indeed don't link to that bc library. Could you try passing `--ccflags "-Xclang -mlink-bitcode-file -Xclang /rocm/install/path/amdgcn/bitcode/oclc_abi_version_400.bc"`? If the quotes don't work, you should be able to pass every "word" individually, each preceded by `--ccflags`. This should normally pass the flags to Clang directly.
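For what it's worth, here's a quick shell illustration of the quoting behavior, using a stand-in function rather than the real chpl driver:

```shell
# show_args stands in for an argument-consuming tool, so we can see how the
# shell splits the words that would follow --ccflags.
show_args() { printf '[%s]\n' "$@"; }

# Quoted: the whole flag string arrives as a single argument.
show_args --ccflags "-Xclang -mlink-bitcode-file"

# Per-word: each flag is its own argument, each preceded by --ccflags.
show_args --ccflags -Xclang --ccflags -mlink-bitcode-file
```

The first call delivers two arguments; the second delivers four, one `--ccflags` per flag word.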
There's special handling in our compiler for bitcode library linking, and I am unsure whether `--ccflags` can do something equivalent. If it doesn't work, could you try patching your Chapel with:
```diff
diff --git a/compiler/llvm/clangUtil.cpp b/compiler/llvm/clangUtil.cpp
index fa0bef8b43..92bcc12950 100644
--- a/compiler/llvm/clangUtil.cpp
+++ b/compiler/llvm/clangUtil.cpp
@@ -4259,6 +4259,7 @@ static void linkGpuDeviceLibraries() {
   linkBitCodeFile((libPath + "/oclc_finite_only_off.bc").c_str());
   linkBitCodeFile((libPath + "/oclc_correctly_rounded_sqrt_on.bc").c_str());
   linkBitCodeFile((libPath + "/oclc_wavefrontsize64_on.bc").c_str());
+  linkBitCodeFile((libPath + "/oclc_abi_version_400.bc").c_str());
   linkBitCodeFile(determineOclcVersionLib(libPath).c_str());
 }
```
and rebuild the compiler?
@stonea -- do you know what kind of bitcode libraries end up in that path? Should we just link them all (especially so, if this solution works)? At the very least we should check for some other files, maybe.
With your patch I was able to get past that error, and I am now at an error that reads:

```
internal error: gpu-amd.c:62: Error calling HIP function: no kernel image is available for execution on the device (Code: 209)
```

From what I know, this usually means that the correct architecture isn't being targeted. For my system I need something akin to `--offload-arch=gfx1032,gfx1035` due to the presence of a discrete GPU (gfx1035) and one attached to the processor (gfx1032). When I attempt `CHPL_GPU_ARCH=gfx1032,gfx1035` I get:

```
/opt/rocm/llvm/bin/clang-offload-bundler: warning: -inputs is deprecated, use -input instead
/opt/rocm/llvm/bin/clang-offload-bundler: warning: -outputs is deprecated, use -output instead
/opt/rocm/llvm/bin/clang-offload-bundler: error: number of input files and targets should match in bundling mode
error: .out file to fatbin file
```

Targeting gfx1032 or gfx1035 alone produces the first error.
> @stonea -- do you know what kind of bitcode libraries end up in that path? Should we just link them all (especially so, if this solution works)? At the very least we should check for some other files, maybe.
`ocml.bc` and `ockl.bc` are the main things. These are configured by linking in a number of other .bc files that turn various features on/off. Documented here:

https://github.com/RadeonOpenCompute/ROCm-Device-Libs/blob/amd-stg-open/doc/OCML.md#controls

So we wouldn't want to link to all of the .bc files in that directory (some of them have contradictory meanings).

The "abi version" one is new to me. The `linkBitCodeFile(determineOclcVersionLib(libPath).c_str());` line should be linking to an `oclc_isa_version_XYZ.amdgcn.bc`, but given the linker error it seems like we should be linking to one of the abi version files as well (it's just strange that we haven't encountered this ourselves yet).
Edit: also note that if you run `hipcc` with `-###`, you can see what it links against.
Hmm, that's a setup we haven't tested on before. I am guessing this is a personal system, since it has both integrated and discrete GPUs?

The fact that multiple architectures don't work with `CHPL_GPU_ARCH` isn't very surprising -- we don't handle that today, though we have considered it in the past from a portability standpoint.

Since you have already patched your Chapel, I am going to suggest what my next step would be here :). I am curious what `rocm-smi --showproductname` shows on your system. My guess is that the integrated GPU is device 0 and the discrete one is device 1. Assuming that is the case, here's a hack to make the runtime ignore device 0:
```diff
diff --git a/runtime/src/gpu/amd/gpu-amd.c b/runtime/src/gpu/amd/gpu-amd.c
index 6f6afe5403..515a9ba8d4 100644
--- a/runtime/src/gpu/amd/gpu-amd.c
+++ b/runtime/src/gpu/amd/gpu-amd.c
@@ -138,7 +138,7 @@ void chpl_gpu_impl_init(int* num_devices) {
   deviceClockRates = chpl_malloc(sizeof(int)*loc_num_devices);
   int i;
-  for (i=0 ; i<loc_num_devices ; i++) {
+  for (i=1 ; i<loc_num_devices ; i++) {
     hipDevice_t device;
     hipCtx_t context;
```
Note that you'll need to rebuild your runtime.
The error you're getting occurs during runtime initialization where we iterate over devices and do necessary initialization, including loading the binary for the device into memory. In your case, there's no binary for device 0 (I presume). If this hack works, we can consider handling (i.e. ignoring for now) integrated GPUs in a nicer way via an environment variable of sorts.
Also, I'd be up for arranging a screenshare session to get to the bottom of the issues if this doesn't help and/or you'd find it helpful. You can reach out to me at engin@hpe.com.
My integrated gpu is indeed in slot 0 and the additional patch got me one step closer.
I am currently stuck on another error, given as:

```
internal error: gpu-amd.c:72: Error calling HIP function: named symbol not found (Code: 500)
```

Using the jacobi.chpl test case and printing out the kernel name right before the failure yields `chpl_gpu_kernel_jacobi_line_37`.
My time is pretty sporadic on this so not sure if a screen share is going to work out soon, but it would be nice to get this worked out. My goal of all of this is to turn on an actual GPU on some code I worked on using CHPL_GPU=cpu before trying to move to actual clusters.
> My time is pretty sporadic on this so not sure if a screen share is going to work out soon, but it would be nice to get this worked out. My goal of all of this is to turn on an actual GPU on some code I worked on using CHPL_GPU=cpu before trying to move to actual clusters.
No worries. Let me know if your plan changes. I think we'll incorporate what we learn here into our code. The only problem is the lack of a system where we can nightly-test these soon-to-be-features. Your path of starting with cpu-as-device mode makes sense; it's just that the intermediate step you're wrestling with at the moment has different parameters than actual clusters. IOW, I certainly hope things will be smoother on your final target.
On to the problem: I think we are generating the kernel, but not setting up the "ignored GPU" correctly. So, when you do `on here.gpus[0]`, you're still targeting the integrated chip. Here's a more advanced patch that's closer to a feature than a hack. You'll need to set `CHPL_RT_NUM_IGNORED_GPUS=1` when launching an application; it'll skip that many GPUs when initializing the runtime. It'll set the number of devices correctly this time, though. As you can see, the patch is larger this time and my confidence in it is low. If it doesn't work, I would consider using clusters directly, if I am being frank.

ignoregpus.patch -- note that you'll need to revert the previous runtime patch.

This passes a quick test with `writeln(here.gpus.size)` and I think it'll help in your case. Let me know how it goes.
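As a rough mental model of what the env var does (a toy shell stand-in, not the actual runtime code, with an assumed device list and ordering):

```shell
# Toy model of CHPL_RT_NUM_IGNORED_GPUS: skip the first N devices during
# initialization. The device names and their ordering are assumptions.
CHPL_RT_NUM_IGNORED_GPUS=1
devices="gfx1032 gfx1035"   # integrated first, then discrete (assumed)
i=0
for d in $devices; do
  if [ "$i" -ge "$CHPL_RT_NUM_IGNORED_GPUS" ]; then
    echo "initializing device $i ($d)"
  else
    echo "skipping device $i ($d)"
  fi
  i=$((i + 1))
done
```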
I am happy to report that everything works if I use `here.gpus[1]` rather than `here.gpus[0]`, indicating that the problem is what you described. I'll go through with the larger patch you provided to produce something a bit easier to work with generally, but I think everything is solved here.

Phew, that's great to hear. I will summarize our observations and the different issues we tackled before closing this issue. We can probably merge more polished versions of the hacks here as well.
You may be doing this already, but the following can make your life easier when you move to clusters, in case you have a ton of `on here.gpus[N]` statements:

```chapel
config const nIgnored = 0;

on here.gpus[N+nIgnored] { ... }                        // run on GPU N

coforall gpu in here.gpus[nIgnored..] do on gpu { ... } // run on all GPUs except
                                                        // the first `nIgnored`
```

You can pass `--nIgnored=1` on your current system when running your application, and drop that argument when running on an actual cluster. Once you're done porting to the cluster, it should be relatively easy to rip out `nIgnored` compared to hard-coding magic numbers in your code. (Or keep it in if that's helpful, obviously.)
OK, I think I've distilled several issues that came up here.

@wjhorne, am I missing anything? My intention is to close this issue, as there's no further action needed here and it has sprawled quite a bit. All of the issues above link back to this one, as I believe the context will be important going forward. Does that make sense?
The only thing I would add is that the current method `/util/chplenv/chpl_gpu.py` uses to determine the ROCm version has the same /opt vs. /opt/rocm issue I encountered, even when I attempted to set CHPL_ROCM_PATH. I ended up hacking in the correct version to get past the issue, but ideally it would find /opt/rocm/.info/version correctly for cases like mine.
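A sketch of what that lookup could do, assuming the `.info/version` layout mentioned above (a temp directory stands in for the real install, and the version string is made up):

```shell
# Create a fake ROCm tree with a .info/version file, then read the version
# from $CHPL_ROCM_PATH the way the chplenv script ideally would.
rocm="$(mktemp -d)"
mkdir -p "$rocm/.info"
printf '5.4.3-121\n' > "$rocm/.info/version"   # sample contents (assumption)

CHPL_ROCM_PATH="$rocm"
version="$(cat "$CHPL_ROCM_PATH/.info/version" 2>/dev/null)"
echo "ROCm version: ${version%%-*}"            # prints: ROCm version: 5.4.3

rm -rf "$rocm"
```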
Thanks for all the effort and quick replies here. I am having a largely positive experience with Chapel so far and am glad to see the support side is so strong.
Thanks for saying so, Wyatt, and thanks for the quick responses and actions, Engin!
> The only thing I would add is that the current method that /util/chplenv/chpl_gpu.py uses to determine ROCm version has the same /opt/rocm issue I encountered even when I attempted to set CHPL_ROCM_PATH. I ended up hacking in the correct version to pass the issue, but ideally it would find /opt/rocm/.info/version correctly for cases like mine.
Posted a comment about this here: https://github.com/chapel-lang/chapel/issues/22780#issuecomment-1644698169
Please feel free to subscribe to those issues or comment under them if I mischaracterized anything.
Thanks again for the bug report! Closing this issue.
Hi @wjhorne -- In case you missed it, we were able to solve 2 of the problems I listed above in version 1.32 (released last week).
The main blocker for you is the mix of integrated+discrete GPUs, which remains unresolved. But I was wondering whether you've been able to make progress in your experiments and to run on an HPC system where the problem hopefully won't arise.

Meanwhile, we have also received requests that are not identical to your case, but that would probably require a solution that could help with your case, too. I captured those issues and some ideas going forward in https://github.com/chapel-lang/chapel/issues/23535. Feel free to comment under it if you have any thoughts.
I have been watching as things have progressed and am glad that there has been so much progress! I was able to run on my discrete + integrated setup using the hacks that were discussed here. I was also able to run on a cluster, but ran into cluster teething issues that are not at all Chapel related.
I'll continue to watch as more changes come in. From my end, any work that makes the capability more portable between various machines (desktops and clusters) is greatly appreciated. That portability is one of the strong advantages of C++ right now, along with a very healthy dose of inertia, when mixed with the likes of Kokkos or RAJA for HPC.
Summary of Problem
I currently get the following error when attempting to compile GPU programs using ROCm on an AMD workstation:

```
internal error: UTI-MIS-1041 chpl version 1.32.0 pre-release (048e735b27)
```

Compiling works using CHPL_GPU=cpu. I checked the LLVM clang install that is used and verified that it can compile and run HIP programs without issue.
Steps to Reproduce
Source code: any Chapel source code, regardless of whether any GPU-related modules are used.

Compile command: `CHPL_GPU=amd CHPL_GPU_ARCH=gfx1035 chpl jacobi.chpl`
Configuration Information
- Output of `chpl --version`
- Output of `printchplenv`
- Output of `clang --version`