zingale closed this issue 2 months ago.
I can confirm that Castro runs on Frontier if I don't include #3881.
Does it run if you comment out this line? https://github.com/AMReX-Codes/amrex/blob/64d2360b209c1e625bfc9a6feeb7501a618f7ed5/Src/Base/AMReX_GpuDevice.cpp#L326
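If that is the case, it looks like a compiler bug. We don't actually need that call; it was there only so that the compiler does not warn about amrex_check_wavefront_size being an unused function. We could probably fix it by keeping the reference but guarding it with a condition that is never true at run time, since num_devices_used cannot be negative: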
```diff
diff --git a/Src/Base/AMReX_GpuDevice.cpp b/Src/Base/AMReX_GpuDevice.cpp
index 6f972040e1..e129068d4c 100644
--- a/Src/Base/AMReX_GpuDevice.cpp
+++ b/Src/Base/AMReX_GpuDevice.cpp
@@ -323,7 +323,9 @@ Device::Initialize ()
 #endif
 #if defined(AMREX_USE_HIP)
-    amrex::single_task(amrex_check_wavefront_size);
+    if (num_devices_used < 0) {
+        amrex::single_task(amrex_check_wavefront_size);
+    }
 #endif
     Device::profilerStart();
```
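The pattern in the patch can be illustrated without HIP. The sketch below is hypothetical and not AMReX code: `check_wavefront_size` stands in for `amrex_check_wavefront_size`, the global `num_devices_used` stands in for the AMReX member of the same name, and a plain function call replaces the `amrex::single_task` launch. It shows only that the function stays referenced (so `-Wunused-function` has nothing to report) while the guarded call never executes; it makes no claim about why the compiler bug itself goes away.

```cpp
#include <cassert>

namespace {
// Hypothetical stand-in for amrex_check_wavefront_size: a file-local function
// that the compiler would flag with -Wunused-function if nothing referenced it.
int g_check_calls = 0;
void check_wavefront_size() { ++g_check_calls; }
}  // namespace

// Hypothetical stand-in for Device::num_devices_used: a device count,
// so it is never negative in practice.
int num_devices_used = 1;

void initialize()
{
    // The call below keeps the function referenced, which silences the
    // unused-function warning, but the branch is never taken because a
    // device count cannot be negative.
    if (num_devices_used < 0) {
        check_wavefront_size();
    }
}

int check_calls() { return g_check_calls; }

// Usage: after initialization the guarded function has not run.
void demo()
{
    initialize();
    assert(check_calls() == 0);
}
```

Because `num_devices_used` is an ordinary non-const variable, the compiler cannot fold the branch away at compile time, so the reference to the function survives even though it is dead at run time.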
@zingale If this works, could you submit a PR so that we can get it fixed soon?
We will also need #3897 merged first to pass CI.
Okay, indeed, commenting out that line fixes the issue.
Okay, it works with your suggested fix. Thanks @WeiqunZhang. PR issued.
When running Castro on Frontier with HIP and the latest AMReX, I get:
If I drop back to 24.04, things work fine.
Looking at the recent changes, I suspect #3881 is the cause. I'll try bisecting.