Closed: joehellmers closed this issue 3 years ago.
Can you please share:

1) The `make` line you are using
2) The version of gcc in your environment
3) The full stdout log of `make`
I just use a simple `make` command with the following GNUmakefile:

```make
PRECISION = DOUBLE
PROFILE = FALSE
DEBUG = FALSE
DIM = 2
COMP = pgi

USE_MPI = FALSE
USE_OMP = FALSE
USE_GPU = TRUE
USE_MHD = FALSE

USE_FORT_MICROPHYSICS := FALSE
BL_NO_FORT := TRUE

CASTRO_HOME := ../../..

EOS_DIR := gamma_law

NETWORK_DIR := general_null
NETWORK_INPUTS = gammalaw.net

Bpack := ./Make.package
Blocs := .

include $(CASTRO_HOME)/Exec/Make.Castro
```
g++ is version 9.3.0-17 (on Ubuntu 20.04). The full make log is attached: make.log
We don't have `USE_GPU` set up to do the right thing at the moment. Can you try using `USE_CUDA = TRUE` instead, please?
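In the GNUmakefile above, that amounts to replacing the `USE_GPU` line (a sketch; the surrounding variables stay as they are):

```make
USE_MPI  = FALSE
USE_OMP  = FALSE
USE_CUDA = TRUE
USE_MHD  = FALSE
```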
That was the problem. Thanks.
When I try to run `./Castro2d.pgi.CUDA.ex inputs.2d.cyl_in_cartcoords` I get:

```
Initializing CUDA...
CUDA initialized with 1 GPU
amrex::Abort::0::CUDA error 801 in file ../../../external/amrex/Src/Base/AMReX_PArena.cpp line 15: operation not supported !!!
SIGABRT
See Backtrace.0 file for details
```

Do I need HYPRE set up? Or perhaps I need something else set up?
Backtrace.0 doesn't seem to show anything useful. Backtrace.0.log
You're hitting an issue in AMReX that should probably be fixed (the backtrace actually did help me figure that out, so thanks for sharing it). AMReX uses some functionality from CUDA >= 11.2 (the asynchronous memory allocator and memory pool support) unconditionally, but there are platforms where this is unsupported. (Which GPU are you using?) We should probably guard against that using the device attribute `cudaDevAttrMemoryPoolsSupported`. cc @WeiqunZhang
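A minimal sketch of such a guard (illustrative only, not the actual AMReX change; it assumes the CUDA 11.2+ runtime API and needs the CUDA toolkit to build):

```cpp
#include <cuda_runtime.h>
#include <cstdio>

// Returns true if the device supports the stream-ordered allocator
// (cudaMallocAsync / memory pools, a CUDA 11.2+ feature).
bool memory_pools_supported (int device)
{
    int supported = 0;
    cudaError_t err = cudaDeviceGetAttribute(
        &supported, cudaDevAttrMemoryPoolsSupported, device);
    return (err == cudaSuccess) && (supported != 0);
}

int main ()
{
    int dev = 0;
    cudaGetDevice(&dev);
    if (memory_pools_supported(dev)) {
        std::printf("memory pools supported; using async allocator\n");
    } else {
        std::printf("memory pools not supported; falling back to cudaMalloc\n");
    }
    return 0;
}
```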
Until we can fix that, a workaround would be to compile with an earlier version of CUDA. You could select a multi-CUDA installation of NVHPC to do so, if needed, and use one of the older CUDA toolkits. Sorry about that!
@joehellmers Could you give https://github.com/AMReX-Codes/amrex/pull/2221 a try to see if it fixes your issue?
@WeiqunZhang, I was able to get past that problem. Now when I try to run I get:

```
Initializing CUDA...
CUDA initialized with 1 GPU
AMReX (21.08-dirty) initialized
Starting run at 21:22:02 UTC on 2021-08-08.
Successfully read inputs file ...
Castro git describe: 21.08-dirty
AMReX git describe: 21.08-dirty
Microphysics git describe: 21.08
reading extern runtime parameters ...
1 Species:
X
Successfully read inputs file ...
Initializing the data at level 0
Bus error (core dumped)
```
My GPU is an NVIDIA "clone" from Zotac, I think:

```
01:00.0 VGA compatible controller: NVIDIA Corporation GM107 [GeForce GTX 750] (rev a2) (prog-if 00 [VGA controller])
	Subsystem: ZOTAC International (MCO) Ltd. GM107 [GeForce GTX 750]
	Flags: bus master, fast devsel, latency 0, IRQ 42
	Memory at f6000000 (32-bit, non-prefetchable) [size=16M]
	Memory at e0000000 (64-bit, prefetchable) [size=256M]
	Memory at f0000000 (64-bit, prefetchable) [size=32M]
	I/O ports at e000 [size=128]
	Expansion ROM at 000c0000 [virtual] [disabled] [size=128K]
	Capabilities: <access denied>
	Kernel driver in use: nvidia
	Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia
```
It seems to be operational. From nvidia-smi I get:

```
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 465.19.01    Driver Version: 465.19.01    CUDA Version: 11.3     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0 Off |                  N/A |
| 28%   40C    P8     1W / 38W  |      1MiB /  979MiB  |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
```
And I'm able to run a few of the CUDA test programs.
We don't support GPUs of that generation in Castro. In general we use CUDA Unified Memory functionality that is only available on compute capability >= 6.x GPUs (P100/GTX 10xx and later). I took a look at the usage that is causing the problem you are seeing and it would be too much effort to change it, sorry.
There is a workaround, though: you can set the environment variable `CUDA_LAUNCH_BLOCKING=1`. That worked OK for me on a K40 running Sedov. It does limit performance, though, especially for smaller problems.
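For example (using the executable and inputs file names from earlier in this thread; adjust to your build):

```shell
# Force synchronous kernel launches as a workaround for older GPUs.
export CUDA_LAUNCH_BLOCKING=1
./Castro2d.pgi.CUDA.ex inputs.2d.cyl_in_cartcoords
```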
I'll give that a shot @maxpkatz.
For now, all I'm trying to do is get some performance characteristics of Castro for a startup allocation I'm submitting to XSEDE, probably on Expanse. Once I get that, I'm sure they have very nice GPUs for me to use.
I'm looking to do some simulations of 3-D SN Ia with some high-density nuclear reactions taken into consideration.
Sounds good. Let us know if you need any of the performance measurements we used for our own previous proposals, or want us to collect new ones for you on more recent GPUs -- the GTX 750 won't show off Castro's performance well (even relative to its architectural age) because consumer-grade GPUs don't have much double-precision support.
has this been resolved?
@zingale, yes, setting that environment variable allows me to run the simulation using my GPU.
@maxpkatz, If you have some performance data that I can use that would be great!
okay, great. I'm going to close this issue then.
I'm using the PGI/NVIDIA compiler version 21.3 with CUDA 11.2. I'm getting messages like:

```
../../../external/amrex/Src/Base/AMReX_IntVect.H, line 559: error: expected a ;
  AMREX_FORCE_INLINE
  ^
```

and

```
"../../../external/amrex/Src/Base/AMReX_Math.H", line 27: error: variable "amrex::disabled::device" has already been defined
  AMREX_GPU_HOST_DEVICE long long abs (long long);
  ^
```

For the compiler I'm sourcing the following:

```shell
export NVARCH=`uname -s`_`uname -m`
export NVCOMPILERS=/opt/nvidia/hpc_sdk
export MANPATH="$MANPATH":$NVCOMPILERS/$NVARCH/21.3/compilers/man
export PATH=$NVCOMPILERS/$NVARCH/21.3/compilers/bin:$PATH
```

Is there something else I need to set?