AMReX-Astro / Castro

Castro (Compressible Astrophysics): An adaptive mesh, astrophysical compressible (radiation-, magneto-) hydrodynamics simulation code for massively parallel CPU and GPU architectures.
http://amrex-astro.github.io/Castro

Problem Building Sedov With CUDA #1955

Closed joehellmers closed 3 years ago

joehellmers commented 3 years ago

I'm using the PGI/NVIDIA compiler version 21.3 with CUDA 11.2. I'm getting messages like

```
"../../../external/amrex/Src/Base/AMReX_IntVect.H", line 559: error: expected a ";"
  AMREX_FORCE_INLINE
  ^
```

and

```
"../../../external/amrex/Src/Base/AMReX_Math.H", line 27: error: variable "amrex::disabled::device" has already been defined
  AMREX_GPU_HOST_DEVICE long long abs (long long);
  ^
```

For the compiler I'm sourcing the following:

```
export NVARCH=`uname -s`_`uname -m`
export NVCOMPILERS=/opt/nvidia/hpc_sdk
export MANPATH="$MANPATH":$NVCOMPILERS/$NVARCH/21.3/compilers/man
export PATH=$NVCOMPILERS/$NVARCH/21.3/compilers/bin:$PATH
```

Is there something else I need to set?

maxpkatz commented 3 years ago

Can you please share:

1) The make line you are using
2) The version of gcc that is in your environment
3) The full stdout log of make

joehellmers commented 3 years ago

I just use a simple make command with the following GNUmakefile:

```
PRECISION = DOUBLE
PROFILE = FALSE

DEBUG = FALSE

DIM = 2

COMP = pgi

USE_MPI = FALSE
USE_OMP = FALSE
USE_GPU = TRUE
USE_MHD = FALSE

USE_FORT_MICROPHYSICS := FALSE
BL_NO_FORT := TRUE

CASTRO_HOME := ../../..

EOS_DIR := gamma_law

NETWORK_DIR := general_null
NETWORK_INPUTS = gammalaw.net

Bpack := ./Make.package
Blocs := .

include $(CASTRO_HOME)/Exec/Make.Castro
```

g++ is version 9.3.0-17 (on Ubuntu 20.04).

make.log

maxpkatz commented 3 years ago

We don't have USE_GPU set up to do the right thing at the moment. Can you try using USE_CUDA = TRUE instead please?
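As a sketch, the switch can be made from the problem directory without editing the GNUmakefile, by overriding the variable on the make command line (the `-j4` parallel-build setting is arbitrary; `realclean` removes objects built with the old flags):

```shell
# Remove objects built with USE_GPU, then rebuild with the supported flag.
make realclean
make -j4 USE_CUDA=TRUE
```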

joehellmers commented 3 years ago

That was the problem. Thanks.

When I try to run

```
./Castro2d.pgi.CUDA.ex inputs.2d.cyl_in_cartcoords
```

I get

```
Initializing CUDA...
CUDA initialized with 1 GPU
amrex::Abort::0::CUDA error 801 in file ../../../external/amrex/Src/Base/AMReX_PArena.cpp line 15: operation not supported !!!
SIGABRT
See Backtrace.0 file for details
```

Do I need HYPRE set up? Or perhaps I need something else set up?

Backtrace.0 doesn't seem to show anything useful. Backtrace.0.log

maxpkatz commented 3 years ago

You're hitting an issue in AMReX that should probably be fixed (the backtrace actually did help me figure that out, so thanks for sharing it). AMReX unconditionally uses some CUDA >= 11.2 functionality (the asynchronous memory allocator and memory pool support), but there are platforms where this is unsupported. (Which GPU are you using?) We should probably guard against that using the device attribute cudaDevAttrMemoryPoolsSupported. cc @WeiqunZhang

Until we can fix that, a workaround would be to compile with an earlier version of CUDA. You could select a multi-CUDA installation of NVHPC to do so, if needed, and use one of the older CUDA toolkits. Sorry about that!

WeiqunZhang commented 3 years ago

@joehellmers Could you give https://github.com/AMReX-Codes/amrex/pull/2221 a try to see if it fixes your issue?

joehellmers commented 3 years ago

@WeiqunZhang, I was able to get past that problem. Now when I try to run, I get:

Initializing CUDA...
CUDA initialized with 1 GPU
AMReX (21.08-dirty) initialized

Starting run at 21:22:02 UTC on 2021-08-08.
Successfully read inputs file ...

Castro git describe: 21.08-dirty
AMReX git describe: 21.08-dirty
Microphysics git describe: 21.08

reading extern runtime parameters ...
1 Species:
X
Successfully read inputs file ...
Initializing the data at level 0
Bus error (core dumped)

My GPU is an NVIDIA "clone" from Zotac, I think.

01:00.0 VGA compatible controller: NVIDIA Corporation GM107 [GeForce GTX 750] (rev a2) (prog-if 00 [VGA controller])
        Subsystem: ZOTAC International (MCO) Ltd. GM107 [GeForce GTX 750]
        Flags: bus master, fast devsel, latency 0, IRQ 42
        Memory at f6000000 (32-bit, non-prefetchable) [size=16M]
        Memory at e0000000 (64-bit, prefetchable) [size=256M]
        Memory at f0000000 (64-bit, prefetchable) [size=32M]
        I/O ports at e000 [size=128]
        Expansion ROM at 000c0000 [virtual] [disabled] [size=128K]
        Capabilities: <access denied>
        Kernel driver in use: nvidia
        Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia

It seems to be operational. From nvidia-smi I get

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 465.19.01    Driver Version: 465.19.01    CUDA Version: 11.3     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0 Off |                  N/A |
| 28%   40C    P8     1W /  38W |      1MiB /   979MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

And I'm able to run a few of the CUDA test programs.

maxpkatz commented 3 years ago

We don't support GPUs of that generation in Castro. In general we use CUDA unified memory functionality that is only available on compute capability >= 6.x GPUs (P100/GTX 10xx and later). I took a look at the usage causing the problem you're seeing, and it would be too much effort to change it, sorry.
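For reference, one way to check a card's compute capability is the `deviceQuery` utility from the CUDA samples (a sketch; the path to the built binary varies by installation, and the samples must have been compiled):

```shell
# Hypothetical path to a built deviceQuery; adjust for your CUDA install.
# On a GTX 750 this reports "CUDA Capability Major/Minor version number: 5.0",
# below the 6.x minimum mentioned above.
/usr/local/cuda/samples/bin/x86_64/linux/release/deviceQuery | grep "CUDA Capability"
```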

maxpkatz commented 3 years ago

There is a workaround, though: you could set the environment variable CUDA_LAUNCH_BLOCKING=1. That worked OK for me on a K40 to run Sedov. It does limit performance, though, especially for smaller problems.
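As a sketch, the workaround is just an environment variable set in the shell before launching the executable (the run command is the one from earlier in the thread):

```shell
# CUDA_LAUNCH_BLOCKING=1 makes every kernel launch synchronous, sidestepping
# concurrency features unsupported on older GPUs, at a performance cost.
export CUDA_LAUNCH_BLOCKING=1
echo "CUDA_LAUNCH_BLOCKING=$CUDA_LAUNCH_BLOCKING"

# Then run as before, e.g.:
# ./Castro2d.pgi.CUDA.ex inputs.2d.cyl_in_cartcoords
```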

joehellmers commented 3 years ago

I'll give that a shot @maxpkatz.

For now, all I'm trying to do is get some performance characteristics of Castro for an XSEDE startup allocation I'm submitting, probably on Expanse. Once I get that, I'm sure they'll have very nice GPUs for me to use.

I'm looking to do some simulations of 3-D SN Ia with some high-density nuclear reactions taken into account.

maxpkatz commented 3 years ago

Sounds good. Let us know if you need any of the performance measurements we used for our own proposals, or want us to collect new ones for you on more recent GPUs. A GTX 750 won't show off Castro's performance well (even relative to its architectural age), because consumer-grade GPUs don't have much double-precision support.

zingale commented 3 years ago

has this been resolved?

joehellmers commented 3 years ago

@zingale, yes, setting that environment variable allows me to run the simulation using my GPU.

@maxpkatz, if you have some performance data that I can use, that would be great!

zingale commented 3 years ago

okay, great. I'm going to close this issue then.