__ __
/ / ____ ____ / /_ ____ _____
/ / / __ `/ __ `/ __ \/ __ \/ ___/
/ /___/ /_/ / /_/ / / / / /_/ (__ )
/_____/\__,_/\__, /_/ /_/\____/____/
/____/
High-order Lagrangian Hydrodynamics Miniapp
Laghos (LAGrangian High-Order Solver) is a miniapp that solves the time-dependent Euler equations of compressible gas dynamics in a moving Lagrangian frame using unstructured high-order finite element spatial discretization and explicit high-order time-stepping.
Laghos is based on the discretization method described in the following article:
V. Dobrev, Tz. Kolev and R. Rieben
High-order curvilinear finite element methods for Lagrangian hydrodynamics
SIAM Journal on Scientific Computing, (34) 2012, pp. B606–B641.
Laghos captures the basic structure of many compressible shock hydrocodes, including the BLAST code at Lawrence Livermore National Laboratory. The miniapp is built on top of a general discretization library, MFEM, thus separating the pointwise physics from finite element and meshing concerns.
The Laghos miniapp is part of the CEED software suite, a collection of software benchmarks, miniapps, libraries and APIs for efficient exascale discretizations based on high-order finite element and spectral element methods. See http://github.com/ceed for more information and source code availability.
The CEED research is supported by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of two U.S. Department of Energy organizations (Office of Science and the National Nuclear Security Administration) responsible for the planning and preparation of a capable exascale ecosystem, including software, applications, hardware, advanced system engineering and early testbed platforms, in support of the nation’s exascale computing imperative.
The problem that Laghos is solving is formulated as a big (block) system of ordinary differential equations (ODEs) for the unknown (high-order) velocity, internal energy and mesh nodes (position). The left-hand side of this system of ODEs is controlled by mass matrices (one for velocity and one for energy), while the right-hand side is constructed from a force matrix.
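For orientation, the block system described above can be written out as follows. This is a sketch in the notation of the cited article as we understand it, not text taken from the Laghos sources: $\mathbf{M}_V$ and $\mathbf{M}_E$ are the velocity and energy mass matrices, $\mathbf{F}$ is the force matrix, $\mathbf{1}$ is the vector of ones, and $\mathbf{v}$, $\mathbf{e}$, $\mathbf{x}$ hold the velocity, internal energy and mesh position unknowns.

```latex
\begin{aligned}
\text{Momentum:} \quad & \mathbf{M}_V \frac{d\mathbf{v}}{dt} = -\mathbf{F}\cdot\mathbf{1}, \\
\text{Energy:}   \quad & \mathbf{M}_E \frac{d\mathbf{e}}{dt} = \mathbf{F}^{T}\cdot\mathbf{v}, \\
\text{Motion:}   \quad & \frac{d\mathbf{x}}{dt} = \mathbf{v}.
\end{aligned}
```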
Laghos supports two options for deriving and solving the ODE system, namely the full assembly and the partial assembly methods. Partial assembly is the main algorithm of interest for high orders. For low orders (e.g. 2nd order in 3D), both algorithms are of interest.
The full assembly option relies on constructing and utilizing global mass and force matrices stored in compressed sparse row (CSR) format. In contrast, the partial assembly option defines only the local action of those matrices, which is then used to perform all necessary operations. As the local action is defined by utilizing the tensor structure of the finite element spaces, the amount of data storage, memory transfers, and FLOPs are lower (especially for higher orders).
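The difference between the two options can be illustrated on a single toy 2D element. The sketch below is not Laghos or MFEM code: the helper names, the 1D basis matrix `B` and the quadrature data `d` are all hypothetical. It forms a mass-like matrix `M = K^T diag(d) K` with `K = B (x) B` in CSR format (full assembly), and then computes the same matrix-vector product using only `B` and `d` (partial assembly), exploiting the tensor-product structure.

```python
def matmul(a, b):
    """Dense product of two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def kron(a, b):
    """Kronecker product: rows indexed by (i, k), columns by (j, l)."""
    return [[a[i][j] * b[k][l]
             for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]


def assemble_mass_csr(B, d):
    """Full assembly: form M = K^T diag(d) K with K = B (x) B, store as CSR."""
    K = kron(B, B)
    DK = [[d[r] * v for v in row] for r, row in enumerate(K)]
    M = matmul(transpose(K), DK)
    values, cols, row_ptr = [], [], [0]
    for row in M:
        for j, v in enumerate(row):
            if v != 0.0:
                values.append(v)
                cols.append(j)
        row_ptr.append(len(values))
    return values, cols, row_ptr


def csr_matvec(csr, x):
    """Matrix-vector product with a CSR matrix."""
    values, cols, row_ptr = csr
    return [sum(values[k] * x[cols[k]] for k in range(row_ptr[i], row_ptr[i + 1]))
            for i in range(len(row_ptr) - 1)]


def pa_mass_action(B, d, x):
    """Partial assembly: apply M without forming it, one 1D factor at a time."""
    q, p = len(B), len(B[0])
    X = [x[j * p:(j + 1) * p] for j in range(p)]         # dofs as a p x p array
    Y = matmul(matmul(B, X), transpose(B))               # dofs -> quadrature points
    W = [[d[i * q + k] * Y[i][k] for k in range(q)] for i in range(q)]
    Z = matmul(matmul(transpose(B), W), B)               # quadrature points -> dofs
    return [Z[j][l] for j in range(p) for l in range(p)]


# Hypothetical toy data: p = 3 basis functions and q = 4 points per direction.
B = [[1.0, 0.5 * i, 0.25 * i * i] for i in range(4)]
d = [1.0 + 0.1 * r for r in range(16)]
x = [float(r + 1) for r in range(9)]

csr = assemble_mass_csr(B, d)
y_full = csr_matvec(csr, x)
y_pa = pa_mass_action(B, d, x)
max_diff = max(abs(a - b) for a, b in zip(y_full, y_pa))
stored_full = len(csr[0])                # nonzeros kept by full assembly
stored_pa = len(d) + len(B) * len(B[0])  # quadrature data plus the 1D basis
print("max difference:", max_diff)
print("stored values, full vs partial:", stored_full, stored_pa)
```

In this toy setting the two results agree up to round-off, while full assembly stores 81 matrix entries and partial assembly stores 28 values. The gap widens as the order grows, which is consistent with the storage argument above.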
The Laghos implementation includes support for hardware devices, such as GPUs, and programming models, such as CUDA, OCCA, RAJA and OpenMP, based on MFEM, version 4.1 or later. These device backends are selectable at runtime; see the -d/--device command-line option.
Other computational motives in Laghos include the following:
The file laghos.cpp contains the main driver with the time integration loop starting around line 609.
The ODE system is constructed and solved by the class LagrangianHydroOperator, defined around line 544 of laghos.cpp and implemented in files laghos_solver.hpp and laghos_solver.cpp.
Quadrature-point computations are performed in the function LagrangianHydroOperator::UpdateQuadratureData in laghos_solver.cpp.
Depending on the chosen option (-pa for partial assembly or -fa for full assembly), the function LagrangianHydroOperator::Mult uses the corresponding method to construct and solve the final ODE system.
Full assembly of the mass matrices is performed by the MFEM classes MassIntegrator and VectorMassIntegrator. Full assembly of the ODE's right-hand side is performed by utilizing the class ForceIntegrator defined in laghos_assembly.hpp.
Partial assembly is performed by the classes ForcePAOperator and MassPAOperator defined in laghos_assembly.hpp.
With partial assembly, the main computational kernels are the Mult* functions of the classes MassPAOperator and ForcePAOperator implemented in file laghos_assembly.cpp. These functions have specific versions for quadrilateral and hexahedral elements.
The orders of the kinematic (velocity and position) and thermodynamic (internal energy) finite element spaces are set by the -ok and -ot input parameters, respectively.

Laghos has the following external dependencies:
hypre, used for parallel linear algebra (we recommend version 2.11.2)
https://github.com/hypre-space/hypre/releases/tag/v2.11.2
METIS, used for parallel domain decomposition (optional; we recommend version 4.0.3)
https://github.com/mfem/tpls
MFEM, used for (high-order) finite element discretization (GitHub master branch)
https://github.com/mfem/mfem
To build the miniapp, first download hypre and METIS from the links above and put everything on the same level as the Laghos directory:
~> ls
Laghos/ v2.11.2.tar.gz metis-4.0.3.tar.gz
Build hypre:
~> tar -zxvf v2.11.2.tar.gz
~> cd hypre-2.11.2/src/
~/hypre-2.11.2/src> ./configure --disable-fortran
~/hypre-2.11.2/src> make -j
~/hypre-2.11.2/src> cd ../..
~> ln -s hypre-2.11.2 hypre
For large runs (problem size above 2 billion unknowns), add the --enable-bigint option to the above configure line.
Build METIS:
~> tar -zxvf metis-4.0.3.tar.gz
~> cd metis-4.0.3
~/metis-4.0.3> make
~/metis-4.0.3> cd ..
~> ln -s metis-4.0.3 metis-4.0
This build is optional, as MFEM can be built without METIS by specifying MFEM_USE_METIS = NO below.
Clone and build the parallel version of MFEM:
~> git clone https://github.com/mfem/mfem.git ./mfem
~> cd mfem/
~/mfem> git checkout master
~/mfem> make parallel -j
~/mfem> cd ..
The above uses the master branch of MFEM. See the MFEM building page for additional details.
(Optional) Clone and build GLVis:
~> git clone https://github.com/GLVis/glvis.git ./glvis
~> cd glvis/
~/glvis> make
~/glvis> cd ..
The easiest way to visualize Laghos results is to have GLVis running in a separate terminal. Then the -vis option in Laghos will stream results directly to the GLVis socket.
Build Laghos:
~> cd Laghos/
~/Laghos> make -j
This can be followed by make test and make install to check and install the build, respectively. See make help for additional options.
See also the make setup target, which can be used to automate the download and building of hypre, METIS and MFEM.
The main problem of interest for Laghos is the Sedov blast wave (-p 1) with the partial assembly option (-pa).
Some sample runs in 2D and 3D respectively are:
mpirun -np 8 ./laghos -p 1 -dim 2 -rs 3 -tf 0.8 -pa
mpirun -np 8 ./laghos -p 1 -dim 3 -rs 2 -tf 0.6 -pa -vis
The latter produces the following density plot (notice the -vis option).
Laghos also includes smooth test problems that expose all the principal computational kernels of the problem except for the artificial viscosity evaluation. (Viscosity can still be activated for these problems with the --impose-viscosity option.)
Some sample runs in 2D and 3D respectively are:
mpirun -np 8 ./laghos -p 0 -dim 2 -rs 3 -tf 0.5 -pa
mpirun -np 8 ./laghos -p 0 -dim 3 -rs 1 -tf 0.25 -pa
mpirun -np 8 ./laghos -p 4 -m data/square_gresho.mesh -rs 3 -ok 3 -ot 2 -tf 0.62 -s 7 -vis -pa
The latter produce the following velocity magnitude plots (notice the -vis option).
The triple-point problem (-p 3) is a well-known three-material problem that combines shock waves and vorticity, thus exercising the more complex computational capabilities of Laghos.
Some sample runs in 2D and 3D respectively are:
mpirun -np 8 ./laghos -p 3 -m data/rectangle01_quad.mesh -rs 2 -tf 5.0 -pa
mpirun -np 8 ./laghos -p 3 -m data/box01_hex.mesh -rs 2 -tf 5.0 -vis -pa
The latter produces the following specific internal energy plot (notice the -vis option).
To make sure the results are correct, we tabulate reference final iterations (step), time steps (dt) and energies (|e|) for the runs listed below:
mpirun -np 8 ./laghos -p 0 -dim 2 -rs 3 -tf 0.75 -pa
mpirun -np 8 ./laghos -p 0 -dim 3 -rs 1 -tf 0.75 -pa
mpirun -np 8 ./laghos -p 1 -dim 2 -rs 3 -tf 0.8 -pa
mpirun -np 8 ./laghos -p 1 -dim 3 -rs 2 -tf 0.6 -pa
mpirun -np 8 ./laghos -p 2 -dim 1 -rs 5 -tf 0.2 -fa
mpirun -np 8 ./laghos -p 3 -m data/rectangle01_quad.mesh -rs 2 -tf 3.0 -pa
mpirun -np 8 ./laghos -p 3 -m data/box01_hex.mesh -rs 1 -tf 5.0 -pa
mpirun -np 8 ./laghos -p 4 -m data/square_gresho.mesh -rs 3 -ok 3 -ot 2 -tf 0.62831853 -s 7 -pa
mpirun -np 8 ./laghos -p 7 -m data/rt2D.mesh -tf 4 -rs 1 -ok 4 -ot 3 -pa
run | step | dt | e |
---|---|---|---|
1. | 339 | 0.000702 | 4.9695537349e+01 |
2. | 1041 | 0.000121 | 3.3909635545e+03 |
3. | 1154 | 0.001655 | 4.6303396053e+01 |
4. | 560 | 0.002449 | 1.3408616722e+02 |
5. | 413 | 0.000470 | 3.2012077410e+01 |
6. | 2872 | 0.000064 | 5.6547039096e+01 |
7. | 858 | 0.000474 | 5.6691500623e+01 |
8. | 776 | 0.000045 | 4.0982431726e+02 |
9. | 2462 | 0.000050 | 1.1792848680e+02 |
Similar GPU runs using the MFEM CUDA device can be run as follows:
./laghos -p 0 -dim 2 -rs 3 -tf 0.75 -pa -d cuda
./laghos -p 0 -dim 3 -rs 1 -tf 0.75 -pa -d cuda
./laghos -p 1 -dim 2 -rs 3 -tf 0.80 -pa -d cuda
./laghos -p 1 -dim 3 -rs 2 -tf 0.60 -pa -d cuda
./laghos -p 3 -m data/rectangle01_quad.mesh -rs 2 -tf 3.0 -pa -d cuda
./laghos -p 3 -m data/box01_hex.mesh -rs 1 -tf 5.0 -pa -cgt 1e-12 -d cuda
./laghos -p 4 -m data/square_gresho.mesh -rs 3 -ok 3 -ot 2 -tf 0.62831853 -s 7 -pa -d cuda
./laghos -p 7 -m data/rt2D.mesh -tf 4 -rs 1 -ok 4 -ot 3 -pa -d cuda
An implementation is considered valid if the final energy values are all within round-off distance from the above reference values.
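A check of this kind can be scripted. The sketch below is only an illustration: the function name `energy_matches` and the relative tolerance `1e-9` are assumptions, since the text does not define "round-off distance" numerically. The reference energies are copied from the table of CPU runs in this README.

```python
import math

# Reference final energies for runs 1-9, copied from the table in this README.
REFERENCE_ENERGY = {
    1: 4.9695537349e+01,
    2: 3.3909635545e+03,
    3: 4.6303396053e+01,
    4: 1.3408616722e+02,
    5: 3.2012077410e+01,
    6: 5.6547039096e+01,
    7: 5.6691500623e+01,
    8: 4.0982431726e+02,
    9: 1.1792848680e+02,
}


def energy_matches(run, energy, rel_tol=1e-9):
    """Return True if `energy` agrees with the reference value for `run`.

    rel_tol is an assumed threshold, not a value prescribed by Laghos.
    """
    return math.isclose(energy, REFERENCE_ENERGY[run], rel_tol=rel_tol)


print(energy_matches(1, 4.9695537349e+01))  # same value as the reference
print(energy_matches(1, 4.9700000000e+01))  # differs in the fourth digit
```

The tolerance should be chosen with the printed precision in mind: the table lists 11 significant digits, so comparisons much tighter than that are not meaningful.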
Each time step in Laghos contains 3 major distinct computations:
By default Laghos is instrumented to report the total execution times and rates, in terms of millions of degrees of freedom per second (megadofs), for each of these computational phases. (The time for inversion of the local thermodynamic mass matrices (CG L2) is also reported, but that takes a small part of the overall computation.)
Laghos also reports the total rate for these major kernels, which is a proposed Figure of Merit (FOM) for benchmarking purposes. Given a computational allocation, the FOM should be reported for different problem sizes and finite element orders.
A sample run on the Vulcan BG/Q machine at LLNL is:
srun -n 294912 laghos -pa -p 1 -tf 0.6 -pt 911 -m data/cube_922_hex.mesh \
--ode-solver 7 --max-steps 4 \
--cg-tol 0 --cg-max-iter 50 -ok 3 -ot 2 -rs 5 -rp 2
This is a Q3-Q2 3D computation on 294,912 MPI ranks (18,432 nodes) that produces rates of approximately 125419, 55588, and 12674 megadofs, and a total FOM of about 2064 megadofs.
To make the above run 8 times bigger, one can either weak scale by using 8 times as many MPI tasks and increasing the number of serial refinements (srun -n 2359296 ... -rs 6 -rp 2), or use the same number of MPI tasks but increase the local problem on each of them by doing more parallel refinements (srun -n 294912 ... -rs 5 -rp 3).
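The factor of 8 follows from uniform refinement in 3D, where each refinement level splits every hexahedron into 2^3 = 8 children. The sketch below works through the arithmetic for the two options. The helper `problem_size` and the placeholder `base_elements=1` are hypothetical, since the element count of the coarse mesh is not stated here; only ratios between runs are meaningful.

```python
from fractions import Fraction


def problem_size(ranks, rs, rp, dim=3, base_elements=1):
    """Total elements and elements per rank after rs + rp uniform refinements.

    base_elements is a placeholder for the coarse mesh size (an assumption).
    """
    total = base_elements * (2 ** dim) ** (rs + rp)
    return total, Fraction(total, ranks)


base_total, base_local = problem_size(294912, rs=5, rp=2)
weak_total, weak_local = problem_size(2359296, rs=6, rp=2)  # 8x ranks, one more -rs
big_total, big_local = problem_size(294912, rs=5, rp=3)     # same ranks, one more -rp

# Ratios of total problem size and per-rank problem size relative to the base run.
print(weak_total // base_total, weak_local / base_local)
print(big_total // base_total, big_local / base_local)
```

Both options multiply the total problem size by 8. The first keeps the work per MPI task unchanged, while the second multiplies it by 8.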
In addition to the main MPI-based CPU implementation in https://github.com/CEED/Laghos, the following versions of Laghos have been developed
You can reach the Laghos team by emailing laghos@llnl.gov or by leaving a comment in the issue tracker.
The following copyright applies to each file in the CEED software suite, unless otherwise stated in the file:
Copyright (c) 2017, Lawrence Livermore National Security, LLC. Produced at the Lawrence Livermore National Laboratory. LLNL-CODE-734707. All Rights reserved.
See files LICENSE and NOTICE for details.