CEED / Laghos

High-order Lagrangian Hydrodynamics Miniapp
http://ceed.exascaleproject.org/miniapps
BSD 2-Clause "Simplified" License

GPU run of verification result #5 does not match values in table #175

Closed cdm-work closed 1 year ago

cdm-work commented 1 year ago

The README.md contains a table of expected values for different MPI runs. After the table, it also lists commands that are expected to work with GPU support and return the same expected values. In my own testing on NVIDIA and AMD GPUs, the verification results match except for run 5.

Here are my results for CPU run 5:

vers@ss10grizzlypeak001:~> srun -n8 -p workqss10 ${LAGHOS_ROOT}/bin/laghos -p 2 -dim 1 -rs 5 -tf 0.2 -fa
<snip>
step   410, t = 0.1986, dt = 0.000470,  |e| = 3.1987264362e+01
step   413, t = 0.2000, dt = 0.000470,  |e| = 3.2012077410e+01

CG (H1) total time: 0.1181249440
CG (H1) rate (megadofs x cg_iterations / second): 18.3783452206

CG (L2) total time: 0.0031617820
CG (L2) rate (megadofs x cg_iterations / second): 34.0061395757

Forces total time: 0.0076223940
Forces rate (megadofs x timesteps / second): 56.6436214134

UpdateQuadData total time: 0.0268307880
UpdateQuadData rate (megaquads x timesteps / second): 16.1056768068

Major kernels total time (seconds): 0.1513464110
Major kernels total rate (megadofs x time steps / second): 20.0522032861

Energy  diff: 2.78e-06

This is what I see when I run on a node with an NVIDIA A100:

vers@ss10grizzlypeak001:~> ${LAGHOS_GPU_ROOT}/bin/laghos -p 2 -dim 1 -rs 5 -tf 0.20 -fa
<snip>
step   135, t = 0.1981, dt = 0.001467,  |e| = 2.8284271247e+01
step   137, t = 0.2000, dt = 0.000459,  |e| = 2.8284271247e+01

CG (H1) total time: 0.3403743400
CG (H1) rate (megadofs x cg_iterations / second): 2.2845787964

CG (L2) total time: 0.0077113670
CG (L2) rate (megadofs x cg_iterations / second): 4.5480911491

Forces total time: 0.0142958950
Forces rate (megadofs x timesteps / second): 9.8514993290

UpdateQuadData total time: 0.0554843500
UpdateQuadData rate (megaquads x timesteps / second): 2.5330385956

Major kernels total time (seconds): 0.4101545850
Major kernels total rate (megadofs x time steps / second): 2.5819338336

Energy  diff: 0.00e+00

If I run with my binary built against ROCm 5.4.1 for an AMD MI250, I get the same results as I did for the NVIDIA GPU. For both GPUs, all of the other GPU verification results match the values in the table.

To me, it seems like the wrong command has been shared for GPU verification result 5, since I find it hard to believe that the GPU results would match for the other 7 listed results and be wrong only for this one.
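For anyone who wants to repeat this comparison, here is a minimal Python sketch that pulls the final |e| value out of two Laghos logs and reports how far apart they are. The log strings are the two final step lines pasted above. The helper names (final_energy, relative_diff) and the 1e-8 tolerance are my own assumptions for illustration and are not part of Laghos.

```python
import re

# Matches Laghos step lines such as:
#   step   413, t = 0.2000, dt = 0.000470,  |e| = 3.2012077410e+01
STEP_RE = re.compile(r"step\s+(\d+),.*\|e\|\s*=\s*([0-9.eE+-]+)")


def final_energy(output: str) -> float:
    """Return |e| from the last 'step' line found in a Laghos log."""
    matches = STEP_RE.findall(output)
    if not matches:
        raise ValueError("no step lines found in output")
    return float(matches[-1][1])


def relative_diff(a: float, b: float) -> float:
    """Relative difference of two values, scaled by the larger magnitude."""
    return abs(a - b) / max(abs(a), abs(b))


# Final step lines copied from the CPU and GPU runs in this issue.
cpu_log = "step   413, t = 0.2000, dt = 0.000470,  |e| = 3.2012077410e+01"
gpu_log = "step   137, t = 0.2000, dt = 0.000459,  |e| = 2.8284271247e+01"

# Hypothetical tolerance; choose one that suits the comparison you need.
TOLERANCE = 1e-8

cpu_e = final_energy(cpu_log)
gpu_e = final_energy(gpu_log)
diff = relative_diff(cpu_e, gpu_e)
status = "match" if diff <= TOLERANCE else "MISMATCH"
print(f"CPU |e| = {cpu_e:.10e}, GPU |e| = {gpu_e:.10e}, rel. diff = {diff:.3e} ({status})")
```

In practice the full captured output of each run could be passed to final_energy instead of a single line, since only the last step line is used.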

I built Laghos at tag 'v3.1', with MFEM 'v4.5.2', metis 5.1.0, and hypre 'v2.28.0'.

vladotomov commented 1 year ago

Hi @cdm-work,

I get your output when I run laghos -p 2 -dim 1 -rs 5 -tf 0.20 -fa -d cuda. But this is a 1D test, and we don't support -d cuda execution for it. Without that argument, the result is correct. I changed the README.md to note this.
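A rough Python sketch of a pre-run check for this situation follows. It flags a command line that combines -dim 1 with -d cuda before the run is launched. The function name is hypothetical, the check is not something Laghos provides, and extending the restriction from this particular test to every 1D run is an assumption on my part.

```python
import shlex


def uses_unsupported_device(args: list[str]) -> bool:
    """Return True if args request a 1D problem together with '-d cuda'.

    Assumption: any '-dim 1' run is treated like the 1D test in this issue.
    """
    def value_of(flag: str):
        # Return the token following the flag, or None if the flag is absent.
        if flag in args:
            i = args.index(flag)
            if i + 1 < len(args):
                return args[i + 1]
        return None

    return value_of("-dim") == "1" and value_of("-d") == "cuda"


command = "laghos -p 2 -dim 1 -rs 5 -tf 0.20 -fa -d cuda"
if uses_unsupported_device(shlex.split(command)):
    print("warning: -d cuda is not supported for this 1D test; drop the -d argument")
```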

cdm-work commented 1 year ago

OK, that makes sense. Since you have changed the 'README.md', I don't have to worry about the difference in that output.