Open ax3l opened 5 years ago
First test looks surprisingly stable after 60'000 steps (plane wave, a0=5):
The energy is not fully converged yet; possible follow-up tests:
From your data, the speedup due to fast math seems rather minor. Comparing configurations 000 and 300, the calculation time is 38:52 vs. 39:32, and the full simulation times are also about 40 seconds apart, which in relative terms is only about a 1.7% speedup.
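The arithmetic behind the 1.7% figure can be checked with a short sketch. The two timings are the ones quoted above; the helper `to_seconds` and the variable names are made up for illustration, and the sketch does not assume which configuration had fast math enabled.

```python
def to_seconds(mmss: str) -> int:
    """Convert a 'MM:SS' wall-clock string to seconds (hypothetical helper)."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


# Calculation times quoted in the comment above.
t_short = to_seconds("38:52")
t_long = to_seconds("39:32")

# Absolute and relative difference, measured against the slower run.
saved = t_long - t_short
relative = saved / t_long

print(f"{saved} s apart, {relative:.1%} of the slower run")
```

Measuring against the faster run instead changes the result only in the second decimal place, so the conclusion stays the same.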
Just a note: I did a bit of I/O (text-based histograms every 100 steps) during the sim, and one might want to compare pure simulation time without init. But I tend to agree that the "risk" of fast math, given the small speedup seen so far, might not be worth making it "default on".
Nevertheless, at first glance the influence seems small, and the next steps outlined above need to be done to verify this further.
Nevertheless, one has to verify that for several setups and without I/O. Further research welcome!
This issue documents a verification run for "fast math" (`-ffast-math` on `gcc` or `--use_fast_math` on `nvcc`).
I suspect that for long running simulations (e.g. 30k steps and more), it has a significant influence on final energy spectra.
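One reason such an influence is plausible: `-ffast-math` lets the compiler reassociate floating-point operations, and floating-point addition is not associative. The following is a minimal sketch of that effect in plain Python with made-up values. It is not PIConGPU code and says nothing about how large the effect is in an actual simulation.

```python
# Illustrative values only: one large pair that cancels, plus two small terms.
values = [1.0e17, 1.0, -1.0e17, 1.0]

# Strict left-to-right summation, as written in the source code.
left_to_right = 0.0
for v in values:
    left_to_right += v

# A mathematically equivalent regrouping that a compiler may choose
# once reassociation is allowed.
reordered = (values[0] + values[2]) + (values[1] + values[3])

print(left_to_right, reordered)  # the two sums differ
```

In the left-to-right sum the first `1.0` is absorbed by the large term and lost, while the regrouped sum keeps both small terms. Over many time steps, small differences of this kind can accumulate.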
Fast math is enabled by default in the PIConGPU GPU/`cuda` backend (via `ALPAKA_CUDA_FAST_MATH`) and disabled by default on the CPU backends (such as OpenMP/`omp2b`), where it has to be controlled via `CXXFLAGS`, e.g. `export CXXFLAGS="-g0 -O3 -m64 -ffast-math"`.
Method
Running the `FoilLCT` (a0=5, plane wave laser, 192 n_c, 1 µm foil) example with `8.cfg` repeatedly with varied:
runs: (first number as in `cmakeFlags`)
Diff to Default example
Commit
Branch: `topic-fastMathFoilTest`
Output
On HZDR file dirs in `/bigdata/hplsim/development/huebl/foilLCT_fm/`.
Version & Software
PIConGPU 0.4.2 on Hemera (HZDR) P100 with CUDA 9.2