Closed by mathomp4 2 years ago
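Here are some (ongoing) results.
These are 1-day runs of GEOSgcm on the Cascade Lake nodes at NCCS with no history output and no checkpointing. I built each configuration as both Release and Aggressive, once with `-xCORE-AVX2` ("xCore") and once with `-march=core-avx2` ("march"). The numbers are model throughput in simulated days per wall-clock day.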
Resolution | Release xCore (days/day) | Release march (days/day) |
---|---|---|
C360 L072 | 135.384 | 139.923 |
C360 L181 | 53.638 | 54.865 |
C720 L072 | 57.344 | 58.237 |
C720 L181 | 22.058 | 22.354 |

Resolution | Aggressive xCore (days/day) | Aggressive march (days/day) |
---|---|---|
C360 L072 | 154.028 | 159.308 |
C360 L181 | 60.027 | 62.031 |
C720 L072 | 65.070 | 66.079 |
C720 L181 | 24.852 | 25.392 |
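
To put the two tables in relative terms, the short Python sketch below computes the percentage change in throughput when going from the xCore build to the march build. The numbers are copied from the tables above. The names `THROUGHPUT`, `percent_gain`, and `summarize` are illustrative only and are not part of any GEOS tooling.

```python
# Throughput in days/day, copied from the tables above.
# Key: (build type, resolution) -> (xCore build, march build)
THROUGHPUT = {
    ("Release", "C360 L072"): (135.384, 139.923),
    ("Release", "C360 L181"): (53.638, 54.865),
    ("Release", "C720 L072"): (57.344, 58.237),
    ("Release", "C720 L181"): (22.058, 22.354),
    ("Aggressive", "C360 L072"): (154.028, 159.308),
    ("Aggressive", "C360 L181"): (60.027, 62.031),
    ("Aggressive", "C720 L072"): (65.070, 66.079),
    ("Aggressive", "C720 L181"): (24.852, 25.392),
}


def percent_gain(baseline, candidate):
    """Return the percentage change of candidate relative to baseline."""
    return 100.0 * (candidate - baseline) / baseline


def summarize(table):
    """Map each (build, resolution) key to the march-vs-xCore gain in percent."""
    return {key: percent_gain(xcore, march) for key, (xcore, march) in table.items()}


if __name__ == "__main__":
    for (build, resolution), gain in summarize(THROUGHPUT).items():
        print(f"{build:10s} {resolution}: {gain:+.2f}%")
```

With these numbers every entry comes out positive, in the range of roughly 1% to 3.5%.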
Pending tests by @aoloso and me, I think we might recommend to @wmputman and @sdrabenh that we update the arch flag for Intel Fortran. Performance looks good: in every case above, the `-march=core-avx2` build is slightly faster than the `-xCORE-AVX2` build.
Tests at NAS have shown that if we use `-march=core-avx2`, then we gain quite a bit of "ease" with GEOS.
I built GEOSgcm using `-march=core-avx2` once on pfe (Intel chip) and once on a Rome node (AMD chip). I then ran four experiments:
When all was done, 1 == 2 and 3 == 4. That is, no matter where you build, you can get the same answers on the same architecture.
Of course, a run on AMD will never be zero-diff to a run on Intel, but at least we have a "weak" form of equivalence. (Or "strong"? Maybe @tclune and I need to come up with the strong/weak version of "running on different architectures" 😄 )
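
One simple way to check a "1 == 2"-style result is to compare output files from two runs byte for byte. The Python sketch below is an assumption about how such a check could be done, not the procedure used for the experiments above. The function names and the file paths in the usage comment are hypothetical.

```python
import hashlib


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_zero_diff(path_a, path_b):
    """True if the two files have identical contents (same digest)."""
    return file_digest(path_a) == file_digest(path_b)


# Hypothetical usage, with made-up paths:
#   is_zero_diff("exp1/restart.bin", "exp2/restart.bin")
```

Byte equality is stricter than comparing field values, so a mismatch here does not by itself prove that the model fields differ. For example, file metadata could differ while the data are the same.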
With the proliferation of AMD EPYC (Rome) nodes at NAS, we might want to change the architecture flags for Intel Fortran in GEOS. Currently, we use `-xCORE-AVX2` on Intel processors, but this has the problem that it uses instructions that don't exist on the AMD chips. On AMD we can use `-march=core-avx2`, and this will work on both Intel (Haswell+) and AMD Rome with no changes needed. (I'm not sure whether the results would be zero-diff between Intel and AMD, but they should run. This needs to be tested at NAS.)
But the change is non-zero-diff relative to `-xCORE-AVX2`, and it is possibly slower if we are somehow crucially relying on one of the AVX2 instructions available only with `-xCORE-AVX2`. I'm doing some runs now to see whether there is a performance hit.
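
The portability argument above can be written down as a small Python sketch that reads `/proc/cpuinfo`-style text and reports which of the two builds would be expected to run on that node. It encodes only the claims made in this post (the `-xCORE-AVX2` build is Intel-only; the `-march=core-avx2` build runs on any chip with AVX2) and uses the `avx2` CPU flag as a rough proxy. The function names are hypothetical, and this is not a check taken from the Intel compiler documentation.

```python
def parse_cpuinfo(text):
    """Extract the vendor string and the set of CPU feature flags
    from the first processor entry of /proc/cpuinfo-style text."""
    vendor, flags = None, set()
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip blank lines and lines without "key : value"
        key = key.strip()
        if key == "vendor_id" and vendor is None:
            vendor = value.strip()
        elif key == "flags" and not flags:
            flags = set(value.split())
    return vendor, flags


def runnable_builds(text):
    """List the arch flags whose builds should run on this CPU,
    according to the assumptions stated in the post."""
    vendor, flags = parse_cpuinfo(text)
    if "avx2" not in flags:
        return []  # assumption: neither build runs without AVX2
    builds = ["-march=core-avx2"]
    if vendor == "GenuineIntel":
        builds.append("-xCORE-AVX2")  # assumption: Intel-only build
    return builds


# Usage on a Linux node:
#   with open("/proc/cpuinfo") as handle:
#       print(runnable_builds(handle.read()))
```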