GEOS-ESM / ESMA_cmake

Custom CMake macros for the GEOS Earth System Model
Apache License 2.0

Issues running GEOSgcm with GCC 12 #275

Closed by mathomp4 2 years ago

mathomp4 commented 2 years ago

Testing on discover as well as Intel MacBooks has shown that GCC 12 + GEOSgcm has an...issue. Namely, if you run at C12 or C24, with GCC 12 the model will occasionally die in FV3 during mapz.

For example, at C12, over many consecutive runs with GCC 11.3 and GCC 12.1, the case succeeds 100% of the time with GCC 11, but only about 20-30% of the time with GCC 12.
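A success rate like the one above can be measured with a small loop. This is only a sketch: `success_rate` is a hypothetical helper name, and `./run_c12.sh` stands in for whatever script launches a single C12 run on your system.

```shell
# Run a command N times and report how many runs exited with status 0.
# Usage: success_rate N command [args...]
# The body runs in a subshell (parentheses) so n, ok, and i do not leak.
success_rate() (
  n=$1; shift
  ok=0; i=0
  while [ "$i" -lt "$n" ]; do
    # Count the run as a success only if the command exits cleanly.
    if "$@" >/dev/null 2>&1; then ok=$((ok + 1)); fi
    i=$((i + 1))
  done
  echo "$ok/$n"
)

# Hypothetical invocation (run_c12.sh is an assumed wrapper, not part of GEOS):
# success_rate 20 ./run_c12.sh
```

Comparing the printed ratio between a GCC 11 build and a GCC 12 build gives a rough figure for how intermittent the failure is.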

However, preliminary testing has shown that if we move the target arch for GNU on Intel chips from westmere to haswell, all is happy.

I will continue to test this to make sure it works on my systems. But, I invoke the name of @climbfuji to ask if he would be okay with this change for Rosetta2 M1 Macs. He provided the code for M1 Rosetta in https://github.com/GEOS-ESM/ESMA_cmake/pull/274 (which is soon to be brought in).

I think it should be fine (not many Westmere era systems exist anymore), but for now I'm only testing the GNU Intel Linux side and I'm not set up to test Rosetta2.
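To see what the westmere-to-haswell switch changes at the instruction-set level, GCC can be asked which target options a given `-march` turns on. The sketch below assumes an x86_64 GCC on the PATH; `enabled_flags` is a hypothetical helper name that filters the output of `gcc -Q --help=target`.

```shell
# Read `gcc -Q --help=target` output on stdin and print only the
# -m options that GCC reports as [enabled].
enabled_flags() {
  awk '$1 ~ /^-m/ && $2 == "[enabled]" { print $1 }'
}

# Assumed usage: list options enabled for haswell but not for westmere.
# gcc -march=westmere -Q --help=target | enabled_flags | sort > westmere.txt
# gcc -march=haswell  -Q --help=target | enabled_flags | sort > haswell.txt
# comm -13 westmere.txt haswell.txt
```

On a typical x86_64 GCC the difference should include options such as `-mavx` and `-mavx2`, which matters for any environment that cannot execute AVX instructions.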

climbfuji commented 2 years ago

Thanks for asking! I don't see any problem with updating to "haswell" instead of "westmere".

mathomp4 commented 2 years ago

@climbfuji Thanks. But I might not change it for Rosetta2. If I look here:

https://developer.apple.com/documentation/apple-silicon/about-the-rosetta-translation-environment

I see:

Rosetta translates all x86_64 instructions, but it doesn’t support the execution of some newer instruction sets and processor features, such as AVX, AVX2, and AVX512 vector instructions. If you include these newer instructions in your code, execute them only after verifying that they are available. For example, to determine if AVX512 vector instructions are available, use the sysctlbyname function to check the hw.optional.avx512f attribute.

So, unless sysctl -a | grep hw.optional.avx turns up a 1 for avx and avx2, we might want to keep using Westmere for Rosetta2. My reading of the GCC options seems to say Westmere is basic enough.

Then again, maybe GCC is smart enough to handle that? Eh. We know Westmere works for you, we keep it working! 😄

ETA: Note: GEOS isn't fancy enough to actually call AVX instructions. We let the compiler do it for us if it thinks it can.

climbfuji commented 2 years ago

You are correct:

JCSDA-L-18146:spack-stack-package-cleanup-20220602 heinzell$ sysctl -a | grep hw.optional.avx
hw.optional.avx1_0: 0
hw.optional.avx2_0: 0
hw.optional.avx512bw: 0
hw.optional.avx512cd: 0
hw.optional.avx512dq: 0
hw.optional.avx512f: 0
hw.optional.avx512ifma: 0
hw.optional.avx512vbmi: 0
hw.optional.avx512vl: 0

:-( Need to keep westmere for Rosetta 2 then. Thanks for checking!
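The check discussed in this thread can be sketched as a small filter that picks a target arch from the `sysctl` output. This is an illustration, not the logic used in ESMA_cmake; `pick_march` is a hypothetical helper name, and it assumes the `hw.optional.avx1_0` and `hw.optional.avx2_0` keys shown above.

```shell
# Read `sysctl -a` style lines on stdin and print "haswell" only when
# both AVX and AVX2 are reported as available; otherwise print "westmere".
pick_march() {
  awk -F': *' '
    $1 == "hw.optional.avx1_0" { avx  = $2 }
    $1 == "hw.optional.avx2_0" { avx2 = $2 }
    END { print ((avx == 1 && avx2 == 1) ? "haswell" : "westmere") }
  '
}

# Assumed usage on macOS:
# sysctl -a | grep hw.optional.avx | pick_march
```

With the Rosetta 2 output above, where both keys are 0, this prints `westmere`. Missing keys are treated the same as 0, so the conservative choice is the default.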