NOAA-EMC / WW3

WAVEWATCH III

STAB3 issue on cold start #868

Closed SanderHulst closed 1 year ago

SanderHulst commented 1 year ago

Describe the bug We are investigating whether a term for atmospheric stability improves our day-to-day forecast. We therefore compiled our WAVEWATCH-6.07 with Intel oneAPI 2022.1.0 and Intel MPI 2021.6.0 in hybrid mode with the STAB3 switch. The model runs fine as long as ww3_prnc writes the wind field without the air-sea temperature difference (WND), even on a cold start. However, as soon as ww3_prnc writes the wind field together with the air-sea temperature difference (WNS), ww3_multi aborts.

When I compile with debugging enabled I get the following (mind you, the line numbers refer to 6.07):

ww3_multi          0000000000DE78E5  w3src4md_mp_w3sin         647  w3src4md.F90
ww3_multi          0000000000DCE875  w3srcemd_mp_w3src         804  w3srcemd.F90
ww3_multi          0000000000BCB72A  w3wavemd_mp_w3wav        1205  w3wavemd.F90
libiomp5.so        000014975E400893  __kmp_invoke_micr     Unknown  Unknown
libiomp5.so        000014975E374429  __kmp_fork_call       Unknown  Unknown
libiomp5.so        000014975E330425  __kmpc_fork_call      Unknown  Unknown
ww3_multi          0000000000BA6153  w3wavemd_mp_w3wav        1186  w3wavemd.F90
ww3_multi          00000000007F30E6  wmwavemd_mp_wmwav         654  wmwavemd.F90
ww3_multi          000000000040544B  MAIN__                    149  ww3_multi.F90
ww3_multi          0000000000404942  Unknown               Unknown  Unknown
libc-2.26.so       000014975DF1F13A  __libc_start_main     Unknown  Unknown
ww3_multi          000000000040486A  Unknown               Unknown  Unknown
forrtl: severe (408): fort: (3): Subscript #3 of the array TAUHFT2 has value -2147483648 which is less than the lower bound of 0

The action density is used in the computation of subscript 3. Some print statements showed that the action density was NaN and that the NaN propagated into the index.
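To illustrate the failure mode: on x86 hardware, converting a NaN to a 32-bit integer commonly yields the most negative value, -2147483648, which matches the subscript in the error above. The following is a minimal Python sketch of a guard that refuses to build an index from a non-finite value. It is not the WW3 code; the function name `table_index` and its parameters are hypothetical and chosen only for illustration.

```python
import math


def table_index(value, scale, lower, upper):
    """Map a float onto an integer table index, rejecting bad input.

    Hypothetical helper for illustration only; not taken from WW3.
    """
    scaled = value * scale
    # A NaN or Inf here would otherwise turn into a meaningless integer.
    if not math.isfinite(scaled):
        raise FloatingPointError(
            f"non-finite value {scaled!r} cannot be used as a table index"
        )
    index = int(scaled)  # truncates toward zero
    # Report out-of-range indices explicitly instead of reading out of bounds.
    if index < lower or index > upper:
        raise IndexError(f"index {index} outside [{lower}, {upper}]")
    return index
```

With a check like this, a NaN in the input is reported where it first enters the index computation, rather than surfacing later as an array bounds violation.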

To Reproduce I have a test case available that holds the wind field with and without the air-sea temperature difference. I can share it.

Expected behavior WAVEWATCH should not crash on a cold start with STAB3 active.

Additional context I prepared a fix that I'm testing in my fork of v6.07 (release v6.07.5). I've also merged it into the development branch of my fork.

MatthewMasarik-NOAA commented 1 year ago

Hi @SanderHulst, could you please share your test case?

aronroland commented 1 year ago

Hi Sander, I just saw your bug report. An index of -2147483648 into TAUHFT is, in 99% of cases, caused by a NaN or Inf in the wave action. I added an abort at that point, but I still need to get it into develop. Either your run contains undefined values because of a bug in the code, or it has blown up. Cheers
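As a rough idea of what such a check before an abort could look like, here is a minimal Python sketch that scans a field for the first NaN or Inf and reports its position. It is an assumption made for illustration, not the abort that was added to WW3; the name `first_non_finite` is hypothetical.

```python
import math


def first_non_finite(values):
    """Return (position, value) of the first NaN or Inf, or None if all are finite.

    Hypothetical diagnostic for illustration only; not taken from WW3.
    """
    for position, value in enumerate(values):
        if not math.isfinite(value):
            return position, value
    return None


# Example use: stop with a clear message instead of continuing with bad data.
action = [0.0, 1.5e-3, float("nan"), 2.0e-3]
hit = first_non_finite(action)
if hit is not None:
    print(f"non-finite wave action at position {hit[0]}: {hit[1]}")
```

Knowing the position of the first bad value helps to tell an uninitialized field on a cold start apart from a run that blew up later.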