Open ekluzek opened 3 years ago
Running in DEBUG mode I find a problem with the following division...
if ( snow(g) < 0.0_r8 ) then
   temp(g) = 0.0_r8
   write(iulog,*)'warning: snow<0, setting snowmasking factor to zero. (snow(g) = ',snow(g),', overwriting so snow(g)=0.0)'
   snow(g) = 0.0_r8
else
   temp(g) = snow(g) / ( snow(g) + snowmask(g) )
end if
It checks for snow(g) < 0, but not whether (snow(g) + snowmask(g)) == 0, so the code probably needs another check for that case.
@marysa does the above sound right to you? What do you think the best way to solve this divide by zero issue might be?
Here's the traceback from the cesm.log file:
1046:MPT: Missing separate debuginfos, use: zypper install glibc-debuginfo-2.22-49.16.x86_64
1046:MPT: (gdb) #0 0x00002b39ca0286da in waitpid ()
1046:MPT: from /glade/u/apps/ch/os/lib64/libpthread.so.0
1046:MPT: #1 0x00002b39ca96fdb6 in mpi_sgi_system (
1046:MPT: #2 MPI_SGI_stacktraceback (
1046:MPT: header=header@entry=0x7ffeade860c0 "MPT ERROR: Rank 1046(g:1046) received signal SIGFPE(8).\n\tProcess ID: 3297, Host: r4i4n19, Program: /glade/scratch/erik/SMS_D.f19_g17.I2000SlimRsGs.cheyenne_intel.clm-global_uniform.GC.slim-n1_cesm21ch"...) at sig.c:340
1046:MPT: #3 0x00002b39ca96ffb2 in first_arriver_handler (signo=signo@entry=8,
1046:MPT: stack_trace_sem=stack_trace_sem@entry=0x2b39d4fc0080) at sig.c:489
1046:MPT: #4 0x00002b39ca97034b in slave_sig_handler (signo=8, siginfo=<optimized out>,
1046:MPT: extra=<optimized out>) at sig.c:564
1046:MPT: #5 <signal handler called>
1046:MPT: #6 0x0000000000c1fccc in mml_mainmod::mml_main (bounds=..., atm2lnd_inst=...,
1046:MPT: lnd2atm_inst=...) at /glade/work/erik/slim_cesm21/src/main/mml_main.F90:636
It doesn't explicitly say divide by zero, but it does report a floating-point exception (SIGFPE), so I'm assuming it's a divide by zero.
Oh! Yeah! That absolutely could be a problem and we SHOULD check to make sure we're not about to divide by zero! If snow(g)==0, temp(g) should = 0.
Here, temp(g) is the factor used to modify the surface albedo when there is snow. If there isn't much snow, temp(g) is small and more weight goes to the bare-ground albedo; if there is a lot of snow, temp(g) is large and the albedo looks more like snow albedo than bare-ground albedo. I probably thought dividing by zero could never happen because mentally I would expect snowmask(g) (the number that controls how quickly snow makes the ground "look" like snow vs. like bare ground) to never be zero. But there is nothing stopping anybody from setting snowmask(g)=0, and if that happened and there wasn't any snow on the ground, it would indeed be dividing by zero.
(Also, snowmask(g) should never be allowed to be negative; that could also result in weirdness.)
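A minimal sketch of the guard described above, written as a standalone program so it can be compiled on its own. The function name snow_masking_factor and the local r8 kind are hypothetical stand-ins, not names from mml_main.F90. Clamping a negative snowmask to zero is only an assumption made for illustration; aborting with an error on a negative snowmask would be an equally reasonable choice.

```fortran
program snow_factor_demo
  implicit none
  ! Hypothetical local stand-in for the model's r8 kind.
  integer, parameter :: r8 = selected_real_kind(12)

  ! No snow and snowmask = 0: the unguarded form would evaluate 0/0.
  if (snow_masking_factor(0.0_r8, 0.0_r8) /= 0.0_r8) error stop 'zero-snow case failed'
  ! Negative snow is treated like no snow.
  if (snow_masking_factor(-1.0_r8, 50.0_r8) /= 0.0_r8) error stop 'negative-snow case failed'
  ! Equal snow and snowmask give a factor of one half.
  if (abs(snow_masking_factor(50.0_r8, 50.0_r8) - 0.5_r8) > 1.0e-12_r8) &
       error stop 'equal-weights case failed'
  ! Snow present with snowmask = 0: the ground looks fully snow covered.
  if (snow_masking_factor(10.0_r8, 0.0_r8) /= 1.0_r8) error stop 'zero-snowmask case failed'

  print *, 'all checks passed'

contains

  pure function snow_masking_factor(snow, snowmask) result(factor)
    real(r8), intent(in) :: snow      ! snow amount on the ground
    real(r8), intent(in) :: snowmask  ! controls how quickly snow masks the ground
    real(r8) :: factor

    if (snow <= 0.0_r8) then
       ! No (or negative) snow: no snow masking, and no division is attempted.
       factor = 0.0_r8
    else
       ! snow > 0 here, so the denominator is strictly positive once
       ! snowmask is clamped at zero (assumption: clamp rather than abort).
       factor = snow / (snow + max(snowmask, 0.0_r8))
    end if
  end function snow_masking_factor

end program snow_factor_demo
```

Handling snow(g) == 0 in the first branch means the division only runs when the denominator is known to be positive, whatever value snowmask(g) has been set to.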
I see this same problem with the izumi_intel compiler as well, and the izumi_nag test also fails.
@marysa I checked in a fix for this in this commit...
6c8630b3c76e25e4737421e13978eea69734b930
Please look it over and make sure you approve.
The following test is crashing with an error about a NaN
SMS.f19_g17.I2000SlimRsGs.cheyenne_intel.clm-global_uniform
The output error is...
And the line is:
call endrun( sub//' ERROR: One or more of the output from CLM to the coupler are NaN ' )
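To narrow down where a NaN first appears before it reaches that endrun call, one option is to scan the suspect arrays with ieee_is_nan from the intrinsic ieee_arithmetic module. The sketch below is standalone and illustrative only: first_nan_index, the field array, and the local r8 kind are hypothetical names, not part of the SLIM or CLM source.

```fortran
program nan_check_demo
  use, intrinsic :: ieee_arithmetic, only: ieee_is_nan, ieee_value, ieee_quiet_nan
  implicit none
  ! Hypothetical local stand-in for the model's r8 kind.
  integer, parameter :: r8 = selected_real_kind(12)
  real(r8) :: field(4)
  integer :: g

  ! A made-up field with one NaN planted at index 3.
  field = [0.2_r8, 0.5_r8, 0.0_r8, 0.9_r8]
  field(3) = ieee_value(0.0_r8, ieee_quiet_nan)

  g = first_nan_index(field)
  if (g /= 3) error stop 'expected the NaN at index 3'

  ! A clean field reports 0, meaning no NaN was found.
  field(3) = 0.0_r8
  if (first_nan_index(field) /= 0) error stop 'expected no NaN'

  print *, 'all checks passed'

contains

  pure function first_nan_index(x) result(idx)
    real(r8), intent(in) :: x(:)
    integer :: idx
    integer :: i

    idx = 0                       ! 0 means "no NaN found"
    do i = 1, size(x)
       if (ieee_is_nan(x(i))) then
          idx = i                 ! index of the first NaN
          return
       end if
    end do
  end function first_nan_index

end program nan_check_demo
```

Calling a check like this on each intermediate field, and writing the offending index to the log, would show which grid cell and which variable first goes bad.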