Closed TingLei-daprediction closed 2 years ago
An update: this run-time error occurred because both sumcoslattot (2) and sumfttot(2) =0 , because they correspond to values for south hemisphere. It is correct because in those regional fv3 run, the domain doesn't extend to the south hemisphere. The "old " version of the enkf (the git log infor as below) also give 0 values for those two variable for SH, but won't abort. ^
git log commit 548af240031e35a44d9185d82b5c28c26b5c010e (HEAD -> master, origin/master, origin/HEAD) Merge: 8d81a3f2e 8a4006196 Author: Ting.Lei-NOAA Ting.Lei@noaa.gov Date: Mon Mar 29 19:11:45 2021 +0000
merging with curent GSI EMC master
commit 8a4006196bb4fd68c42a8f87fb9d31f0a4b2edb5 (upstream/master)
V
Hence, it should be the recent changes in the GSI compiling setup that revealed this unsafe operation ( divided by 0).
These local variables are used for displaying in the standard output . In the following lines, those zero values are identified to avoid to print them for corresponding SH , not to say to affect the analysis results.
And, when the GSI is built with optimization, this run-time error would disappear.
Hence, the fix is not urgently needed, though I think an explicit treatment for this point is needed and can be added in one of existing PR in waiting.
@MichaelLueken-NOAA , please let me know how what is your plan on this.
I will let this issue open for being now.
@TingLei-daprediction I only have regression tests for global EnKF runs, so I have no way of testing regional FV3 EnKF. The src/enkf/inflation.f90 file hasn't been updated since May 11, 2020.
If the values are zero, then clearly the code should fail with a divide-by-zero failure. As best as I can tell, the primary developer that has been working on the EnKF (especially for regional FV3) recently has been you. If these updates have been tested in debug mode, then I can only think of is that the compiler flags were updated with Rahul's CMake refactor. The updated debug compiler options might be catching what the previous build missed.
@MichaelLueken-NOAA Yes, I think this run-time error with the enkf of debug mode is reported by the program because of the change in the compiling side. This was confirmed by my test using "older" version as mentioned in my previous "update" part. So, in summary, we have two options here: the first, sticking to the "older" practice, using "older" compiling options, even in debug mode, the program would "avoid" the run-time error from that "divided by zero" line, but , the program would evaluate them to make the abnormal values not be shown/used , as in the following line , ^
V To stick to the "older" practice, we only need to change the compiling options back to the "older" one. The advantage of this option is the code would be more neat.
The second option is that, we think it is preferable for the "divided by zero" to be explicitly taken care of where it occur ("the code should fail with a divide-by-zero failure"). Then we change change that array division to a loop of divisions over array elements and the "tiny" evaluation as in the above screenshot would be done for each array element to divide before the division using that element would be done. That means a few extra lines of codes.
So, if you agree and There would be no other GSI developers 's convincing arguments for different options, I will take the second option and add them in my PR https://github.com/NOAA-EMC/GSI/pull/362. Further comments/suggestions are welcome. PS: and yes, the regression tests for FV3REG enkf will be very helpful . Right now, we are doing our own verification tests for the changes of codes. Regression tests with the GSI package can help more. And great other GSI developers are working on that.
@TingLei-daprediction The code should never be able to divide by zero (especially in debug mode), so it would be best to correct this so that the code isn't dividing by zero. Removing debug options is not the best course forward.
@MichaelLueken-NOAA . Sure. I will make the change as described above in the option 2 and put them in the PR 362. Thanks.
On hera, using a clean check-out GSI from the EMC repository to repeat a previous case, built with debug mode , it was found the "floating invalid " run-time error ^^ 0: forrtl: error (65): floating invalid 0: Image PC Routine Line Source 0: enkf.x 00000000016AE0DE Unknown Unknown Unknown 0: libpthread-2.17.s 00002AFD9C7BC630 Unknown Unknown Unknown 0: enkf.x 00000000004D14A3 inflation_mp_infl 318 inflation.f90 0: enkf.x 000000000040A1D7 MAIN__ 231 enkf_main.f90 0: enkf.x 000000000040855E Unknown Unknown Unknown 0: libc-2.17.so 00002AFD9D687555 __libc_start_main Unknown Unknown 0: enkf.x 0000000000408469 Unknown Unknown Unknown V The the latest commit-in in the current version is shown as below: ^ [Ting.Lei@hfe07 enkf]$ git log commit dd1bac516fa595bc5f4274885aa3cdc4ed465826 (HEAD -> develop, origin/develop, origin/HEAD) Merge: 659bdcee 67b61469 Author: MichaelLueken-NOAA 63728921+MichaelLueken-NOAA@users.noreply.github.com Date: Tue Jun 21 13:52:17 2022 +0000
............................ V The line 318 in inflation.f90 is ^ sumftot = sqrt(sumftot/sumcoslattot)
v
It needs to be found what caused this run-time error, for example , different namelist parameters are needed ? The log file is debug-enkf_eupd_tm00_00.log in /scratch2/NCEPDEV/stmp3/Ting.Lei/logs/EXP3Netcdf_ColdStart_13kmESG__stpert0/dr2021090400.log , in case other GSI developers are interested to have a look into it.