MESH-Model / MESH-Dev

This repository contains the official MESH development code, which is the basis for the 'tags' listed under the MESH-Releases repository. The same tags are listed under this repository. Legacy branches and utilities have also been ported from the former SVN (Subversion) repository. Future developments must create 'forks' from this repository.
Other
2 stars 3 forks source link

ns flag of WB output causing crashes with vector-based setups #22

Open mee067 opened 7 months ago

mee067 commented 7 months ago

This seems to be a weird error as it does not occur in all cases. For example, I ran the same setup successfully using 25 soil layers. When I switched to 4 layers, I got the error. The crash is thrown by this line 1376 (in save_basin_output.f90)

ina = min(ina, shd%NAA)

because ina got assigned some strangely large number for the 3rd sub-basin and instead of being assigned 3. This routine is called from line 766 (writing the WB output):

       !> Daily: IKEY_DLY
        if (ic%now%day /= ic%next%day) then
            if (btest(bnoflg%wb%t, 0)) then
!-                call save_water_balance(shd, 86400, IKEY_DLY)
                call write_water_balance(fls, shd, fls%fl(mfk%f900)%iun, 86400, shd%NAA, out%d)
                if (allocated(bnoflg%wb%ns)) then
                    do n = 1, size(bnoflg%wb%ns)
                        if (bnoflg%wb%ns(n) > 0) then
                            call write_water_balance(fls, shd, (fls%fl(mfk%f900)%iun*1000 + n), 86400, &
                                                     fms%stmg%meta%rnk(bnoflg%wb%ns(n)), out%d)
                        end if
                    end do
                end if
!-                call reset_water_balance(IKEY_DLY)
            end if

`

I printed bnoflg%wb%ns(n) upon assignment and it is always identical to "n", the loop counter. I printed it before calling the "write_water_balance" routine to and it came out as: 1 2 1112486707

before it crashed at the added print line, simply because there is no rank for such subbasin - it should be 3.

As I mentioned, I had no issue with this before the recent changes and I have no issues for grid-based setup (as far as I see). The same setup ran with 25 soil layers without crashes.

if bnoflg%wb%ns(n) = n; why do we need to save it?

I also noticed options nr and n for WB and these are not documented - we talked about "nr" before but I did not know it got implemented. I could not figure out the "n" option.

dprincz commented 7 months ago

It's a generic routine to write the output provided any n. The check is meant to ensure 'n' is within the range from 1:NAA. Maybe before that line of ina = min(ina, NAA), you could print the two values to see which or if both are causing the issue in that instance. Functionally, there --should-- be no difference between the indices of vector v. grid-based setups...

The option of 'n' is to print the specified grid cell(s). 'nr' is a shortcut to automatically print those associated with reservoirs. Similar 'ns' is a shortcut to automatically print those associated with outlet locations. While they might exist in part, I don't know if the other options have been completely implemented.

mee067 commented 7 months ago

I understand the routine is generic for both grid-based and vector-based setups. As I said, it ran fine for the same vector-based setup when I used 25 soil layers (with a dp-compiled version). It looks like a weird error - at one point it picks some wrong number from the memory causing a crash.

I printed NAA and ina upon the routine call, and it gave an error for inabecause the index went wrong (from the routine call). I stepped the printing up to the calling routine and got the numbers given above. For n = 3, something went wrong, and bnoflg%wb%ns(3) = 1112486707. I printed the values upon assignment in the initialization routine, and bnoflg%wb%ns(3)=3.

obviously, 1112486707 is out of range for the array and thus it crashes.

Is "nr" an option for the WB flag? or is it activated via "REACHOUTFLAG"?

How to use the "n" option? do we specify a list of ranks after it?

mee067 commented 7 months ago

seems it has something to do with the recent fixes - older compiled code works. That's weird because I did not touch this module in the recent fixes and bnoflg%wb%ns is kind of local to the module, although it is saved between the function calls as per line 133 before any subroutine definition:

type(BasinOutputConfig), save :: bnoflg

should we try to make it a global variable?

mee067 commented 7 months ago

I cannot understand it but it does have something to do with the LongSimFix. I reversed the fix and the code works in such a case. There is a hidden link between the rte variables (changing the 999999 to 1) and bnoflg%wb%ns.

mee067 commented 7 months ago

I have 38 gauges in the streamflow file. Printing after the parsing call, I get:

            call parse_basin_output_flag(shd, BASINAVGWBFILEFLAG, bnoflg%wb)
            do n = 1, size(bnoflg%wb%ns)
                write(*, *) n, bnoflg%wb%ns(n)              ! ME 2023/11/27
            end do
       1           1
       2           2
       3           3
       4           4
       5           5
       6           6
       7           7
       8           8
       9           9
      10          10
      11          11
      12          12
      13          13
      14          14
      15          15
      16          16
      17          17
      18          18
      19          19
      20          20
      21          21
      22          22
      23          23
      24          24
      25          25
      26          26
      27          27
      28          28
      29          29
      30          30
      31          31
      32          32
      33          33
      34          34
      35          35
      36          36
      37          37
      38          38

printing before the "write_water_balance" call

        !> Daily: IKEY_DLY
        if (ic%now%day /= ic%next%day) then
            if (btest(bnoflg%wb%t, 0)) then
!-                call save_water_balance(shd, 86400, IKEY_DLY)
                call write_water_balance(fls, shd, fls%fl(mfk%f900)%iun, 86400, shd%NAA, out%d)
                if (allocated(bnoflg%wb%ns)) then
                    do n = 1, size(bnoflg%wb%ns)
                        write(*, *) n, bnoflg%wb%ns(n), size(bnoflg%wb%ns)              ! ME 2023/11/27
                        if (bnoflg%wb%ns(n) > 0) then                           
                            call write_water_balance(fls, shd, (fls%fl(mfk%f900)%iun*1000 + n), 86400, &
                                                     fms%stmg%meta%rnk(n), out%d)
                        end if
                    end do
                end if
!-                call reset_water_balance(IKEY_DLY)
            end if

produces: 1 1 38 2 2 38 3 1112486707 38 4 -1082130432 38 5 -1082130432 38 6 1116536832 38 7 -1082130432 38 8 1095237632 38 9 -1082130432 38 10 -1082130432 38 11 1116458189 38 12 -1082130432 38 13 -1082130432 38 14 -1082130432 38 15 -1082130432 38 16 -1082130432 38 17 -1082130432 38 18 -1082130432 38 19 -1082130432 38 20 -1082130432 38 21 -1082130432 38 22 -1082130432 38 23 -1082130432 38 24 -1082130432 38 25 -1082130432 38 26 -1082130432 38 27 -1082130432 38 28 -1082130432 38 29 -1082130432 38 30 -1082130432 38 31 -1082130432 38 32 -1082130432 38 33 -1082130432 38 34 1132494848 38 35 -1082130432 38 36 -1082130432 38 37 -1082130432 38 38 -1082130432 38

the condition ">0" makes replacing bnoflg%wb%ns(n) with n fails for most gauges. What made the variable bnoflg%wb%ns change values?

mee067 commented 7 months ago

the only place where this variable is assigned is in the parsing step during initialization - I am puzzled and it is indeed looking like a weird memory thing.

dprincz commented 7 months ago

Did you rebuild this code in its entirety since you made the 'LongSimFix' changes?

mee067 commented 7 months ago

yes using intel 2018 and intel 2021 - I got the same problem so it is not a compiler issue.

mee067 commented 7 months ago

we need to make the bnoflgvariable resilient

mee067 commented 7 months ago

@kasra-keshavarz has just reported the same issue for the Fraser setup using a serially-complied version with intel (2020) on Graham

mee067 commented 6 months ago

Zelalem reported it too for his setup. He did not get a crash but got output for some sub-basins and no output for others.

mee067 commented 5 months ago

This seems to have been forgotten. I have a fix but I did not upload it. The fix eliminates the use of "bnoflg%wb%ns(n)" and uses n directly - in all the cases I have, n = bnoflg%wb%ns(n) at the assignment time. As long as MESH prints WB for all sub-basins, the flag will be 1 for all. There is currently no option to select but maybe there was an idea to allow selecting.

sujata91 commented 3 months ago

Just wondering if there is any update on this. As I am unable to produce the water balance for the subbasin using control flag BASINAVGWBFILEFLAG daily ns from the model (both 1860_ME and 1860_ME_ZT version) @mee067 @dprincz