MESH-Model / MESH-Dev

This repository contains the official MESH development code, which is the basis for the 'tags' listed under the MESH-Releases repository. The same tags are listed under this repository. Legacy branches and utilities have also been ported from the former SVN (Subversion) repository. Future developments must create 'forks' from this repository.
Other
2 stars 3 forks source link

Activating SUBBASINFLAG #10

Closed dprincz closed 1 year ago

dprincz commented 1 year ago

As requested by @mee067. Implemented using 'ACLASS' approach while assigning basin information from file.

mee067 commented 1 year ago

Can you clarify whether this only masks the sub-basin or also re-orders Rank & Next?

dprincz commented 1 year ago

It only masks the subbasin (deactivating all GRUs in grids not matching the mask). I'd be able to fit in the code to reorder Rank/Next in this placement as well, but I recalled the issue with the nested basins that I don't think was ever resolved. Similarly, built-in polishing could be accommodated in this placement as well, as another potential optimization option.

mee067 commented 1 year ago

why deactivate tiles one by one, why not deactivate the whole grid such that the mask is at the higher grid level? In such case, the check could be made using FRAC instead of ACLASS. Where does it check anyway?

I assume this code should work with the routing.

Not sure how polishing can be built in or the other optimization options you have in mind.

dprincz commented 1 year ago

Grids can't be deactivated without the re-ranking/re-ordering, hence this is the middle-ground 'ACLASS' solution. It should work with routing. In such a case, it just routes with zero runoff contributing to the cell. However, I've only tested it at smaller scales; let me know if it causes errors at the larger scale, of the MRB for example. Deactivating cells in routing would require re-ranking/re-ordering, so it requires some more work (e.g., fixing the nesting issue then porting over the code from the GRIP-E code).

Polishing can be built-in rather easily using a similar approach. Let me know if it's immediately required/make a new issue as a requested feature if that's the case.

mee067 commented 1 year ago

OK, will test it on the MRB and let you know.

To fix the ordering for nested basins, the code needs to know that a basin is nested within another and considers the bigger one only for generating the RANK/NEXT fields. I think that can be done if it follows the river, similar to what we did when masking to calculate metrics for incremental basins. That was in a branch of r1545_MAH.

mee067 commented 1 year ago

Where does it check for ACLASS=0 to skip calculating for a tile?

dprincz commented 1 year ago

ACLASS is reassigned by SUBBASINFLAG in basin_utilities.f90. The determination of NML occurs maybe in the next block of code after that, if an HLSS is active. The omitted tiles do not exist at all from that point onward. You should see the difference in the NML printed to screen during that phase of initialization. Seeing the more details subbasin summary might require 'DIAGNOSEMODE on'.

mee067 commented 1 year ago

First, it is called after MESH prints the tile mapping so I cannot see the new mapping. Then, from the time it has taken and the output it produced, it does not seem to have applied the mask. I printed gridded output for snow and still have values for the whole MRB? Additionally, it crashed when I tried to read the initial conditions from the MRB because the mask is created before the resume files are read. Thus, it would only initialize from CLASS.in. I think the order of things can be worked out to allow reading initial conditions from the larger basin and masking out what's outside the smaller subbasin.

dprincz commented 1 year ago

Thanks for the update. It should be called before MESH prints the tile mapping, and you should see the summary printed also:

 READING: MESH_drainage_database.r2c
   23 attributes found in the file.
   14 variables found in the file.
   37 valid fields found in the file.
   INFO: Found the variable 'FileType'.
...
 READING: MESH_input_streamflow.txt
   Skipping 152 records.
   Number of streamflow gauges: 1
             GAUGE              IY              JX            RANK        DA (km2)
          02BA003                4               2               6    419.2766
 SUBBASIN mask is ACTIVE.
   Masking domains for 1 subbasins.
          SUBBASIN           GRIDS
                 1               4
   Total number of grids: 14
   Total number of grids inside the basin: 13
   Side length of grid: 14930.0791 m
   Number of river classes: 1
   Number of GRUs: 1
   Number of land-based tiles: 4
   Number of land tiles (NML): 4
           Tile ID            Grid             GRU
                 1               3               1
                 2               4               1
                 3               5               1
                 4               6               1
   Number of water tiles (NMW): 0

Regarding initial conditions, I thought you bypassed different Tile IDs using the NetCDF format resume files?

mee067 commented 1 year ago

Sorry, seems I compiled the wrong files (moving between servers). However, this is what I got when I compiled with the updated files, while reading the ddb:

INFO: Assigned the variable 'Elev'. INFO: Assigned the variable 'ChnlLength'. INFO: Assigned the variable 'IntSlope'. INFO: Assigned the variable 'Chnl'. INFO: Assigned the variable 'Reach'. INFO: Assigned the variable 'GridArea'. [gra825:7402 :0:7402] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x15d8) ==== backtrace ==== 0 0x0000000000010e90 __funlockfile() ???:0 1 0x00000000005cebfa basin_utilities_mp_basin_info_from_fieldlist() ???:0 2 0x000000000075ccc1 read_initialinputs() ???:0 3 0x0000000000845f7f MAIN() ???:0 4 0x000000000040c00e main() ???:0 5 0x00000000000202e0 libc_start_main() ???:0 6 0x000000000040bf2a _start() /tmp/nix-build-glibc-2.24.drv-0/glibc-2.24/csu/../sysdeps/x86_64/start.S:120

srun: error: gra825: task 0: Segmentation fault (core dumped) srun: launch/slurm: _step_signal: Terminating StepId=7485656.2

mee067 commented 1 year ago

funlockfile indicates it is trying to open a file that it is already open - what could that be?

dprincz commented 1 year ago

You can compile it with "symbols" included in the list of targets (if you still want to optimize the code; i.e., not use 'debug') to get the line numbers. It'll help to pinpoint where this occurs.

dprincz commented 1 year ago

With the modification, that's where it would read 'MESH_input_streamflow.txt' to get the gauge locations.

mee067 commented 1 year ago

When compiled with symbols, I found that the error iscaused by this statement:

        !> Adjust 'tile' areas by their active GRU-fraction.
        if (ro%RUNLSS) vs%tile%surface_area = vs%tile%surface_area*vs%tile%area_weight

This is your commenting style. This statement was in 1813 but was removed in 1860. When transferring my changes from 1813_ME to 1860_ME, I copied it along. I think it might be related to PBSM changes. I resolved the error resulting from that statement (which falsely pointed to a file) by moving it after the reassignment of the tile mapping.

Now, I am getting another weird error:

   REMARK: 'GRU 1' has zero coverage in the basin and is not active.
   REMARK: 'GRU 2' has zero coverage in the basin and is not active.
   REMARK: 'GRU 3' has zero coverage in the basin and is not active.
   REMARK: 'GRU 4' has zero coverage in the basin and is not active.
   REMARK: 'GRU 9' has zero coverage in the basin and is not active.
   REMARK: 'GRU 11' has zero coverage in the basin and is not active.
   REMARK: 'GRU 12' has zero coverage in the basin and is not active.
   REMARK: 'GRU 15' has zero coverage in the basin and is not active.
   Number of GRUs: 15
   Number of land-based tiles: 417
 READING: MESH_input_soil_levels.txt
   Number of soil layers: 25
             Level   Thickness (m)      Bottom (m)
                 1   0.1000000       0.1000000
                 2   0.1000000       0.2000000
                 3   0.1100000       0.3100000
                 4   0.1300000       0.4400000
                 5   0.1600000       0.6000000
                 6   0.2100000       0.8100000
                 7   0.2800000        1.090000
                 8   0.3700000        1.460000
                 9   0.4800000        1.940000
                10   0.6300000        2.570000
                11   0.8000000        3.370000
                12   0.9900000        4.360000
                13    1.220000        5.580000
                14    1.480000        7.060000
                15    1.780000        8.840000
                16    2.110000        10.95000
                17    2.480000        13.43000
                18    2.880000        16.31000
                19    3.330000        19.64000
                20    3.810000        23.45000
                21    4.340000        27.79000
                22    4.900000        32.69000
                23    5.510000        38.20000
                24    6.170000        44.37000
                25    6.870000        51.24000
 READING: MESH_parameters_CLASS.ini
   ERROR: The number of GRUs does not match the drainage database.
   Drainage database: 14
   MESH_parameters_CLASS.ini: 15
 READING: MESH_parameters_hydrology.ini

 ERROR: Errors occurred during initialization.
 Abnormal exit.

It has already read that there were 15 GRUs (excluding impervious) from the ddb. This is the same number in CLASS.ini. I understand that the remapping made some of them redundant as indicated - no problem. But it is then saying the ddb is reporting 14 GRUs. Maybe there is a duplicate statement taking the previous one again?

mee067 commented 1 year ago

Regarding initialization, it could work using nc files as long as you preserve the grid number which I see done when I looked at the new tile mapping. nc files are read by grid and then by GRU to get to any tile, right? All GRUs that do not exist within a grid are discarded. Isn't that the case now?

I tested that before by removing some tiles from a ddb (setting GRU fractions as zeros) while they existed in initialization files and initial conditions got assigned correctly. Even when I added tiles to the ddb that did not exist in the resume file, it did not crash as it used the CLASS.ini to initialize them - these were not overwritten as they did not exist in the resume files.

dprincz commented 1 year ago

That line moved as one of the changes. It should no longer exist in the file like this:

        !> Set to unit weight 1.0 where 'area_weight' is zero.
!        where (vs%grid%area_weight == 0.0) vs%grid%area_weight = 1.0

        !> Adjust 'tile' areas by their active GRU-fraction.
        if (ro%RUNLSS) vs%tile%surface_area = vs%tile%surface_area*vs%tile%area_weight

Ensure to preserve the change to the allocate statement, where 'vs%tile%surface_area' was added to the list in the new file:

            !> Allocate and assign maps.
            allocate( &
                vs%tile%from_cell(vs%landtile_count), vs%tile%from_gru(vs%landtile_count), vs%tile%from_grid_x(vs%landtile_count), &
                vs%tile%from_grid_y(vs%landtile_count), vs%tile%from_riverclass(vs%landtile_count), &
                vs%tile%lat(vs%landtile_count), vs%tile%lon(vs%landtile_count), vs%tile%area_weight(vs%landtile_count), &
                vs%tile%surface_area(vs%landtile_count))

The equivalent line moved inside the loop where it was added under 'area_weight':

                        !> Transfer fields.
                        vs%tile%lon(k) = vs%grid%lon(n)
                        vs%tile%lat(k) = vs%grid%lat(n)
                        vs%tile%area_weight(k) = gru_nm(n, m)
                        vs%tile%surface_area(k) = vs%grid%surface_area(n)*gru_nm(n, m)
mee067 commented 1 year ago

it is this line (574) in read_initial_inputs.f90:

shd%lc%NTYPE = maxval(vs%tile%from_gru)

In my case, GRU 15 has no active tiles, so the max becomes 14. I commented that line and it seems it bypassed the error assuming shd%lc%NTYPE is already set. Otherwise, it should be:

shd%lc%NTYPE = vs%gru_count

dprincz commented 1 year ago

For the GRU-count issue, it's an artefact of 'shd' still being used in some places while I was in the middle of replacing it with the 'vs' members. I don't remember why, but it's being assigned using the maximum GRU value in the tile maps, maybe because 'gru_count' didn't exist at the time. Masking the domain to remove active tiles could remove a GRU outright from the maps if it no longer exists in the active areas in the basin.

Find and change this line marked within !>>temp/!<<temp near the bottom of basin_utilities.f90. From this:

!>>temp
            shd%lc%NTYPE = maxval(vs%tile%from_gru)

To this:

!>>temp
            shd%lc%NTYPE = vs%gru_count

Also, find and change this line in the same block. From this: allocate(shd%lc%ACLASS(vs%grid%dim_length, maxval(vs%tile%from_gru) + 1)) To this: allocate(shd%lc%ACLASS(vs%grid%dim_length, vs%gru_count + 1))

dprincz commented 1 year ago

Let me know if that resolves the issue and I'll commit the change.

dprincz commented 1 year ago

it is this line (574) in read_initial_inputs.f90:

shd%lc%NTYPE = maxval(vs%tile%from_gru)

In my case, GRU 15 has no active tiles, so the max becomes 14. I commented that line and it seems it bypassed the error assuming shd%lc%NTYPE is already set. Otherwise, it should be:

shd%lc%NTYPE = vs%gru_count

Yes, but change the other line I noted too.

mee067 commented 1 year ago

OK, regarding the tile area adjustment, I think the new implementation within the loop just made the old line redundant rather than reducing the are a again, right?

I just wanted to make sure as I have been running with that code for a while and I hope I did not mess up :(

mee067 commented 1 year ago

Well, fixing 574 in line (574) in read_initial_inputs.f90 alone resolved the issue. Maybe because maxval(vs%tile%from_gru) in my case is 14 so the allocation statement assigns 15 to shd%lc%ACLASS second dimension. Why are you adding +1 again in the ACLASS allocation?

dprincz commented 1 year ago

OK, regarding the tile area adjustment, I think the new implementation within the loop just made the old line redundant rather than reducing the are a again, right?

I just wanted to make sure as I have been running with that code for a while and I hope I did not mess up :(

It moved because of the order of the variables being allocated. It only moved this way with these changes for SUBBASINFLAG. It's coded the same between r1813 (Ln 611) and r1860 (Ln 638).

dprincz commented 1 year ago

Well, fixing 574 in line (574) in read_initial_inputs.f90 alone resolved the issue. Maybe because maxval(vs%tile%from_gru) in my case is 14 so the allocation statement assigns 15 to shd%lc%ACLASS second dimension. Why are you adding +1 again in the ACLASS allocation?

Do you mean in basin_utilities.f90? It could be the case that 'shd%lc%ACLASS' is no longer used anywhere else in the code so whether it has the correct dimensions has no effect.

mee067 commented 1 year ago

Maybe - I am not sure. Or maybe because it is dimensioned to 14+1 = 15 which is enough; which raised the question of why to add 1 again.

dprincz commented 1 year ago

Maybe - I am not sure. Or maybe because it is dimensioned to 14+1 = 15 which is enough; which raised the question of why to add 1 again.

ACLASS used to be allocated by the number of GRUs read from the db.r2c including the impervious cover (read_shed_ef.f). As that file was ported from WATFLOOD, it's after creating that variable and reading that file that the GRU count was modified in MESH to remove impervious cover (but ACLASS would remain sized to consider impervious cover). In the modern code, the GRU count is already updated to remove the last GRU. To be consistent with the size of ACLASS, as it would be allocated before, you then have to add 1. The variable will be removed at the same time the 'shd' variable is removed. It might still be passed to RTE (and why it was left to preserve the WATFLOOD-expected shape), but I don't think it's used anywhere anymore.

mee067 commented 1 year ago

OK, to be on the safe side now, I changed the allocation as you prescribed. This resolves the issue.

I have also removed the tile area adjustment statement as it is already done in the loop as you described. Prior to the SUBBASINFLAG implementation, it was redundant as far as I see:

vs%tile%surface_area(k) = vs%grid%surface_area(n)*gru_nm(n, m) occurred much later than vs%tile%surface_area = vs%tile%surface_area*vs%tile%area_weight and used the grid variable rather than the tile variable so it will override the first value.

You do not have the code which has both statements.

dprincz commented 1 year ago

Going back to resume files, the seq files will be inherently incompatible by their nature. Can you confirm there's no issue using NetCDF files (as you saw before), and then I can close this issue and commit it to the main development branch.

mee067 commented 1 year ago

I am testing that right now

mee067 commented 1 year ago

btw, a related issue is that to_map gave errors when SUBBASINFLAG is on. Should that be opened as a new issue?

   Number of land-based tiles: 417
[gra810:12127:0:12127] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x4)
==== backtrace ====
 0 0x0000000000010e90 __funlockfile()  ???:0
 1 0x00000000009bbbf3 read_initial_inputs_()  /home/melshamy/MESH/02_MESH_Code_EXE/r1860_ME.SUBBASINFLAG/./Driver/MESH_Driver/read_initial_inputs.f90:639
 2 0x0000000000b0ddcb MAIN__()  /home/melshamy/MESH/02_MESH_Code_EXE/r1860_ME.SUBBASINFLAG/./Driver/MESH_Driver/MESH_driver.f90:243
 3 0x000000000040c08e main()  ???:0
 4 0x00000000000202e0 __libc_start_main()  ???:0
 5 0x000000000040bfaa _start()  /tmp/nix-build-glibc-2.24.drv-0/glibc-2.24/csu/../sysdeps/x86_64/start.S:120
===================

line 639 reads: if (shd%NEXT(shd%RNKGRD(y, x)) > 0) then so it seems the subscripts are out of bounds - sounds weird as you have not changed the dimensions of the ddb.

For the initial conditions, the model read them without errors; Remains to see that it read the correct ones to be sure.

dprincz commented 1 year ago

btw, a related issue is that to_map gave errors when SUBBASINFLAG is on. Should that be opened as a new issue?

   Number of land-based tiles: 417
[gra810:12127:0:12127] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x4)
==== backtrace ====
 0 0x0000000000010e90 __funlockfile()  ???:0
 1 0x00000000009bbbf3 read_initial_inputs_()  /home/melshamy/MESH/02_MESH_Code_EXE/r1860_ME.SUBBASINFLAG/./Driver/MESH_Driver/read_initial_inputs.f90:639
 2 0x0000000000b0ddcb MAIN__()  /home/melshamy/MESH/02_MESH_Code_EXE/r1860_ME.SUBBASINFLAG/./Driver/MESH_Driver/MESH_driver.f90:243
 3 0x000000000040c08e main()  ???:0
 4 0x00000000000202e0 __libc_start_main()  ???:0
 5 0x000000000040bfaa _start()  /tmp/nix-build-glibc-2.24.drv-0/glibc-2.24/csu/../sysdeps/x86_64/start.S:120
===================

line 639 reads: if (shd%NEXT(shd%RNKGRD(y, x)) > 0) then so it seems the subscripts are out of bounds - sounds weird as you have not changed the dimensions of the ddb.

For the initial conditions, the model read them without errors; Remains to see that it read the correct ones to be sure.

I don't see anywhere where RNKGRD is allocated/assigned anymore. That error is coming up because it's likely not allocated anywhere. I'm surprised if you don't also get this error without SUBBASINFLAG on with this code. Can you confirm? If it's related to SUBBASINFLAG specifically, we can address it here. If you find it still occurs with this code, without SUBBASINFLAG on, make it a new issue.

mee067 commented 1 year ago

OK, I will check without SUBBASINFLAG and see whether it is related or not. I just tested it with it trying to get it to print the number of tiles to see whether they changed after the masking was done.

mee067 commented 1 year ago

As you suspected, the "to_map" issue is not dependent on SUBBASINFLAG. I guess the SUBBASINFLAG changes are now working and this issue can be closed. I may post here for time/speed assessments when using the flag vs when really clipping the basin.