ESCOMP / mizuRoute

Reach-based river routing model
http://escomp.github.io/mizuRoute/
GNU General Public License v3.0
44 stars 53 forks source link

CTSM runoff remapping issue again #426

Open nmizukami opened 1 year ago

nmizukami commented 1 year ago

We found incorrect runoff mapping from LND to ROF again. This powerpoint shows the problem cmep_history.pptx

What we did:

What we found:

Additional information:

mizuRoute uses pre-computed mapping file, not internally computed using ROF mesh and LND mesh.

LND2ROF_FMAPNAME: /glade/work/mizukami/py_mapping/mapping_conversion/data/map_fv0.9x1.25_TO_HDMA-lake_cdf5.nc
ROF2LND_FMAPNAME: /glade/work/mizukami/py_mapping/mapping_conversion/data/map_HDMA-lake_TO_fv0.9x1.25_cdf5.nc
ekluzek commented 1 year ago

@nmizukami there's a call in prep_phase for ROF in CMEPS but it's only for the irrigation field. So I don't think that can be the problem.

There's also a single call to normalize in med_map_mod.F90 and I think that's the one we should comment out. This means the normalization will be off for all fields passed, but I would expect inland points to be fine. It'll be helpful to see if this makes inland points for mizuRoute look fine as well.

Here's the git grep for that call..

med_map_mod.F90: call med_map_field_normalized(&

nmizukami commented 1 year ago

MOSART land fraction seems to be correct

Screen Shot 2023-08-30 at 1 23 30 PM
nmizukami commented 1 year ago

Hi Erik (@ekluzek),

Looks like commenting out call med_map_field_normalized produced NaN in exporting runoff. So I am not sure if can just comment out this line.

/glade/scratch/mizukami/ctsm-mizuRoute/test_f09_f09_rHDMAlk_irr/run/cesm.log.2960367.chadmin1.ib0.cheyenne.ucar.edu.230830-171351

Maybe wondering if it is possible to set lrfaction to 1 everywhere?

ekluzek commented 1 year ago

OK, giving you an untested change, didn't work out! I shouldn't have been surprised at that. And the solution is much more complicated, but I have a branch on CMEPS now with things I'm experimenting with that hopefully will help us diagnose the problem.

ekluzek commented 1 year ago

Note, that frac_a and frac_b on the mapping files are both identically 1. So the lfrac must be computed based on mapping between the ocean-mask file and mizuRoute. Possibly this might be the problem. We might need to manage that process somehow...

nmizukami commented 1 year ago

So here is ppt also including hcru_hcru_mt13, which runs mizuRoute using MOSART 0.5 degree network. Since this network is grid box, so online mapping (ESMF compute weight using ROF and LND mesh) can be done as well as offline mapping (still input weight mapping netcdf generated using my standalone tool).

cmep_history.pptx

Sum of runoff over the global domain online mapping: 0.610361323170621 mm/s offline mapping: 0.6112571762578796 mm/s

ekluzek commented 1 year ago

In looking at this we think the issue is that there is a mapping between the ocean grid/mask to the mizuRoute grid. The problem there is that mapping is from a complex ocean grid to the mizuRoute complex grid. As we have setup the model now the mizuRoute is expressed as a set of small squares around the center points of the complex HRU's for mizuRoute. Thus we can't really expect the mapping to be done correctly.

However, we also can't easily create a mapping file with the script that @nmizukami to map between the CTSM regular grid and the mizuRoute grid. Because of the previous problems with having ESMF read the mizuRoute grid, we also don't think we can expect to have ESMF do the mapping between the ocean grid and mizuRoute to determine the land-frac on the mizuRoute grid.

We thought of a few options that might work:

  1. Remove the normalization using lfrac on the ROF grid (this will need to be tested to see if this is even a good idea)
  2. Read in and use a lfrac on the ROF grid that we compute by some other means than mapping.
  3. For the ocean mask, use the computed lfrac/mask (from the mapping of the ocean mask to the CTSM grid) on the CTSM grid rather than from the ocean grid
  4. Map lfrac on the CTSM grid (from mapping of ocean mask to the CTSM grid) to the mizuRoute grid, then use the mapped lfrac for the fraction
  5. Work on creating a simpler grid for mizuRoute so that we can use ESMF online regridding for it

The 5th option may sound easy (and in the long run may still be the right thing to do), but it is a highly intensive operation for someone to simplify the HRU polygons and ensure that the simplified version still fits together exactly. Some work has been done on this and tools to help with this don't ensure that the polygons still fit together exactly. So an intensive operation has to be done to fix each polygon that overlaps another, or leaves free space. Since a lot of this has to be done by hand, it may take weeks to months of painstaking work to get something that will actually work.

ekluzek commented 1 year ago

I also think a lot of the underlying issue here is #301.

ekluzek commented 1 year ago

@billsacks suggests one way to do option 1, above is to comment out a section of code in CMEPS here...

https://github.com/ESCOMP/CMEPS/blob/abaef5fd4ccf7591d4912a0a907bd5674e18fd3e/mediator/med_fraction_mod.F90#L510-L524

billsacks commented 1 year ago

Thanks for laying out all of the options above. To clarify my comment: yes, this is the block of code that I'm guessing creates the bad mapping you're seeing, but I suspect you'll run into problems if you remove it... probably the model won't run at all, and if it does, I think you'll be breaking conservation. If you just want to get things running, you could remove that and do something like hard-coding lfrac on the rof grid, but we probably need a different long-term solution.

Also note that, in that block of code in med_fraction_mod, it's actually mapping from the lnd grid to the rof grid. Your above comment talks about the problem being a mapping from the ocn grid to the rof grid. I'm not sure if that changes your analysis, or means that I didn't identify the right code that's causing the problem.

ekluzek commented 1 year ago

Hi Bill. Thanks for the discussion this is all helpful.

I'm unconvinced the problem is based on the mapping from LND to ROF -- because in that case the mapping should be coming from the mapping file we gave it. And we do think that part is working fine.

Naoki also ran on the MOSART grid for mizuRoute and showed that the lfrac on the ROF grid in that case looks as expected both for regridding with a mapping file and with regridding on the fly. Which seems to show that both using the mapping file and the online regridding function correctly in terms of handling lfrac. It also verifies that Naoki's mapping file compares well with the ESMF online regridding. I was wondering if there was a problem with our mapping file for the complex grids and this shows that the process works well for the simple grid. And we wanted to make sure that lfrac on the ROF grid looked as expected for the simpler case.

So I'm still thinking there must be a mapping from the ocean mask to the ROF grid and that must be where lfrac on the ROF grid comes from.....

Does that sound right to you?

billsacks commented 1 year ago

I dug into this more. I can't find any mappings from ocn to rof (based on a search for 'ocn.*rof'), though it's possible that I'm missing something with my search.

It sounds like the problem you're seeing is with Med_frac_rof_lfrac. This comes from is_local%wrap%FBFrac(compid) for rof. This is what's set in med_fraction_init, so I still think that's where the problem originates. And again, this fraction is used in the "merge" (not in the mapping normalization).

By adding these extra prints to the code in a case with MOSART (I couldn't get things to run with mizuRoute so just tried with MOSART):

diff --git a/mediator/med.F90 b/mediator/med.F90
index e7c6da9d..f17b393d 100644
--- a/mediator/med.F90
+++ b/mediator/med.F90
@@ -1800,6 +1800,7 @@ subroutine DataInitialize(gcomp, rc)
       ! Initialize route handles and required normalization field bunds
       !---------------------------------------
       call ESMF_LogWrite("before med_map_RouteHandles_init", ESMF_LOGMSG_INFO)
+      print *, 'WJS: about to call med_map_RouteHandles_init'
       call med_map_RouteHandles_init(gcomp, is_local%wrap%flds_scalar_name, logunit, rc)
       if (ChkErr(rc,__LINE__,u_FILE_u)) return
       call ESMF_LogWrite("after  med_map_RouteHandles_init", ESMF_LOGMSG_INFO)
diff --git a/mediator/med_fraction_mod.F90 b/mediator/med_fraction_mod.F90
index 2fd83972..272c5e5a 100644
--- a/mediator/med_fraction_mod.F90
+++ b/mediator/med_fraction_mod.F90
@@ -507,13 +507,18 @@ subroutine med_fraction_init(gcomp, rc)

        ! Set 'lfrac' in FBFrac(comprof)
        if (is_local%wrap%comp_present(complnd)) then
+          print *, 'WJS: setting lfrac for comprof'
           maptype = mapconsd
+          print *, 'WJS: checking for RH: ', complnd, comprof, maptype
           if (.not. med_map_RH_is_created(is_local%wrap%RH(complnd,comprof,:),maptype, rc=rc)) then
+             print *, 'WJS: need to create RH'
              call med_map_routehandles_init( complnd, comprof, &
                   FBSrc=is_local%wrap%FBImp(complnd,complnd), &
                   FBDst=is_local%wrap%FBImp(complnd,comprof), &
                   mapindex=maptype, RouteHandle=is_local%wrap%RH, rc=rc)
              if (ChkErr(rc,__LINE__,u_FILE_u)) return
+          else
+             print *, 'WJS: RH was already created'
           end if
           call ESMF_FieldBundleGet(is_local%wrap%FBfrac(complnd), 'lfrac', field=field_src, rc=rc)
           if (ChkErr(rc,__LINE__,u_FILE_u)) return
diff --git a/mediator/med_map_mod.F90 b/mediator/med_map_mod.F90
index 18752dc2..096ea2a0 100644
--- a/mediator/med_map_mod.F90
+++ b/mediator/med_map_mod.F90
@@ -167,6 +167,7 @@ subroutine med_map_RouteHandles_initfrom_esmflds(gcomp, flds_scalar_name, llogun
                    if (mapindex /= mapunset) then

                       ! determine if route handle has already been created
+                      print *, 'WJS: in med_map_RouteHandles_initfrom_esmflds, checking mapexists: ', n1, n2, mapindex
                       mapexists = med_map_RH_is_created(is_local%wrap%RH,n1,n2,mapindex,rc=rc)
                       if (chkerr(rc,__LINE__,u_FILE_u)) return

I think I understand better what's going on. In med_fraction_mod, the mappings are done with mapconsd (ESMF_NORMTYPE_DSTAREA). The other mappings all use mapconsf (ESMF_NORMTYPE_FRACAREA). Those other mappings are what I think you're providing in your pre-computed mapping files. Then the code gets into med_fraction_mod, asks if there is a mapconsd mapping from lnd -> rof, finds that the answer is no (because it only has a mapconsf mapping), and so creates a new lnd -> rof mapconsd mapping at runtime.

These different normalization types are described here:

https://earthsystemmodeling.org/docs/nightly/develop/ESMF_refdoc/node5.html#sec:interpolation:conservative_norm_opts

But I haven't come to understand this.

As a bit of an aside, but possibly relevant, I dug up an email from @mvertens from 3 years ago in the context of the lnd -> glc mapping where I asked about her use of mapconsf vs. mapconsd and she said:

Actually, everything should be mapconsd (long story - I spent a week with Bob and Denise trying to sort this out). For CESM it is only round of level changes at most - but I am trying to add this back in incrementally as we move forwards and test. I can explain the difference between mapconsf and mapconsd if you are interested - so let me know.

But it seems like at least the lnd <-> rof mappings are still being done with mapconsf. I'm not sure if that's intentional. If the correct thing to do is to change all of these mappings to mapconsd, then I think it would resolve this issue because the mappings from the external mapping file would be taken as mapconsd, which would be consistent with the mapping of lfrac in med_fraction_mod.

ekluzek commented 1 year ago

Awesome, thanks for tracking that down. We should be able to easily test if by using mapconsf mapping it resolves the issue with lfrac on the ROF grid for mizuRoute. As you and Mariana point out maybe it's really intended to always be mapconsf anyway. But, if that fixes it, we can at least add an option for it that we can turn on for mizuRoute cases. But, if you are right that it really should always be mapconsf, then we could make it always do that, and make sure it works for other ROF grids.

billsacks commented 1 year ago

To clarify: Mariana's message suggested that everything should be mapconsd. But your idea is still a good one for a test: change mapconsd to mapconsf for now in med_fraction_mod for the lnd -> rof mapping of lfrac and see if that resolves this issue. That may not be the right long-term solution, but it would be a good thing to try.

billsacks commented 1 year ago

@ekluzek and @nmizukami - I talked about this with @mvertens today. We can talk more tomorrow, but we agreed that a good first step would be to try the change suggested above – changing mapconsd to mapconsf for now in med_fraction_mod for the lnd -> rof mapping of lfrac and see if that resolves this issue. That probably isn't the right long-term solution, but in terms of confirming our understanding of what's going on, it would be very helpful to have that test done. Would you be able to do that before our meeting tomorrow?

ekluzek commented 1 year ago

OK I ran a test that seems to work with results here:

/glade/scratch/erik/SMS_D_Ld1.f09_f09_rHDMAlk_mg17.I2000Clm50SpMizGs.cheyenne_intel.mizuroute-default.20230905_104457_4b4dur/run/

The core code changes to med_fraction_mod.F90 look like this:

@@ -508,7 +510,7 @@ subroutine med_fraction_init(gcomp, rc)

        ! Set 'lfrac' in FBFrac(comprof)
        if (is_local%wrap%comp_present(complnd)) then
-          maptype = mapconsd
+          maptype = mapconsf
           if (.not. med_map_RH_is_created(is_local%wrap%RH(complnd,comprof,:),maptype, rc=rc)) then
              call med_map_routehandles_init( complnd, comprof, &
                   FBSrc=is_local%wrap%FBImp(complnd,complnd), &
@@ -518,10 +520,21 @@ subroutine med_fraction_init(gcomp, rc)
           end if
           call ESMF_FieldBundleGet(is_local%wrap%FBfrac(complnd), 'lfrac', field=field_src, rc=rc)
           if (ChkErr(rc,__LINE__,u_FILE_u)) return
+          call field_getdata1d(field_src, lfrac, rc)
+          if (ChkErr(rc,__LINE__,u_FILE_u)) return
+          if ( (minval(lfrac) < -eps_fraclim) .or. (maxval(lfrac) > 1._r8+eps_fraclim) )then
+              call shr_sys_abort( "lfrac on the LND grid is out of bounds" )
+          end if
           call ESMF_FieldBundleGet(is_local%wrap%FBfrac(comprof), 'lfrac', field=field_dst, rc=rc)
           if (ChkErr(rc,__LINE__,u_FILE_u)) return
           call med_map_field(field_src, field_dst, is_local%wrap%RH(complnd,comprof,:), maptype, rc=rc)
           if (ChkErr(rc,__LINE__,u_FILE_u)) return
+          call field_getdata1d(field_dst, lfrac, rc)
+          if (ChkErr(rc,__LINE__,u_FILE_u)) return
+          if ( (minval(lfrac) < -eps_fraclim) .or. (maxval(lfrac) > 1._r8+eps_fraclim) )then
+              write(logunit,*) 'minval, maxval of lfrac = ', minval(lfrac), maxval(lfrac)
+              call shr_sys_abort( "lfrac on the ROF grid is out of bounds" )
+          end if
        endif
     endif

The reason I say I think it's working is that as you see above I check if the lfrac on ROF and LND grids are out of range and abort if they do. The case before the change to mapconsf that check fails on the ROF grid, but once I set it to mapconsf it PASSes. And I see that the ROF lfrac is within bounds on the CPL hist file.

nmizukami commented 1 year ago

Hi @ekluzek and @billsacks,

I mapped the Erik's run including the above change (mapconsd -> mapconsf). I plotted Med_frac_rof_lfrac in SMS_D_Ld1.f09_f09_rHDMAlk_mg17.I2000Clm50SpMizGs.cheyenne_intel.mizuroute-default.20230905_104457_4b4dur.cpl.hi.2000-01-02-00000.nc, but looks like it is the same as the previous run.

Screen Shot 2023-09-19 at 6 33 41 AM

Only difference is Erik's run does not have >1 Med_frac_rof_lfrac.

nmizukami commented 1 year ago

I also actually changed med_fraction_mod.F90 and ran yesterday afternoon. My run including the med_fraction_mod change actually looks good. I put the modified med_fraction_mod.F90 under SourceMod/src.drv, then recompile before run.

My case is /glade/u/home/mizukami/proj/ctsm-mizuRoute/run_case/test_f09_f09_rHDMAlk_irr archive is /glade/scratch/mizukami/ctsm-mizuRoute/archive/test_f09_f09_rHDMAlk_irr/cpl/hist

Screen Shot 2023-09-19 at 6 54 36 AM

ekluzek commented 1 year ago

OK, so in your case you just changed the map type without the other changes I had to check the lfrac range. The only thing I can think of is that maybe the lfrac pointer I got was somehow corrupting the values. A simple change I could make is to nullify the pointer after I do the check. That is of course a good practice to do anyway. And maybe I need to leave it in the state where lfrac is set to the LND version of lfrac, as maybe it needs that later on...