NOAA-EMC / CMEPS

NUOPC Community Mediator for Earth Prediction Systems
https://escomp.github.io/CMEPS/
0 stars 18 forks source link

Update CMEPS #110

Closed uturuncoglu closed 6 months ago

uturuncoglu commented 6 months ago

Description of changes

This PR aims to update CMPES with the upstream (ESCOMP) to bring recent developments such as CDEPS inline capability to UFS Weather Model.

Specific notes

Contributors other than yourself, if any: @DeniseWorthen @danrosen25 @binli2337 @BinLiu-NOAA

CMEPS Issues Fixed (include github issue #): N/A

Are changes expected to change answers? (specify if bfb, different at roundoff, more substantial) No

Any User Interface Changes (namelist or namelist defaults changes)?

The user need to add new phase (med_phases_cdeps_run) to the run sequence to activate the feature. Then, needs to provide stream.config configuration file.

Additional changes also made to allow filling exchange fields with all data in the first coupling time step. I introduce set of new mediator namelist option to make it configurable. So, for example I am setting following in the nems.configure for ocean and wave (in mediator section),

  ocn_use_data_first_import = .true.
  wav_use_data_first_import = .true.

By default the values are .false. and it is only usable when CDEPS Inline feature activated. I also modify the wave and ocean prep phases to get the data and pass it to the components.

Testing performed

Please describe the tests along with the target model and machine(s) If possible, please also added hashes that were used in the testing Both CESM (Derecho, Intel) and UFS Weather Model (Orion Intel and WCOSS2) head of develop tested on multiple machines.

uturuncoglu commented 6 months ago

@DeniseWorthen This is replacement of https://github.com/NOAA-EMC/CMEPS/pull/109 and includes changes from https://github.com/NOAA-EMC/CMEPS/pull/107

binli2337 commented 6 months ago

After analyzing mediator history output, I think there might be some issues in the mediator.

My test run is at /scratch1/NCEPDEV/stmp2/Bin.Li/FV3_RT/20240404_check_output/use_pr_2028/rt_73559/hafs_regional_storm_following_1nest_atm_ocn_wav_mom6_intel. The model is from the latest feature/cdeps_inline_new branch.

If you try to plot the ATMEXP_SO_U field from "ufs.hafs.cpl.hi.2020-08-25-54000.nc", you will notice the problem. The issue is that when the inline CDEPS cannot provide valid values in the non-overlapped regions,a fill value (9.99E20) should be used. But the the ATMEXP_SO_U field indicates that the fill value is not applied in the non-overlapped regions.

I have transferred the ufs.hafs.cpl.hi.2020-08-25-54000.nc file to orion at /work/noaa/stmp/libin/CHECK_OUTPUT.

binli2337 commented 6 months ago

@BinLiu-NOAA @uturuncoglu @DeniseWorthen Some issues were found in the mediator. See comment above.

DeniseWorthen commented 6 months ago

@binli2337 Do you agree that the atmexp_so_v field appears correct?

Screenshot 2024-02-04 at 10 34 24 AM
binli2337 commented 6 months ago

@DeniseWorthen Yes, it looks correct.

BinLiu-NOAA commented 6 months ago

@DeniseWorthen and @binli2337, since we did not turn on the SSC feedback on atmosphere-ocean coupling (when calculating air-sea fluxes) in the newly added regression tests, I think we can let the PR go in first.

Another kind of note, with @binli2337's suggestion, I conducted a test with So_u and So_v fields being provided from the inline CDEPS input stream file (e.g., adding UGRD_surface and VGRD_surface with zero values). In this case, the SSC (usfco and vsfco) fields in the FV3ATM side are also in the correct range.

Thanks!

DeniseWorthen commented 6 months ago

@BinLiu-NOAA OK, it's your call. But something still seems weird if the so_v fields appear correct but the so_u fields do not.

BinLiu-NOAA commented 6 months ago

@BinLiu-NOAA OK, it's your call. But something still seems weird if the so_v fields appear correct but the so_u fields do not.

@DeniseWorthen, I think I understand your point now. Basically, So_v has no issues (even though without the corresponding inline-CDEPS input stream field), but So_u has an issue. In this case, it seems to me there might be a typo somewhere in the code.

DeniseWorthen commented 6 months ago

@BinLiu-NOAA I agree--the most likely culprit is a typo somewhere.

uturuncoglu commented 6 months ago

@BinLiu-NOAA @DeniseWorthen Let me check it tonight. I could not find chance to access the folder but if you could double check the permissions that would be great.

DeniseWorthen commented 6 months ago

@uturuncoglu Actually, I think it has more to do w/ what happens in med_map_mod when a matching field is not found. What I tried was this

diff --git a/mediator/med_map_mod.F90 b/mediator/med_map_mod.F90
index bcf178fb..aa3ee099 100644
--- a/mediator/med_map_mod.F90
+++ b/mediator/med_map_mod.F90
@@ -968,6 +968,7 @@ contains
     character(cl)                 :: field_name
     character(cl), allocatable    :: field_namelist_dat(:)
     logical                       :: skip_mapping
+    logical                       :: isFound
     type(ESMF_Region_Flag)        :: zeroregion
     real(ESMF_KIND_R8), parameter :: fillValue = 9.99e20_ESMF_KIND_R8
     character(len=*), parameter   :: subname=' (med_map_mod:med_map_field_packed) '
@@ -1078,6 +1079,7 @@ contains

                 ! Loop over fields and fill it if there is a match
                 do nf = 1,fieldcount
+                   isFound = .false.
                    ! Get the indices into the packed data structure
                    np = packed_data(mapindex)%fldindex(nf)
                    if (np > 0) then
@@ -1119,11 +1121,20 @@ contains
                                call Field_diagnose(packed_data(mapindex)%field_dst, trim(field_name), " --> after  background fill: ", rc=rc)
                                if (chkerr(rc,__LINE__,u_FILE_u)) return
                             end if
-
+                            isFound = .true.
                             ! Exit from loop since match is already found
                             exit
                          endif
                       end do
+                      if(.not. isFound) then
+                         call Field_diagnose(packed_data(mapindex)%field_dst, trim(field_name), " --> not found field : ", rc=rc)
+                         if (chkerr(rc,__LINE__,u_FILE_u)) return
+                         call ESMF_FieldFill(packed_data(mapindex)%field_dst, dataFillScheme="const", const1=fillValue, rc=rc)
+                         !call ESMF_FieldFill(packed_data(mapindex)%field_dst, dataFillScheme="const", const1=0.0_R8, rc=rc)
+                         if (chkerr(rc,__LINE__,u_FILE_u)) return
+                         call Field_diagnose(packed_data(mapindex)%field_dst, trim(field_name), " --> not found field after : ", rc=rc)
+                         if (chkerr(rc,__LINE__,u_FILE_u)) return
+                      end if
                    end if
                 end do

I have a run directory on orion /work/noaa/stmp/dworthen/stmp/dworthen/FV3_RT/rt_397710/test2

uturuncoglu commented 6 months ago

I think these fields are new and first appear after last PR that uses ocean current in the atmosphere model. Right? @DeniseWorthen Thanks for the fix but I am not sure why one field is found but other not. I think that in our case both of them missing in the stream file and if there is an issue it must have in both. Any idea?

uturuncoglu commented 6 months ago

@BinLiu-NOAA Could you point your run directory for the run that has issue? I would like to see mediator.log. I think, we need to put @DeniseWorthen fix to CMEPS since it covers some cases that could have issue (having filed in data FB but not the fields that we are looking for).

BinLiu-NOAA commented 6 months ago

@BinLiu-NOAA Could you point your run directory for the run that has issue? I would like to see mediator.log. I think, we need to put @DeniseWorthen fix to CMEPS since it covers some cases that could have issue (having filed in data FB but not the fields that we are looking for).

Thanks @uturuncoglu and @DeniseWorthen! Let me set up and run the regression test case on Orion and point you to those test dirs. Will run with and without the fixes.

DeniseWorthen commented 6 months ago

@uturuncoglu When I turned on FieldDiagnose, I could see that So_u was actually getting values that were the same as So_t (high value was ~303K).

BinLiu-NOAA commented 6 months ago

@uturuncoglu, @DeniseWorthen, and @binli2337, as I mentioned in a separate email, the hafs_regional_storm_following_1nest_atm_ocn_wav_mom6_intel RT out of box from the PR branch (except I did modify the corresponding diag_table to output usfco/vsfco in FV3ATM) can be found on Orion: /work/noaa/stmp/bliu/stmp/bliu/FV3_RT/rt_117317/hafs_regional_storm_following_1nest_atm_ocn_wav_mom6_intel

Meanwhile, the same hafs_regional_storm_following_1nest_atm_ocn_wav_mom6_intel RT dir after the bug fixes from @DeniseWorthen (for isFound logic) and from @binli2337 (for the polemethod fix) can be found in the following location on Orion: /work/noaa/stmp/bliu/stmp/bliu/FV3_RT/rt_127647/hafs_regional_storm_following_1nest_atm_ocn_wav_mom6_intel

It looks to me after the above fixes, both SST and SSC are working properly now.

Thanks!

uturuncoglu commented 6 months ago

@BinLiu-NOAA I think there is a masking issue in the fixed case. If I look at the So_t on atmosphere grid I am only seeing data over active coupling region and it seems that CDEPS inline does not complete the So_t. This is correct in the original run. If you look at your stream.config file, you are providing So_t as a stream and as it expected original implementation (first run without any fix) completes the field and correct. The later run does not have it (just fills large values) and I think it is wrong.

uturuncoglu commented 6 months ago

@BinLiu-NOAA @DeniseWorthen I'll look at the original run and see what is going on wrong in there. I'll update you when I have something new.

jkbk2004 commented 6 months ago

All test are done at https://github.com/ufs-community/ufs-weather-model/pull/2028. @DeniseWorthen can you merge this pr?

DeniseWorthen commented 6 months ago

@jkbk2004 Testing over the weekend uncovered a couple of issues related to this PR and the UWM. Since these features are intended for hafs v2 implentation, there is discussion about how to deal w/ this. @BinLiu-NOAA Would you please chime in here as to your preference?

jkbk2004 commented 6 months ago

@jkbk2004 Testing over the weekend uncovered a couple of issues related to this PR and the UWM. Since these features are intended for hafs v2 implentation, there is discussion about how to deal w/ this. @BinLiu-NOAA Would you please chime in here as to your preference?

What was issue?

DeniseWorthen commented 6 months ago

There are fixes needed for the CMEPS implementation of the inline-CDEPS feature.

BinLiu-NOAA commented 6 months ago

All test are done at ufs-community/ufs-weather-model#2028. @DeniseWorthen can you merge this pr?

@jkbk2004, as you may notice from the above conversations, we (@uturuncoglu, @DeniseWorthen, @binli2337 and myself) found and fixed two bugs from this CMEPS side. The fixes improve the treatment of So_u and So_v when CMEPS-inline-CDEPS does not have those fields in the input stream file, also set the polemethod = ESMF_POLEMETHOD_NONE for the hafs.mom6 coupling mode to ensure the regional MOM6 ocean coupling variables being properly remapped to the FV3ATM grid (without strange extrapolation).

Unfortunately, the ufs-weather-model side regression tests are already done for all supported platforms without these additional CMEPS fixes (which were figured out, tested and verified not until this morning). Meanwhile fortunately, the PR has not yet been merged.

In this situation, I am not sure what is the typical approach to move forward: letting the PR go in first as is and creating a follow-up PR to fix the issues, or updating the PR with the CMEPS bug fixes and rerunning all the RTs. Appreciate it if you and other code/repo admins can provide some suggestions/recommendations here.

Thanks!

jkbk2004 commented 6 months ago

Can we move on merging and applying another new bug fix pr? CDEPS pr https://github.com/NOAA-EMC/CDEPS/pull/60#event-11704631687 already got merged.

binli2337 commented 6 months ago

@jkbk2004 The CMEPS fix is not related to the CDEPS pr.

BinLiu-NOAA commented 6 months ago

@jkbk2004 and @DeniseWorthen, I personally prefer the second approach (updating the PR with the CMEPS bug fixes and rerunning all the RTs). But I understand it's more up to you (as ufs-weather-model repo Admin and CMEPS code/reop Admin) to choose/decide which approach to proceed in this special scenario. Thanks!

DeniseWorthen commented 6 months ago

@BinLiu-NOAA If we merge the current PR, then I assume we're going to just need to turn right around and create a second PR to make the fixes? Those fixes will change the baselines for the same two tests (we need to confirm this is in fact the case). So we either re-create the two baselines now and run against those new ones, or we make a second hotfix PR. At that point, we'll have to create an entire new baseline because the tests will have been added and the fixes will change those baselines.

DeniseWorthen commented 6 months ago

Remember also that Hera is on maintenance tomorrow.

jkbk2004 commented 6 months ago

I prefer to move on with merging this pr and applying a new hotfix pr. Then, we can combine with other independent PRs to avoid commit schedule jamming situation.

BinLiu-NOAA commented 6 months ago

I prefer to move on with merging this pr and applying a new hotfix pr. Then, we can combine with other independent PRs to avoid commit schedule jamming situation.

Thanks, @jkbk2004! The idea of potential combining the hot fix with other independent PR(s) sounds good to me. So, if @DeniseWorthen also agrees, it sounds reasonable to me to go with this approach. Thanks!

DeniseWorthen commented 6 months ago

I think it will be faster to re-create the two needed baselines and run against them now for this PR, given that Hera will be down tomorrow. Unless the PRs are combined and ready to run today, that means we won't be able to commit to UWM until Wednesday. @BinLiu-NOAA Does that work for the hafs v2 schedule?

BinLiu-NOAA commented 6 months ago

@jkbk2004 do we have another ufs-weather-model level PR to combined with today for this potential CMEPS hot fix PR (if choosing to merging first and hotfix after)?

@DeniseWorthen, for HAFSv2 code freeze, we definitely would like to have the ufs-weather-model side ready as early as possible. If the CMEPS hot fix is not in the develop branch, we will try to point to the CMEPS feature branch in that case (not ideal though). Thanks!

jkbk2004 commented 6 months ago

@binli2337 we can combine with https://github.com/ufs-community/ufs-weather-model/pull/2115 and https://github.com/ufs-community/ufs-weather-model/pull/2111. These are at the weather model level and no baseline changes. @uturuncoglu can you apply a quick fix today?

DeniseWorthen commented 6 months ago

Again, my concern is a) hera downtime tomorrow and b) the time it takes to get a combined pr tested and ready. Remember, once we combine them, we need to run just to confirm that the combined PRs have the intended impacts on the tests.

BrianCurtis-NOAA commented 6 months ago

It is my recommendation to bring the hotfix into this PR, and get Hera started immediately to beat the downtime tomorrow. I personally agree with Denise, and to me it makes better sense due to the high importance of this work towards a HAFS deadline. What I would hope from HAFS is that bugs like this are found before it makes it to the UFSWM so we don't need such a hotfix in the future.

jkbk2004 commented 6 months ago

@DeniseWorthen re-test is inevitable once we receive new commit.

DeniseWorthen commented 6 months ago

The hotfix will require re-creating baselines for the two new tests only. That should be relatively quick on hera, and then hera RTs can kick off today.

jkbk2004 commented 6 months ago

@uturuncoglu @BinLiu-NOAA please, move faster to apply the fix. @DeniseWorthen @BrianCurtis-NOAA Sanity check can be done with faster machine.

BinLiu-NOAA commented 6 months ago

Thanks @jkbk2004 @BrianCurtis-NOAA and @DeniseWorthen, I am updating this PR with the bug fixes now. Much appreciated.

BrianCurtis-NOAA commented 6 months ago

@uturuncoglu @BinLiu-NOAA please, move faster to apply the fix. @DeniseWorthen @BrianCurtis-NOAA Sanity check can be done with faster machine.

Don't worry about sanity check. Just run baselines and we can address if we start seeing failures.

uturuncoglu commented 6 months ago

Sorry all I could not respond to you about recent discussion due to time difference. I think we already made a decision by applying the fixes for this PR. Anyway, let me know if you need anything from my side. Thanks again all for your help.

FernandoAndrade-NOAA commented 6 months ago

The RT rerun on #2028 has completed successfully, please proceed with the merge process.

BinLiu-NOAA commented 6 months ago

The RT rerun on #2028 has completed successfully, please proceed with the merge process.

Thanks @FernandoAndrade-NOAA as well as all the code/repo Admins for the speedy RT processing! Much appreciated! @DeniseWorthen looks like this PR is ready to merge probably. Thanks!