ESCOMP / CMEPS

NUOPC Community Mediator for Earth Prediction Systems
https://escomp.github.io/CMEPS/

Enable CCPP host model under CMEPS and updates for UFS exchange grid capability #282

Closed uturuncoglu closed 2 years ago

uturuncoglu commented 2 years ago

Description of changes

This PR brings exchange grid capability to the UFS Weather Model. For this purpose, CMEPS is extended to act as a CCPP host model that calculates atmosphere-ocean fluxes by running CCPP suite files. This feature is currently available only for the UFS model.

Specific notes

Contributors other than yourself, if any: None

CMEPS Issues Fixed (include github issue #):

Are changes expected to change answers? (specify if bfb, different at roundoff, more substantial) No

Any User Interface Changes (namelist or namelist defaults changes)?

Testing performed

Testing performed if application target is CESM:

Testing performed if application target is UFS-coupled:

Testing performed if application target is UFS-HAFS:

Hashes used for testing:

uturuncoglu commented 2 years ago

@jedwards4b this is ready for review. I can run the initial CESM-specific tests.

jedwards4b commented 2 years ago

@uturuncoglu you need to merge master into your pr and push that back

uturuncoglu commented 2 years ago

@jedwards4b I am planning to run the tests, but I am not sure which baseline should be compared with alpha09a. There are two under /glade/p/cesmdata/cseg/cmeps_baselines: cesm2_3_alpha07c_cmeps0.13.43 and cesm2_3_alpha07c_cmeps0.13.44. Any idea?

jedwards4b commented 2 years ago

@uturuncoglu can you merge master into your branch and push it back first so that the github workflow passes? Try against cesm2_3_alpha07c_cmeps0.13.44. But I suspect you'll see some differences.

uturuncoglu commented 2 years ago

@jedwards4b okay. I updated the code with the recent workflow fix and now it builds. Thanks for the fix.

uturuncoglu commented 2 years ago

@jedwards4b Here are the results of the CESM tests (failed ones only):

turuncu@cheyenne4/glade/scratch/turuncu $ ./cs.status.20220505_160855_mucy3k | grep FAIL
    FAIL ERP_Vnuopc_Ln9.f09_f09_mg17.F2000climo.cheyenne_intel.cam-outfrq9s NLCOMP
    FAIL ERP_Vnuopc_Ln9.f09_f09_mg17.F2000climo.cheyenne_intel.cam-outfrq9s BASELINE cesm2_3_alpha07c_cmeps0.13.44: DIFF
    FAIL ERR_Vnuopc_Ld5.f09_t061.B1850MOM.cheyenne_intel.allactive-defaultio NLCOMP
    FAIL ERR_Vnuopc_Ld5.f09_t061.B1850MOM.cheyenne_intel.allactive-defaultio BASELINE cesm2_3_alpha07c_cmeps0.13.44: DIFF
  ERS_Ly3_Vnuopc.f09_g17_gl4.T1850Gg.cheyenne_intel (Overall: NLFAIL) details:
    FAIL ERS_Ly3_Vnuopc.f09_g17_gl4.T1850Gg.cheyenne_intel NLCOMP
    FAIL ERS_Vnuopc_Ld5.f09_g17.I1850Clm50Sp.cheyenne_intel.clm-default NLCOMP
    FAIL ERS_Vnuopc_Ld5.f09_g17.I1850Clm50Sp.cheyenne_intel.clm-default BASELINE cesm2_3_alpha07c_cmeps0.13.44: DIFF
    FAIL ERS_Vnuopc_Ld5.f19_g17.B1850.cheyenne_intel.allactive-defaultio NLCOMP
    FAIL ERS_Vnuopc_Ld5.f19_g17.B1850.cheyenne_intel.allactive-defaultio BASELINE cesm2_3_alpha07c_cmeps0.13.44: DIFF
    FAIL ERS_Vnuopc_Ld5.f19_g17.I2000Clm51Bgc.cheyenne_intel.clm-default NLCOMP
    FAIL ERS_Vnuopc_Ld5.f19_g17.I2000Clm51Bgc.cheyenne_intel.clm-default BASELINE cesm2_3_alpha07c_cmeps0.13.44: DIFF
    FAIL ERS_Vnuopc_Ld5.T62_g17.C.cheyenne_intel NLCOMP
    FAIL ERS_Vnuopc_Ld5.T62_g17.C.cheyenne_intel BASELINE cesm2_3_alpha07c_cmeps0.13.44: DIFF
    FAIL ERS_Vnuopc_Ld5.T62_g17.G.cheyenne_intel NLCOMP
    FAIL ERS_Vnuopc_Ld5.T62_g17.G.cheyenne_intel BASELINE cesm2_3_alpha07c_cmeps0.13.44: DIFF
    FAIL ERS_Vnuopc_Ld5.T62_g37.DTEST.cheyenne_intel NLCOMP
    FAIL ERS_Vnuopc_Ld5.T62_g37.DTEST.cheyenne_intel BASELINE cesm2_3_alpha07c_cmeps0.13.44: DIFF
    FAIL ERS_Vnuopc_Ld5.T62_t061.CMOM.cheyenne_intel NLCOMP
    FAIL ERS_Vnuopc_Ld5.T62_t061.CMOM.cheyenne_intel BASELINE cesm2_3_alpha07c_cmeps0.13.44: DIFF
    FAIL ERS_Vnuopc_Ld5.T62_t061.GMOM.cheyenne_intel NLCOMP
    FAIL ERS_Vnuopc_Ld5.T62_t061.GMOM.cheyenne_intel BASELINE cesm2_3_alpha07c_cmeps0.13.44: DIFF
    FAIL ERS_Vnuopc_Lm13.f10_f10_mg37.I1850Clm50SpG.cheyenne_intel NLCOMP
    FAIL ERS_Vnuopc_Lm13.f10_f10_mg37.I1850Clm50SpG.cheyenne_intel BASELINE cesm2_3_alpha07c_cmeps0.13.44: DIFF
    FAIL ERS_Vnuopc_Ln5.ne16_ne16_mg17.QPC4.cheyenne_intel.cam-nuopc_cap NLCOMP
    FAIL ERS_Vnuopc_Ln5.ne16_ne16_mg17.QPC4.cheyenne_intel.cam-nuopc_cap BASELINE cesm2_3_alpha07c_cmeps0.13.44: FIELDLIST field lists differ (otherwise bit-for-bit)
  ERS_Vnuopc_Ln9_C3.f19_g17_rx1.A.cheyenne_intel (Overall: NLFAIL) details:
    FAIL ERS_Vnuopc_Ln9_C3.f19_g17_rx1.A.cheyenne_intel NLCOMP
    FAIL ERS_Vnuopc_Ln9.f19_g17.X.cheyenne_intel NLCOMP
    FAIL ERS_Vnuopc_Ln9.f19_g17.X.cheyenne_intel BASELINE cesm2_3_alpha07c_cmeps0.13.44: DIFF
    FAIL SMS_D_Ld5_Vnuopc.f10_f10_mg37.I2000Clm50BgcCropRtm.cheyenne_intel.rtm-default NLCOMP
    FAIL SMS_D_Ld5_Vnuopc.f10_f10_mg37.I2000Clm50BgcCropRtm.cheyenne_intel.rtm-default BASELINE cesm2_3_alpha07c_cmeps0.13.44: DIFF
  SMS_D_Ly1_Vnuopc.f09_g17_gl4.T1850Gg.cheyenne_intel (Overall: NLFAIL) details:
    FAIL SMS_D_Ly1_Vnuopc.f09_g17_gl4.T1850Gg.cheyenne_intel NLCOMP
    FAIL SMS_Vnuopc.f19_g17.X.cheyenne_intel NLCOMP
    FAIL SMS_Vnuopc.f19_g17.X.cheyenne_intel BASELINE cesm2_3_alpha07c_cmeps0.13.44: DIFF
  SMS_Vnuopc_Ld2.ww3a.ADWAV.cheyenne_intel (Overall: NLFAIL) details:
    FAIL SMS_Vnuopc_Ld2.ww3a.ADWAV.cheyenne_intel NLCOMP
  SMS_Vnuopc_Ld3.f09_f09_mg17.A1850DLND.cheyenne_intel (Overall: NLFAIL) details:
    FAIL SMS_Vnuopc_Ld3.f09_f09_mg17.A1850DLND.cheyenne_intel NLCOMP
    FAIL SMS_Vnuopc_Ld5.T62_t061.CMOM.cheyenne_intel NLCOMP
    FAIL SMS_Vnuopc_Ld5.T62_t061.CMOM.cheyenne_intel BASELINE cesm2_3_alpha07c_cmeps0.13.44: DIFF
    FAIL SMS_Vnuopc_Ld5.T62_t061.GMOM.cheyenne_intel NLCOMP
    FAIL SMS_Vnuopc_Ld5.T62_t061.GMOM.cheyenne_intel BASELINE cesm2_3_alpha07c_cmeps0.13.44: DIFF
  SMS_Vnuopc_Ln11_D.f19_g17_rx1.A.cheyenne_intel (Overall: NLFAIL) details:
    FAIL SMS_Vnuopc_Ln11_D.f19_g17_rx1.A.cheyenne_intel NLCOMP

It mostly complains about namelist changes, which seem to be related to the updated versions of CIME and CMEPS. The baseline was created for cesm2_3_alpha07c, not cesm2_3_alpha09a. If you want, I could also check out cesm2_3_alpha07c and try again.
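To separate tests that only fail the namelist comparison from tests that also differ from the baseline, the cs.status output can be grouped by phase. The following is a minimal sketch and not part of CIME; the function names `summarize_failures` and `namelist_only` are hypothetical, and it assumes each failure line has the shape `FAIL <test name> <phase> [details]` as in the listing above.

```python
from collections import defaultdict


def summarize_failures(status_text):
    """Group FAIL lines from cs.status output by test phase.

    Assumes each failure line looks like: FAIL <test name> <phase> [details]
    """
    by_phase = defaultdict(set)
    for line in status_text.splitlines():
        parts = line.split()
        if len(parts) >= 3 and parts[0] == "FAIL":
            by_phase[parts[2]].add(parts[1])
    return by_phase


def namelist_only(by_phase):
    """Return tests that failed NLCOMP but did not fail BASELINE."""
    return sorted(by_phase.get("NLCOMP", set()) - by_phase.get("BASELINE", set()))


# Three lines copied from the listing above, plus one summary line.
sample = """\
    FAIL ERS_Vnuopc_Ln9.f19_g17.X.cheyenne_intel NLCOMP
    FAIL ERS_Vnuopc_Ln9.f19_g17.X.cheyenne_intel BASELINE cesm2_3_alpha07c_cmeps0.13.44: DIFF
  SMS_Vnuopc_Ld2.ww3a.ADWAV.cheyenne_intel (Overall: NLFAIL) details:
    FAIL SMS_Vnuopc_Ld2.ww3a.ADWAV.cheyenne_intel NLCOMP
"""

phases = summarize_failures(sample)
print(namelist_only(phases))
```

Feeding the full `cs.status ... | grep FAIL` output into `summarize_failures` would give the same split for the whole test suite.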

uturuncoglu commented 2 years ago

@jedwards4b I tried to use cesm2_3_alpha07c for testing with the newer versions of CMEPS and CIME, but I could not because it gives an error like ERROR: Makes no sense to have empty read-only file: /glade/scratch/turuncu/CESM_282_alpha07c/ccs_config/machines/config_machines.xml. I think ccs_config was introduced later. I could copy it from the cesm2_3_alpha09a tag, but I am not sure that is the right approach.

jedwards4b commented 2 years ago

I think that the answer changes are expected and this PR is fine, @mvertens can verify.

uturuncoglu commented 2 years ago

@jedwards4b I am also running scripts_regression_tests.py and I'll update you about that. I'll also be performing longer runs under UFS as extra tests. @climbfuji do you want to review the PR again?

uturuncoglu commented 2 years ago

@jedwards4b scripts_regression_tests.py seems fine. I see the following at the end of the log:

----------------------------------------------------------------------
Ran 286 tests in 6879.794s

OK (skipped=87)
PASS test NLCOMP
PASS test BASELINE Detail comments

SKIP test NLCOMP Test did not make it to setup phase
SKIP test BASELINE Test did not make it to run phase

PASS test BASELINE Detail comments

PASS test NLCOMP

If you want to look at the results in more detail, they are in /glade/scratch/turuncu/scripts_regression_test.20220505_234930

I also want to report an issue with the testreporter script generated under the scripts_regression_test.* folder, such as /glade/scratch/turuncu/scripts_regression_test.20220505_234930. The script points to /glade/scratch/turuncu/CESM_282/cime/testreporter.py, but there is no such file; the file has been moved to /glade/scratch/turuncu/CESM_282/cime/CIME/Tools/testreporter.py. If I fix the path, I get the following error:

Traceback (most recent call last):
  File "/glade/scratch/turuncu/CESM_282/cime/CIME/Tools/testreporter.py", line 255, in <module>
    _main_func()
  File "/glade/scratch/turuncu/CESM_282/cime/CIME/Tools/testreporter.py", line 237, in _main_func
    testxml = get_testreporter_xml(testroot, testid, tagname, testtype)
  File "/glade/scratch/turuncu/CESM_282/cime/CIME/Tools/testreporter.py", line 58, in get_testreporter_xml
    os.chdir(testroot)
TypeError: chdir: path should be string, bytes, os.PathLike or integer, not NoneType

So, this makes it hard to look at the tests and their results.
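The traceback shows `os.chdir` receiving `None`, which suggests that the test root was never set before `get_testreporter_xml` was called. As an illustration only, a guard like the one below would fail earlier with a clearer message. This is a sketch under that assumption; `resolve_testroot` is a hypothetical helper and not part of CIME or testreporter.py.

```python
import os


def resolve_testroot(testroot):
    """Validate a test-root path and return it as an absolute path.

    Hypothetical helper for illustration; it is not part of CIME.
    """
    if testroot is None:
        # os.chdir(None) raises the TypeError seen in the traceback above,
        # so fail earlier with a message that names the missing input.
        raise ValueError("testroot is not set; pass the test root directory explicitly")
    if not os.path.isdir(testroot):
        raise FileNotFoundError(f"testroot does not exist: {testroot}")
    return os.path.abspath(testroot)


if __name__ == "__main__":
    try:
        resolve_testroot(None)
    except ValueError as err:
        print(f"error: {err}")
```

A caller would then pass the validated path to `os.chdir` instead of the raw argument value.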

jedwards4b commented 2 years ago

@uturuncoglu Can you create a new baseline using beta08 and then run tests against that?

uturuncoglu commented 2 years ago

@jedwards4b Sure. Does cesm2_3_beta08 use a newer version of CMEPS that will prevent the namelist changes? Are we still expecting those changes? I think it will only solve the baseline issue. Right?

uturuncoglu commented 2 years ago

@jedwards4b I created a new baseline. It is in /glade/p/cesmdata/cseg/cmeps_baselines/cesm2_3_beta08_cmeps0.13.47. Next, I'll rerun the updated model against it.

uturuncoglu commented 2 years ago

@jedwards4b BTW, please wait before merging this PR. I have a couple of minor fixes for the UFS OpnReqTests. They will not affect CESM.

uturuncoglu commented 2 years ago

@climbfuji the OpnReqTests threading test is failing with the newly added RT. I am not sure about the source of it, but the answers change when threading is activated. I'll try to narrow down the issue, but I wonder whether I need to be careful about the CCPP host under CMEPS with threading. As far as I know, CCPP/physics supports threading. Let me know what you think.

uturuncoglu commented 2 years ago

@climbfuji I also have an issue with the restart test, but this is more complicated than threading and might require splitting the FV3 physics so that the CMEPS aoflux phase can be called between the two parts, in the same execution order as FV3/CCPP sfc_ocean. Anyway, let me know if you have any suggestions about it.

uturuncoglu commented 2 years ago

@jedwards4b I am getting an error if I try to test cesm2_3_beta08 with the updated CMEPS and CIME. Maybe it is not possible to test beta08 under this configuration.

/glade/scratch/turuncu/CESM_282/components/cmeps/cesm/nuopc_cap_share/nuopc_shr_methods.F90(135): error #6580: Name in only-list does not exist or is not accessible.   [SHR_PIO_LOG_COMP_SETTINGS]
    use shr_pio_mod, only : shr_pio_log_comp_settings
----------------------------^
compilation aborted for /glade/scratch/turuncu/CESM_282/components/cmeps/cesm/nuopc_cap_share/nuopc_shr_methods.F90 (code 1)

jedwards4b commented 2 years ago

Yes I'm sorry to have wasted your time. I will discuss with @mvertens at 4 today.

uturuncoglu commented 2 years ago

@jedwards4b No worries. In any case, I am still debugging something with threading and it could take a little time. So, I am in no rush with this PR at this point.

uturuncoglu commented 2 years ago

BTW, at least we now have another CESM baseline for testing.

uturuncoglu commented 2 years ago

@jedwards4b please do not merge this until you get confirmation from me. I am still working on fixing the restart and threading ORT tests. The last commit https://github.com/ESCOMP/CMEPS/pull/282/commits/dfdb479c9b9eec693a5b050d0866ab064d1de152 seems to have fixed the restart issue, but I have only tested it manually. So, I'll update you about it.

uturuncoglu commented 2 years ago

@climbfuji @grantfirl As we discussed in today's exchange grid call, I would like to update you on the current progress of the failing ORT tests (restart and threading). I fixed the restart issue, and at this point the only remaining issue is with I/O for the xgrid case, which we are looking into with @jedwards4b. The agrid case can be restarted without any issue.

But I need your guidance on the threading issue. As you know, I tried to set the thread number and block size explicitly to 1 with https://github.com/ESCOMP/CMEPS/pull/282/commits/d307cd55388cffdf050e72389e634364ba262661, but this does not seem to work. As you already know, cdata is a 2D array in the FV3 implementation but a scalar here, so that is one difference at this point. Anyway, let me know if you have any suggestions.

uturuncoglu commented 2 years ago

@jedwards4b this PR is ready but needs to be coordinated with the top-level UFS PR. In the meantime, if you want me to run any tests, just let me know. Most of the work I did was in the UFS part, and I don't think it will affect CESM.

uturuncoglu commented 2 years ago

@jedwards4b I am planning to merge this if you have no additional review comments or testing requests. Then, I will create another PR on the NOAA EMC side to update CMEPS over there.