Closed: uturuncoglu closed this 2 years ago
@jedwards4b this is ready for review. I can run initial CESM-specific tests.
@uturuncoglu you need to merge master into your PR and push that back.
@jedwards4b I am planning to run the tests, but I am not sure which baseline alpha09a needs to be compared against. There are two under /glade/p/cesmdata/cseg/cmeps_baselines: cesm2_3_alpha07c_cmeps0.13.43 and cesm2_3_alpha07c_cmeps0.13.44. Any idea?
@uturuncoglu can you merge master into your branch and push it back first so that the GitHub workflow passes? Try against cesm2_3_alpha07c_cmeps0.13.44, but I suspect you'll see some differences.
@jedwards4b okay. I updated the code with the recent workflow fix and now it builds. Thanks for the fix.
@jedwards4b Here are the results of the CESM tests (failed ones only):
turuncu@cheyenne4/glade/scratch/turuncu $ ./cs.status.20220505_160855_mucy3k | grep FAIL
FAIL ERP_Vnuopc_Ln9.f09_f09_mg17.F2000climo.cheyenne_intel.cam-outfrq9s NLCOMP
FAIL ERP_Vnuopc_Ln9.f09_f09_mg17.F2000climo.cheyenne_intel.cam-outfrq9s BASELINE cesm2_3_alpha07c_cmeps0.13.44: DIFF
FAIL ERR_Vnuopc_Ld5.f09_t061.B1850MOM.cheyenne_intel.allactive-defaultio NLCOMP
FAIL ERR_Vnuopc_Ld5.f09_t061.B1850MOM.cheyenne_intel.allactive-defaultio BASELINE cesm2_3_alpha07c_cmeps0.13.44: DIFF
ERS_Ly3_Vnuopc.f09_g17_gl4.T1850Gg.cheyenne_intel (Overall: NLFAIL) details:
FAIL ERS_Ly3_Vnuopc.f09_g17_gl4.T1850Gg.cheyenne_intel NLCOMP
FAIL ERS_Vnuopc_Ld5.f09_g17.I1850Clm50Sp.cheyenne_intel.clm-default NLCOMP
FAIL ERS_Vnuopc_Ld5.f09_g17.I1850Clm50Sp.cheyenne_intel.clm-default BASELINE cesm2_3_alpha07c_cmeps0.13.44: DIFF
FAIL ERS_Vnuopc_Ld5.f19_g17.B1850.cheyenne_intel.allactive-defaultio NLCOMP
FAIL ERS_Vnuopc_Ld5.f19_g17.B1850.cheyenne_intel.allactive-defaultio BASELINE cesm2_3_alpha07c_cmeps0.13.44: DIFF
FAIL ERS_Vnuopc_Ld5.f19_g17.I2000Clm51Bgc.cheyenne_intel.clm-default NLCOMP
FAIL ERS_Vnuopc_Ld5.f19_g17.I2000Clm51Bgc.cheyenne_intel.clm-default BASELINE cesm2_3_alpha07c_cmeps0.13.44: DIFF
FAIL ERS_Vnuopc_Ld5.T62_g17.C.cheyenne_intel NLCOMP
FAIL ERS_Vnuopc_Ld5.T62_g17.C.cheyenne_intel BASELINE cesm2_3_alpha07c_cmeps0.13.44: DIFF
FAIL ERS_Vnuopc_Ld5.T62_g17.G.cheyenne_intel NLCOMP
FAIL ERS_Vnuopc_Ld5.T62_g17.G.cheyenne_intel BASELINE cesm2_3_alpha07c_cmeps0.13.44: DIFF
FAIL ERS_Vnuopc_Ld5.T62_g37.DTEST.cheyenne_intel NLCOMP
FAIL ERS_Vnuopc_Ld5.T62_g37.DTEST.cheyenne_intel BASELINE cesm2_3_alpha07c_cmeps0.13.44: DIFF
FAIL ERS_Vnuopc_Ld5.T62_t061.CMOM.cheyenne_intel NLCOMP
FAIL ERS_Vnuopc_Ld5.T62_t061.CMOM.cheyenne_intel BASELINE cesm2_3_alpha07c_cmeps0.13.44: DIFF
FAIL ERS_Vnuopc_Ld5.T62_t061.GMOM.cheyenne_intel NLCOMP
FAIL ERS_Vnuopc_Ld5.T62_t061.GMOM.cheyenne_intel BASELINE cesm2_3_alpha07c_cmeps0.13.44: DIFF
FAIL ERS_Vnuopc_Lm13.f10_f10_mg37.I1850Clm50SpG.cheyenne_intel NLCOMP
FAIL ERS_Vnuopc_Lm13.f10_f10_mg37.I1850Clm50SpG.cheyenne_intel BASELINE cesm2_3_alpha07c_cmeps0.13.44: DIFF
FAIL ERS_Vnuopc_Ln5.ne16_ne16_mg17.QPC4.cheyenne_intel.cam-nuopc_cap NLCOMP
FAIL ERS_Vnuopc_Ln5.ne16_ne16_mg17.QPC4.cheyenne_intel.cam-nuopc_cap BASELINE cesm2_3_alpha07c_cmeps0.13.44: FIELDLIST field lists differ (otherwise bit-for-bit)
ERS_Vnuopc_Ln9_C3.f19_g17_rx1.A.cheyenne_intel (Overall: NLFAIL) details:
FAIL ERS_Vnuopc_Ln9_C3.f19_g17_rx1.A.cheyenne_intel NLCOMP
FAIL ERS_Vnuopc_Ln9.f19_g17.X.cheyenne_intel NLCOMP
FAIL ERS_Vnuopc_Ln9.f19_g17.X.cheyenne_intel BASELINE cesm2_3_alpha07c_cmeps0.13.44: DIFF
FAIL SMS_D_Ld5_Vnuopc.f10_f10_mg37.I2000Clm50BgcCropRtm.cheyenne_intel.rtm-default NLCOMP
FAIL SMS_D_Ld5_Vnuopc.f10_f10_mg37.I2000Clm50BgcCropRtm.cheyenne_intel.rtm-default BASELINE cesm2_3_alpha07c_cmeps0.13.44: DIFF
SMS_D_Ly1_Vnuopc.f09_g17_gl4.T1850Gg.cheyenne_intel (Overall: NLFAIL) details:
FAIL SMS_D_Ly1_Vnuopc.f09_g17_gl4.T1850Gg.cheyenne_intel NLCOMP
FAIL SMS_Vnuopc.f19_g17.X.cheyenne_intel NLCOMP
FAIL SMS_Vnuopc.f19_g17.X.cheyenne_intel BASELINE cesm2_3_alpha07c_cmeps0.13.44: DIFF
SMS_Vnuopc_Ld2.ww3a.ADWAV.cheyenne_intel (Overall: NLFAIL) details:
FAIL SMS_Vnuopc_Ld2.ww3a.ADWAV.cheyenne_intel NLCOMP
SMS_Vnuopc_Ld3.f09_f09_mg17.A1850DLND.cheyenne_intel (Overall: NLFAIL) details:
FAIL SMS_Vnuopc_Ld3.f09_f09_mg17.A1850DLND.cheyenne_intel NLCOMP
FAIL SMS_Vnuopc_Ld5.T62_t061.CMOM.cheyenne_intel NLCOMP
FAIL SMS_Vnuopc_Ld5.T62_t061.CMOM.cheyenne_intel BASELINE cesm2_3_alpha07c_cmeps0.13.44: DIFF
FAIL SMS_Vnuopc_Ld5.T62_t061.GMOM.cheyenne_intel NLCOMP
FAIL SMS_Vnuopc_Ld5.T62_t061.GMOM.cheyenne_intel BASELINE cesm2_3_alpha07c_cmeps0.13.44: DIFF
SMS_Vnuopc_Ln11_D.f19_g17_rx1.A.cheyenne_intel (Overall: NLFAIL) details:
FAIL SMS_Vnuopc_Ln11_D.f19_g17_rx1.A.cheyenne_intel NLCOMP
Most of the failures are namelist changes, which seem related to the updated versions of CIME and CMEPS. The baseline was created for cesm2_3_alpha07c, not cesm2_3_alpha09a. If you want, I could also check out cesm2_3_alpha07c and try again.
@jedwards4b I tried to test cesm2_3_alpha07c with the newer versions of CMEPS and CIME, but I could not, because it fails with an error like ERROR: Makes no sense to have empty read-only file: /glade/scratch/turuncu/CESM_282_alpha07c/ccs_config/machines/config_machines.xml. I think ccs_config was introduced later. I could copy it from the cesm2_3_alpha09a tag, but I am not sure that is the right way to go.
I think that the answer changes are expected and this PR is fine; @mvertens can verify.
@jedwards4b I am also running scripts_regression_tests.py and I'll update you about that. I'll also perform longer runs under UFS for extra tests. @climbfuji do you want to review the PR again?
@jedwards4b The scripts_regression_tests.py run seems fine. I see the following at the end of the log:
----------------------------------------------------------------------
Ran 286 tests in 6879.794s
OK (skipped=87)
PASS test NLCOMP
PASS test BASELINE Detail comments
SKIP test NLCOMP Test did not make it to setup phase
SKIP test BASELINE Test did not make it to run phase
PASS test BASELINE Detail comments
PASS test NLCOMP
If you want to look in more detail, the results are in /glade/scratch/turuncu/scripts_regression_test.20220505_234930.
I also want to report an issue with the testreporter script generated under the scripts_regression_test.* folder, such as /glade/scratch/turuncu/scripts_regression_test.20220505_234930. The script points to /glade/scratch/turuncu/CESM_282/cime/testreporter.py, but there is no such file; it has in fact been moved to /glade/scratch/turuncu/CESM_282/cime/CIME/Tools/testreporter.py. If I fix the path, I then get the following error:
Traceback (most recent call last):
File "/glade/scratch/turuncu/CESM_282/cime/CIME/Tools/testreporter.py", line 255, in <module>
_main_func()
File "/glade/scratch/turuncu/CESM_282/cime/CIME/Tools/testreporter.py", line 237, in _main_func
testxml = get_testreporter_xml(testroot, testid, tagname, testtype)
File "/glade/scratch/turuncu/CESM_282/cime/CIME/Tools/testreporter.py", line 58, in get_testreporter_xml
os.chdir(testroot)
TypeError: chdir: path should be string, bytes, os.PathLike or integer, not NoneType
So, this makes it hard to look at the tests and their results.
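The traceback suggests `testroot` arrives as `None` (most likely an option that was neither passed nor successfully inferred), so `os.chdir` fails with an unhelpful `TypeError`. A minimal sketch of a guard that would turn this into an actionable message — the function name comes from the traceback, but the argument handling and message here are purely illustrative, not the actual CIME code:

```python
import os


def get_testreporter_xml(testroot, testid, tagname, testtype):
    # Guard: os.chdir(None) raises "TypeError: chdir: path should be
    # string, bytes, os.PathLike or integer, not NoneType", which hides
    # the real problem (a missing test root). Fail early and clearly.
    if testroot is None:
        raise SystemExit(
            "ERROR: test root is not set; pass it explicitly on the "
            "command line instead of relying on inference."
        )
    os.chdir(testroot)
    # ... collect the test results and build the report XML here ...
```

With this guard, a user who forgets the test-root option gets one clear error line instead of a traceback ending deep inside `os.chdir`.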
@uturuncoglu Can you create a new baseline using beta08 and then run tests against that?
@jedwards4b Sure. Does cesm2_3_beta08 use a newer version of CMEPS that will prevent the namelist changes? Are we still expecting those changes? I think it will only solve the baseline issue, right?
@jedwards4b I created a new baseline. It is in /glade/p/cesmdata/cseg/cmeps_baselines/cesm2_3_beta08_cmeps0.13.47. Next, I'll rerun the updated model against it.
@jedwards4b BTW, please wait before merging this PR. I have a couple of minor fixes for the UFS OpnReqTests. They will not affect CESM.
@climbfuji the OpnReqTests threading test is failing with the newly added RT. I am not sure about its source, but answers change when threading is activated. I'll try to narrow down the issue, but I wonder whether I need to be careful with the CCPP host under CMEPS when threading is enabled. As far as I know, CCPP/physics supports threading. Let me know what you think.
@climbfuji I also have an issue with the restart test, but this is more complicated than the threading one and might require splitting the FV3 physics so that the CMEPS aoflux phase can be called between them, keeping the aoflux phase in the same execution order as FV3/CCPP sfc_ocean. Anyway, let me know if you have a suggestion about it.
@jedwards4b I am getting an error when I try to test cesm2_3_beta08 with the updated CMEPS and CIME. Maybe it is not possible to test beta08 in this configuration:
/glade/scratch/turuncu/CESM_282/components/cmeps/cesm/nuopc_cap_share/nuopc_shr_methods.F90(135): error #6580: Name in only-list does not exist or is not accessible. [SHR_PIO_LOG_COMP_SETTINGS]
use shr_pio_mod, only : shr_pio_log_comp_settings
----------------------------^
compilation aborted for /glade/scratch/turuncu/CESM_282/components/cmeps/cesm/nuopc_cap_share/nuopc_shr_methods.F90 (code 1)
Yes, I'm sorry to have wasted your time. I will discuss with @mvertens at 4 today.
@jedwards4b No worries. In any case, I am still debugging something with threading and it could take a little time, so there is no rush for this PR at this point.
BTW, at least we now have another CESM baseline for testing.
@jedwards4b please do not merge this until you get confirmation from me. I am still working on fixing the restart and threading ORT tests. The last commit https://github.com/ESCOMP/CMEPS/pull/282/commits/dfdb479c9b9eec693a5b050d0866ab064d1de152 seems to have fixed the restart issue, but I have only tested it manually. I'll update you about it.
@climbfuji @grantfirl As we discussed in today's exchange grid call, I would like to update you on the current progress of the failing ORT tests (restart and threading). I fixed the restart issue; at this point the only remaining problem is I/O for the xgrid case, which @jedwards4b and I are looking into. The agrid case restarts without any issue.
But I need your guidance on the threading issue. As you know, I tried to set the thread number and block size explicitly to 1 with https://github.com/ESCOMP/CMEPS/pull/282/commits/d307cd55388cffdf050e72389e634364ba262661, but that does not seem to work. As you already know, cdata is a 2D array in the FV3 implementation but a scalar here, so that is one difference between the two. Anyway, let me know if you have any suggestions.
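To make the cdata difference concrete, here is a schematic sketch (Python, purely illustrative: `CCPPState`, `nblocks`, and `nthreads` are hypothetical stand-ins for the Fortran `ccpp_t` structures, not actual CCPP API). FV3 keeps one CCPP data instance per (block, thread), so each OpenMP thread updates only its own state; a host that holds a single scalar instance shares it across all threads, which is a plausible place for threading-dependent answers to creep in:

```python
from dataclasses import dataclass


@dataclass
class CCPPState:
    # Hypothetical stand-in for one Fortran ccpp_t instance.
    errmsg: str = ""
    errflg: int = 0


# FV3-style: a 2-D array of CCPP instances indexed by (block, thread),
# so each OpenMP thread works on a private state object.
nblocks, nthreads = 4, 2
cdata_fv3 = [[CCPPState() for _ in range(nthreads)] for _ in range(nblocks)]

# CMEPS-host style: a single scalar instance shared by every thread;
# concurrent updates from different threads can interleave.
cdata_cmeps = CCPPState()
```

Forcing the thread count to 1 hides the sharing only if every code path actually honors that setting, which may be why the explicit thread-number/block-size fix did not change the threaded answers.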
@jedwards4b this PR is ready but needs to be coordinated with the top-level UFS PR. In the meantime, if you want me to run any test, just let me know. Most of my work was on the UFS side, and I don't think it will affect CESM.
@jedwards4b I am planning to merge this if you have no additional review comments or testing requests. Then I will create another PR on the NOAA-EMC side to update CMEPS over there.
Description of changes
This PR aims to bring the exchange grid capability to the UFS weather model. For this purpose, CMEPS is extended to act as a CCPP host model that calculates atmosphere-ocean fluxes by running CCPP suite files. This feature is currently only available for the UFS model.
Specific notes
Contributors other than yourself, if any: None
CMEPS Issues Fixed (include github issue #):
Are changes expected to change answers? (specify if bfb, different at roundoff, more substantial) No
Any User Interface Changes (namelist or namelist defaults changes)?
- coupling_mode = nems_frac_aoflux: passes the mediator-calculated atmosphere-ocean fluxes to FV3.
- coupling_mode = nems_frac_aoflux_sbs: a mode for side-by-side comparison of the fluxes calculated by FV3/CCPP and by CMEPS/CCPP. In this mode the fluxes are calculated in the mediator but not sent to FV3, so the fully coupled model has no answer changes; the mediator history files, however, also include the mediator-calculated fluxes.
- aoflux_code = 'ccpp' (under the MED_attributes:: group): selects the desired atmosphere-ocean flux scheme. The available options are cesm (default) and ccpp. The ccpp option is only available for UFS, since it requires the FV3 sub-directories for CCPP/physics and CCPP/framework.
- aoflux_ccpp_suite = 'FV3_sfc_ocean': the name of the CCPP suite file that will be used to calculate the atmosphere-ocean fluxes.
- Entries in nems.configure: the default values work as-is, but changes might be required in the near future when an external land component is brought in. One attribute indicates whether the run is a coldstart (false) or a warmstart/restart (true); its default value is true.
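For illustration, a nems.configure fragment enabling the CCPP-based flux computation might look like the following (schematic: the attribute names come from the list above, but the exact group layout and placement follow the UFS runtime configuration and should be checked against a working run directory):

```
MED_attributes::
  coupling_mode = nems_frac_aoflux
  aoflux_code = ccpp
  aoflux_ccpp_suite = FV3_sfc_ocean
::
```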
Testing performed
Testing performed if application target is CESM:
Testing performed if application target is UFS-coupled:
Testing performed if application target is UFS-HAFS:
Hashes used for testing: