Closed: DeniseWorthen closed this pull request 2 years ago.
@DeniseWorthen I need to update my exchange grid fork based on these changes and test it again. Are you planning to update CMEPS in UFS after merging this PR?
@uturuncoglu I ran the UFS HAFS regression tests and they all passed. Also, I will make a PR back to UWM with these changes. That will also bring in any other updates that have been made since I last updated the EMC fork (Feb 3).
@DeniseWorthen That is great. Do I need to create the PR against the authoritative repo or the NOAA-EMC fork? I was not sure. Of course, it would be a draft at this point.
@uturuncoglu I'm not 100% sure which parts of CMEPS your xgrid work touches. Do you need changes in FldsExchange_nems? I'm happy to work w/ you on getting those put in now (even if not functional) if that saves effort.
Yes, there are some mods in FldsExchange_nems, since I introduced two new coupling modes. I also have some changes in the flux computation part under the ufs/ directory.
@uturuncoglu @denise - this must be tested with cesm as well. We need to fire off the pre-alpha tests using cesm2_3_beta08 as a baseline. @uturuncoglu - are you willing to do this? If not, I can take this on.
@mvertens Sure, I can run it and let you know.
Thank you! You will need to merge the latest cmeps master into this PR to have this working - but that should be part of the testing. Does that make sense?
@mvertens I checked out CMEPS master and merged it with @DeniseWorthen's branch, so it should be fine at this point.
@uturuncoglu - that sounds great. Thank you.
@mvertens It will take longer than I thought. I have an issue with my disk quota, since I am keeping all of the 35-day-long runs for the exchange grid work. I'll try to resolve that first and then start the tests again.
@uturuncoglu - no problem. Thank you so much for doing this!!! Let me know if you want me to take this on if it gets too complicated on your end.
@uturuncoglu There is nothing time-critical in this PR on my side; it is just work I started as part of the wave coupling, and I thought I'd take the time to get it committed. If it is easier to proceed on the X-grid changes w/o the changes in this PR, that is fine too. We can always circle back to it.
@DeniseWorthen Thanks. No, that is fine with me. I think this should go first, and then I can make the required modifications for the exchange grid. I'll update you all once I have run the CESM pre-alpha tests.
@mvertens @DeniseWorthen I ran all the tests; here is the list of failed tests and their error logs (to be sure, I reran them separately with create_test after running the full test suite and cleaning my scratch space, since some of them were failing due to my disk quota). At this point, I don't think the errors are caused by the changes in this PR, so it seems safe to merge, but let me know what you think.
DAE_N2_D_Lh12_Vnuopc.f10_f10_mg37.I2000Clm50BgcCrop.cheyenne_intel.clm-DA_multidrv
2022-04-05 23:35:23: Test 'DAE_N2_D_Lh12_Vnuopc.f10_f10_mg37.I2000Clm50BgcCrop.cheyenne_intel.clm-DA_multidrv' failed in phase 'CREATE_NEWCASE' with exception 'ERROR: _N option not supported by nuopc driver, use _C instead'
File "/glade/scratch/turuncu/CESM_pr_279/cime/scripts/Tools/../../scripts/lib/CIME/test_scheduler.py", line 1080, in _run_catch_exceptions
return run(test)
File "/glade/scratch/turuncu/CESM_pr_279/cime/scripts/Tools/../../scripts/lib/CIME/test_scheduler.py", line 669, in _create_newcase_phase
expect(False, "_N option not supported by nuopc driver, use _C instead")
File "/glade/scratch/turuncu/CESM_pr_279/cime/scripts/Tools/../../scripts/lib/CIME/utils.py", line 163, in expect
raise exc_type(msg)
---------------------------------------------------
ERP_D_Ln9_Vnuopc.C48_C48_mg17.QPC6.cheyenne_intel.cam-outfrq9s
Building test for ERP in directory /glade/scratch/turuncu/ERP_D_Ln9_Vnuopc.C48_C48_mg17.QPC6.cheyenne_intel.cam-outfrq9s.20220405_233500_pe3gnl
/glade/scratch/turuncu/CESM_pr_279/components/cam/src/dynamics/fv3/atmos_cubed_sphere/tools/fv_mp_mod.F90(75): error #6580: Name in only-list does not exist or is not accessible. [MPP_NODE]
ERROR: BUILD FAIL: cam.buildlib failed, cat /glade/scratch/turuncu/ERP_D_Ln9_Vnuopc.C48_C48_mg17.QPC6.cheyenne_intel.cam-outfrq9s.20220405_233500_pe3gnl/bld/atm.bldlog.220406-004859
The details build log is in /glade/scratch/turuncu/ERP_D_Ln9_Vnuopc.C48_C48_mg17.QPC6.cheyenne_intel.cam-outfrq9s.20220406_153639_s2y85c/bld/atm.bldlog.220406-153959
ERP_D_Ln9_Vnuopc.f09_f09_mg17.FSD.cheyenne_intel.cam-outfrq9s_contrail
81:MPT ERROR: Rank 81(g:81) received signal SIGFPE(8).
81: Process ID: 58605, Host: r6i6n12, Program: /glade/scratch/turuncu/ERP_D_Ln9_Vnuopc.f09_f09_mg17.FSD.cheyenne_intel.cam-outfrq9s_contrail.20220406_154930_sfdkz3/bld/cesm.exe
81: MPT Version: HPE MPT 2.22 03/31/20 15:59:10
81:
81:MPT: --------stack traceback-------
81:OMP: Warning #190: Forking a process while a parallel region is active is potentially unsafe.
46:MPT ERROR: Rank 46(g:46) received signal SIGFPE(8).
46: Process ID: 21616, Host: r13i2n20, Program: /glade/scratch/turuncu/ERP_D_Ln9_Vnuopc.f09_f09_mg17.FSD.cheyenne_intel.cam-outfrq9s_contrail.20220406_154930_sfdkz3/bld/cesm.exe
46: MPT Version: HPE MPT 2.22 03/31/20 15:59:10
I also ran this test with the ESMF PET logs enabled, but there are no errors in them, so this requires further investigation.
SMS_D_Ln9_Vnuopc.ne0CONUSne30x8_ne0CONUSne30x8_mt12.FCnudged.cheyenne_intel.cam-outfrq9s_refined_camchem
1908:MPT: #1 0x00002b37033d5306 in mpi_sgi_system (
1908:MPT: #2 MPI_SGI_stacktraceback (
1908:MPT: header=header@entry=0x7fff74fe8c50 "MPT ERROR: Rank 1908(g:1908) received signal SIGFPE(8).\n\tProcess ID: 7399, Host: r7i4n30, Program: /glade/scratch/turuncu/SMS_D_Ln9_Vnuopc.ne0CONUSne30x8_ne0CONUSne30x8_mt12.FCnudged.cheyenne_intel.ca"...) at sig.c:340
1908:MPT: #3 0x00002b37033d54ff in first_arriver_handler (signo=signo@entry=8,
1908:MPT: stack_trace_sem=stack_trace_sem@entry=0x2b3712d00080) at sig.c:489
1899:MPT: #4 0x00002ac3b632d793 in slave_sig_handler (signo=8, siginfo=<optimized out>,
1899:MPT: extra=<optimized out>) at sig.c:565
1899:MPT: #5 <signal handler called>
1899:MPT: #6 0x00000000011d1e78 in physconst::get_hydrostatic_energy (i0=1, i1=16,
1899:MPT: j0=1, j1=1, nlev=32, ntrac=200,
1899:MPT: tracer=<error reading variable: value requires 819200 bytes, which is more than max-value-size>, pdel=..., cp_or_cv=..., u=..., v=..., t=..., vcoord=0,
1899:MPT: ps=..., phis=..., z=...,
1899:MPT: dycore_idx=<error reading variable: Cannot access memory at address 0x0>,
1899:MPT: te=..., se=<error reading variable: Cannot access memory at address 0x0>,
1899:MPT: ke=<error reading variable: Cannot access memory at address 0x0>,
1899:MPT: wv=<error reading variable: Cannot access memory at address 0x0>, h2o=...,
1899:MPT: liq=<error reading variable: Cannot access memory at address 0x0>, ice=...)
1899:MPT: at /glade/scratch/turuncu/CESM_pr_279/components/cam/src/utils/physconst.F90:1244
1899:MPT: #7 0x0000000002c385d4 in check_energy::check_energy_timestep_init (state=...,
1899:MPT: tend=..., pbuf=0x2ae3b7a49f80,
1899:MPT: col_type=<error reading variable: Cannot access memory at address 0x0>)
1899:MPT: at /glade/scratch/turuncu/CESM_pr_279/components/cam/src/physics/cam/check_energy.F90:254
1899:MPT: #8 0x0000000003242f04 in dp_coupling::derived_phys_dry (phys_state=...,
1899:MPT: phys_tend=..., pbuf2d=0x2ae3b7a49f80)
1899:MPT: at /glade/scratch/turuncu/CESM_pr_279/components/cam/src/dynamics/se/dp_coupling.F90:700
1899:MPT: #9 0x00000000031f77b2 in dp_coupling::d_p_coupling (phys_state=...,
1899:MPT: phys_tend=..., pbuf2d=0x2ae3b7a49f80, dyn_out=...)
1899:MPT: at /glade/scratch/turuncu/CESM_pr_279/components/cam/src/dynamics/se/dp_coupling.F90:289
1899:MPT: #10 0x0000000002483a52 in stepon::stepon_run1 (dtime_out=225, phys_state=...,
1899:MPT: phys_tend=..., pbuf2d=0x2ae3b7a49f80, dyn_in=..., dyn_out=...)
1899:MPT: at /glade/scratch/turuncu/CESM_pr_279/components/cam/src/dynamics/se/stepon.F90:110
1899:MPT: #11 0x0000000000a209eb in cam_comp::cam_run1 (
1899:MPT: cam_in=<error reading variable: value requires 147400 bytes, which is more than max-value-size>,
1899:MPT: cam_out=<error reading variable: value requires 151800 bytes, which is more than max-value-size>)
1899:MPT: at /glade/scratch/turuncu/CESM_pr_279/components/cam/src/control/cam_comp.F90:243
1899:MPT: #12 0x00000000009d38fc in atm_comp_nuopc::datainitialize (gcomp=..., rc=0)
1899:MPT: at /glade/scratch/turuncu/CESM_pr_279/components/cam/src/cpl/nuopc/atm_comp_nuopc.F90:873
1899:MPT: #13 0x00002ac3b00a9432 in ESMCI::MethodElement::execute(void*, int*) const ()
1899:MPT: at /glade/p/cesmdata/cseg/PROGS/build/28560/esmf-8.2.0b23/src/Superstructure/Component/src/ESMCI_MethodTable.C:377
1899:MPT: #14 0x00002ac3b00aa896 in ESMCI::MethodTable::execute (this=0x17541d20,
1899:MPT: labelArg=..., object=0x1753f020, userRc=0x7ffee57be498,
1899:MPT: existflag=0x7ffee57be222)
1899:MPT: at /glade/p/cesmdata/cseg/PROGS/build/28560/esmf-8.2.0b23/src/Superstructure/Component/src/ESMCI_MethodTable.C:563
The full log can be seen in /glade/scratch/turuncu/SMS_D_Ln9_Vnuopc.ne0CONUSne30x8_ne0CONUSne30x8_mt12.FCnudged.cheyenne_intel.cam-outfrq9s_refined_camchem.20220406_162722_xutg84/run/cesm.log.3676318.chadmin1.ib0.cheyenne.ucar.edu.220406-192524
@mvertens Let me know if you want me to run more tests. How do you want to proceed with this PR?
@fischer-ncar @jedwards4b - are these expected fails for beta08? I think it's fine to proceed with accepting and merging these PRs - but I wanted to verify this first.
For cesm2_3_beta08, these two tests passed:
DAE_N2_D_Lh12_Vnuopc.f10_f10_mg37.I2000Clm50BgcCrop.cheyenne_intel.clm-DA_multidrv
SMS_D_Ln9_Vnuopc.ne0CONUSne30x8_ne0CONUSne30x8_mt12.FCnudged.cheyenne_intel.cam-outfrq9s_refined_camchem
These two tests failed:
ERP_D_Ln9_Vnuopc.C48_C48_mg17.QPC6.cheyenne_intel.cam-outfrq9s
ERP_D_Ln9_Vnuopc.f09_f09_mg17.FSD.cheyenne_intel.cam-outfrq9s_contrail
@mvertens Please don't merge. Adding comp_present conditionals appears to resolve the issue w/ the ATM-WAV configuration, so I may want to make further changes to this PR branch.
@mvertens This is ready for any final testing on your end. The additional checks for the presence of components allow me to run ATM-WAV only coupling for UWM. I ran all tests for UWM and all baselines passed.
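For illustration only, here is a minimal, standalone Fortran sketch of the kind of component-presence guard described above. The names (comp_present, advertise_fld, compatm, compwav) and the field strings are hypothetical placeholders, not the actual CMEPS code.

program presence_guard_sketch
  ! Illustrative sketch only; names and field strings are hypothetical.
  implicit none
  integer, parameter :: compatm = 1, compocn = 2, compwav = 3, ncomps = 3
  logical :: comp_present(ncomps)

  ! In an ATM-WAV only configuration the ocean is absent.
  comp_present = [.true., .false., .true.]

  ! Advertise a field only when both its source and destination components
  ! are present, so absent components never enter the exchange field list.
  if (comp_present(compatm) .and. comp_present(compwav)) then
     call advertise_fld('Sa_u10m', src=compatm, dst=compwav)    ! advertised
  end if
  if (comp_present(compatm) .and. comp_present(compocn)) then
     call advertise_fld('Faxa_swnet', src=compatm, dst=compocn) ! skipped here
  end if

contains

  subroutine advertise_fld(stdname, src, dst)
    character(len=*), intent(in) :: stdname
    integer,          intent(in) :: src, dst
    write(*,'(a,i0,a,i0)') 'advertising '//trim(stdname)//' from ', src, ' to ', dst
  end subroutine advertise_fld

end program presence_guard_sketch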
@uturuncoglu - are you comfortable with my merging this PR?
@mvertens It looks fine to me, since those errors were not related to this PR, but they are not expected and will need to be investigated in the near future.
@uturuncoglu - thank you. Actually, those failures are not errors but newly requested output from the mediator to the wav component. I ran these differences by @alperaltuntas today, and we are both comfortable with these new export answers.
Description of changes
Refactors esmFldsExchange_nems.F90 to use separate advertise and initialize phases and to check that a component is present before advertising a field to or from that component. Implements default src and dst mask values in place of the code currently in med_map_mod.F90.
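A minimal, standalone Fortran sketch of the two-phase structure described above (advertise first, attach mapping details later). The subroutine names, field name, map name, and mask arguments are simplified, hypothetical stand-ins rather than the actual esmFldsExchange_nems.F90 code.

program two_phase_sketch
  ! Illustrative sketch only; names are hypothetical.
  implicit none

  ! Drive both phases for a configuration in which the wave model is present.
  call flds_exchange('advertise',  wav_present=.true.)
  call flds_exchange('initialize', wav_present=.true.)

contains

  subroutine flds_exchange(phase, wav_present)
    character(len=*), intent(in) :: phase        ! 'advertise' or 'initialize'
    logical,          intent(in) :: wav_present

    if (trim(phase) == 'advertise') then
       ! Advertise phase: only declare which fields may be exchanged, and
       ! only for components that are actually present.
       if (wav_present) call advertise('Sw_z0')
    else
       ! Initialize phase: attach mapping information to fields that were
       ! advertised and connected, supplying default src/dst mask values.
       if (wav_present) call set_mapping('Sw_z0', mapname='mapbilnr', &
                                         srcmask=0, dstmask=0)
    end if
  end subroutine flds_exchange

  subroutine advertise(stdname)
    character(len=*), intent(in) :: stdname
    write(*,'(a)') 'advertise: '//trim(stdname)
  end subroutine advertise

  subroutine set_mapping(stdname, mapname, srcmask, dstmask)
    character(len=*), intent(in) :: stdname, mapname
    integer,          intent(in) :: srcmask, dstmask
    write(*,'(a,2(1x,i0))') 'map '//trim(stdname)//' via '//trim(mapname)//' masks:', srcmask, dstmask
  end subroutine set_mapping

end program two_phase_sketch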
Specific notes
Are changes expected to change answers? (specify if bfb, different at roundoff, more substantial)
No
Any User Interface Changes (namelist or namelist defaults changes)?
No
Testing performed
Testing performed if application target is CESM:
Testing performed if application target is UFS-coupled:
Testing performed if application target is UFS-HAFS:
Hashes used for testing: