CESM-Development / cime

Common Infrastructure for Modeling the Earth

CME tests fail intermittently #115

Closed: billsacks closed this issue 9 years ago

billsacks commented 9 years ago

The recent clean_build changes still break the build of some CME tests. I haven't been able to pin down exactly when the build breaks and when it doesn't, but it seems to happen only when creating a test suite (as opposed to a single test).

Steps to reproduce:

(1) Check out clm4_5_1_r117

(2) Switch cime to point to cime1.1.20

(3) create_test -xml_category aux_clm40 -xml_mach yellowstone -xml_compiler intel -testroot debug_cme_01

(I did this on caldera using: bsub -q caldera -n8 -PP93300601 -W24:00 -o test.%J.out -e test.%J.err, but I'm guessing that isn't relevant to reproducing this bug.)

(4) Notice that this test:

CME_Ld5.f10_f10.ICN.yellowstone_intel

CFAILs with:

/glade/p/work/sacks/cesm_code/clm_move_glint_into_cpl3/cime/driver_cpl/driver/cesm_comp_mod.F90(141): error #6580: Name in only-list does not exist. [COMPONENT_INIT_UPDATE_PETLIST] use component_mod , only: component_init_update_petlist

and other errors.

(5) Clone cime with git and check out cime1.1.20.

(6) Use:

git revert -m 1 53ad4f06b8237cc3894250c57908b195a160619f
git revert -m 1 f8a06a62425ae2758139249051c1f1dba1592cf8

to back out the clean build changes

(7) Rerun the test suite as above

Notice that the CME test now passes.
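Steps (6) and (7) can also be scripted. The following is a minimal Python sketch, not part of CIME: it assumes you are already inside a cime clone, and the helpers `revert_commands` and `run` are hypothetical names. By default it only prints the git commands; pass `dry_run=False` to execute them. `-m 1` tells git to revert each merge relative to its first parent (the mainline); `--no-edit` is an addition not in the original commands that simply accepts the default revert message.

```python
import subprocess
from typing import List, Sequence

# The two clean_build merge commits from step (6).
CLEAN_BUILD_MERGES = (
    "53ad4f06b8237cc3894250c57908b195a160619f",
    "f8a06a62425ae2758139249051c1f1dba1592cf8",
)


def revert_commands(tag: str, merges: Sequence[str] = CLEAN_BUILD_MERGES) -> List[List[str]]:
    """Build the git argv lists: check out the tag, then revert each merge."""
    cmds = [["git", "checkout", tag]]  # a tag checkout leaves a detached HEAD
    for sha in merges:
        # -m 1: treat the first parent as mainline when reverting a merge commit.
        cmds.append(["git", "revert", "--no-edit", "-m", "1", sha])
    return cmds


def run(cmds: List[List[str]], repo_dir: str, dry_run: bool = True) -> None:
    """Print the commands (dry run) or execute them in repo_dir, stopping on failure."""
    for cmd in cmds:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, cwd=repo_dir, check=True)


if __name__ == "__main__":
    run(revert_commands("cime1.1.20"), repo_dir=".", dry_run=True)
```

The revert commits land on the detached HEAD, which is enough for rerunning the test suite locally.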

billsacks commented 9 years ago

cc'ing @sholly @mvertens @bandre-ucar @ekluzek on this

billsacks commented 9 years ago

@jedwards4b pointed out that this problem may be dependent on the test order, similar to https://github.com/CESM-Development/cime/issues/116. And indeed, I see that the passing vs. failing tests were in test suites where the tests were run in different orders, with different tests being built both before and after the CME test in question.
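To illustrate how test order alone can flip a result, here is a toy Python model. It is a hypothetical sketch, not CIME's actual build logic: it assumes that tests in a suite reuse a shared build keyed only by compiler, and that an ESMF-enabled (CME) test needs interfaces that a non-ESMF build does not provide. The names `SuiteTest`, `SharedBuild`, `run_suite`, and `use_esmf` are invented for this example.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class SuiteTest:
    name: str
    compiler: str
    use_esmf: bool  # hypothetical flag: does this test need the ESMF interface?

    def required(self) -> FrozenSet[str]:
        # Interfaces this test's driver expects to find in the shared build.
        return frozenset({"mct", "esmf"}) if self.use_esmf else frozenset({"mct"})


class SharedBuild:
    """Toy shared library root reused by every test in a suite.

    The cache key deliberately omits use_esmf, so whichever test is built
    first decides which interfaces the shared artifacts contain.
    """

    def __init__(self) -> None:
        self._artifacts: Dict[str, FrozenSet[str]] = {}

    def key(self, test: SuiteTest) -> str:
        return test.compiler  # use_esmf is (wrongly) not part of the key

    def build(self, test: SuiteTest) -> bool:
        # The first test to arrive populates the cache; later tests reuse it.
        provided = self._artifacts.setdefault(self.key(test), test.required())
        return test.required() <= provided


def run_suite(tests: List[SuiteTest]) -> List[Tuple[str, str]]:
    shared = SharedBuild()
    return [(t.name, "PASS" if shared.build(t) else "CFAIL") for t in tests]
```

With `cme = SuiteTest("CME_Ld5", "intel", use_esmf=True)` and a hypothetical `other = SuiteTest("OTHER", "intel", use_esmf=False)`, `run_suite([cme, other])` reports both tests as PASS, while `run_suite([other, cme])` reports CME_Ld5 as CFAIL. In this model, adding every option that changes the compiled interface to the key makes the outcome order-independent.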

jedwards4b commented 9 years ago

I believe that this is resolved in cime2.0.4

bandre-ucar commented 9 years ago

I'm getting an intermittent ESMF compilation problem in clm4_5_1_r120 with cime-2.0.7. CME_Ly4.f10_f10.I1850CLM45BGC.yellowstone_intel.clm-monthly CFAILs when run as part of the test suite, but builds and runs fine standalone. Could this be the same race condition described above? Here are some of the compiler errors from a test suite run:

mpif90 -c -I. \
  -I/glade/scratch/andre/sharedlibroot.20150828-16-45i/intel/mpich2/nodebug/nothreads/include \
  -I/glade/scratch/andre/sharedlibroot.20150828-16-45i/intel/mpich2/nodebug/nothreads/ESMF/esmf/a1l1r1i1o1g1w1/csm_share \
  -I/glade/apps/opt/netcdf-mpi/4.3.3.1/intel/default/include \
  -I/glade/apps/opt/pnetcdf/1.6.0/intel/default/include \
  -I/glade/scratch/andre/CME_Ly4.f10_f10.I1850CLM45BGC.yellowstone_intel.clm-monthly.GC.20150828-16-45i/bld/atm/obj \
  -I/glade/scratch/andre/CME_Ly4.f10_f10.I1850CLM45BGC.yellowstone_intel.clm-monthly.GC.20150828-16-45i/bld/ice/obj \
  -I/glade/scratch/andre/CME_Ly4.f10_f10.I1850CLM45BGC.yellowstone_intel.clm-monthly.GC.20150828-16-45i/bld/ocn/obj \
  -I/glade/scratch/andre/CME_Ly4.f10_f10.I1850CLM45BGC.yellowstone_intel.clm-monthly.GC.20150828-16-45i/bld/glc/obj \
  -I/glade/scratch/andre/CME_Ly4.f10_f10.I1850CLM45BGC.yellowstone_intel.clm-monthly.GC.20150828-16-45i/bld/rof/obj \
  -I/glade/scratch/andre/CME_Ly4.f10_f10.I1850CLM45BGC.yellowstone_intel.clm-monthly.GC.20150828-16-45i/bld/wav/obj \
  -I/glade/scratch/andre/sharedlibroot.20150828-16-45i/intel/mpich2/nodebug/nothreads/include \
  -I/glade/p/work/andre/nitrogen/clm-trunk/cime/share/csm_share/shr \
  -I/glade/p/work/andre/nitrogen/clm-trunk/cime/share/csm_share/include \
  -I/glade/scratch/andre/sharedlibroot.20150828-16-45i/intel/mpich2/nodebug/nothreads/ESMF/esmf/clm/obj \
  -I. \
  -I/glade/scratch/andre/tests-clm-20150828-16/CME_Ly4.f10_f10.I1850CLM45BGC.yellowstone_intel.clm-monthly.GC.20150828-16-45i/SourceMods/src.drv \
  -I/glade/p/work/andre/nitrogen/clm-trunk/cime/driver_cpl/driver \
  -I/glade/scratch/andre/CME_Ly4.f10_f10.I1850CLM45BGC.yellowstone_intel.clm-monthly.GC.20150828-16-45i/bld/lib/include \
  -no-opt-dynamic-align -convert big_endian -assume byterecl -ftz -traceback -assume realloc_lhs -fp-model source -xHost -O2 -debug minimal \
  -DLINUX -DNDEBUG -DUSE_ESMF_LIB -DESMF_INTERFACE -DHAVE_MPI -DFORTRANUNDERSCORE -DNO_R16 -DLINUX -DCPRINTEL -DHAVE_SLASHPROC \
  -I/glade/apps/opt/esmf/6.3.0rp1-defio/intel/15.0.1/mod/modO/Linux.intel.64.mpich2.default \
  -I/glade/apps/opt/esmf/6.3.0rp1-defio/intel/15.0.1/include \
  -free -DUSE_CONTIGUOUS=contiguous, \
  /glade/p/work/andre/nitrogen/clm-trunk/cime/driver_cpl/driver/component_mod.F90
/glade/p/work/andre/nitrogen/clm-trunk/cime/driver_cpl/driver/component_mod.F90(346): error #6460: This is not a field name that is defined in the encompassing structure.   [GRIDCOMP_CC]
          comp(eci)%gridcomp_cc = ESMF_GridCompCreate(name=trim(comp(eci)%name), petList=petlist, rc=rc)
--------------------^
/glade/p/work/andre/nitrogen/clm-trunk/cime/driver_cpl/driver/component_mod.F90(346): error #6303: The assignment operation or the binary expression operation is invalid for the data types of the two operands.
          comp(eci)%gridcomp_cc = ESMF_GridCompCreate(name=trim(comp(eci)%name), petList=petlist, rc=rc)
----------------------------------^
/glade/p/work/andre/nitrogen/clm-trunk/cime/driver_cpl/driver/component_mod.F90(349): error #6285: There is no matching specific subroutine for this generic subroutine call.   [ESMF_GRIDCOMPSETSERVICES]
          call ESMF_GridCompSetServices(comp(eci)%gridcomp_cc, userRoutine=gridcomp_register, rc=rc)
---------------^
/glade/p/work/andre/nitrogen/clm-trunk/cime/driver_cpl/driver/component_mod.F90(354): error #6460: This is not a field name that is defined in the encompassing structure.   [X2C_CC_STATE]
          comp(eci)%x2c_cc_state = ESMF_StateCreate(name=trim(comp(eci)%ntype)//" x2c_cc", &
--------------------^
/glade/p/work/andre/nitrogen/clm-trunk/cime/driver_cpl/driver/component_mod.F90(354): error #6303: The assignment operation or the binary expression operation is invalid for the data types of the two operands.
          comp(eci)%x2c_cc_state = ESMF_StateCreate(name=trim(comp(eci)%ntype)//" x2c_cc", &
-----------------------------------^
/glade/p/work/andre/nitrogen/clm-trunk/cime/driver_cpl/driver/component_mod.F90(358): error #6460: This is not a field name that is defined in the encompassing structure.   [C2X_CC_STATE]
          comp(eci)%c2x_cc_state = ESMF_StateCreate(name=trim(comp(eci)%ntype)//" c2x_cc", &
--------------------^
358): error #6303: The assignment operation or the binary expression operation is invalid for the data types of the two operands.
          comp(eci)%c2x_cc_state = ESMF_StateCreate(name=trim(comp(eci)%ntype)//" c2x_cc", &
-----------------------------------^
/glade/p/work/andre/nitrogen/clm-trunk/cime/driver_cpl/driver/component_mod.F90(364): error #6285: There is no matching specific subroutine for this generic subroutine call.   [ESMF_ATTRIBUTELINK]
          call ESMF_AttributeLink(drvcomp, comp(eci)%gridcomp_cc, rc=rc)
---------------^
/glade/p/work/andre/nitrogen/clm-trunk/cime/driver_cpl/driver/component_mod.F90(398): error #6285: There is no matching specific subroutine for this generic subroutine call.   [ESMF_ATTRIBUTESET]
                call ESMF_AttributeSet(comp(eci)%c2x_cc_state, name="ID", &
---------------------^
/glade/p/work/andre/nitrogen/clm-trunk/cime/driver_cpl/driver/component_mod.F90(404): error #6285: There is no matching specific subroutine for this generic subroutine call.   [ESMF_ATTRIBUTESET]
             call ESMF_AttributeSet(comp(eci)%c2x_cc_state, name=trim(comp(eci)%ntype)//"_phase", &
------------------^
/glade/p/work/andre/nitrogen/clm-trunk/cime/driver_cpl/driver/component_mod.F90(418): error #6285: There is no matching specific subroutine for this generic subroutine call.   [ESMF_STATEGET]
                call ESMF_StateGet(comp(eci)%x2c_cc_state, itemName="x2d", array=x2c_cc_array, rc=rc)
---------------------^
/glade/p/work/andre/nitrogen/clm-trunk/cime/driver_cpl/driver/component_mod.F90(436): error #6285: There is no matching specific subroutine for this generic subroutine call.   [ESMF_STATEGET]
                call ESMF_StateGet(comp(eci)%c2x_cc_state, itemName="d2x", array=c2x_cc_array, rc=rc)
---------------------^
/glade/p/work/andre/nitrogen/clm-trunk/cime/driver_cpl/driver/component_mod.F90(466): error #6285: There is no matching specific subroutine for this generic subroutine call.   [ESMF_STATEGET]
                call ESMF_StateGet(comp(eci)%x2c_cc_state, itemName="x2d", array=x2c_cc_array, rc=rc)
---------------------^
/glade/p/work/andre/nitrogen/clm-trunk/cime/driver_cpl/driver/component_mod.F90(469): error #6285: There is no matching specific subroutine for this generic subroutine call.   [ESMF_STATEGET]
                call ESMF_StateGet(comp(eci)%c2x_cc_state, itemName="d2x", array=c2x_cc_array, rc=rc)

ekluzek commented 9 years ago

I had the same issue with clm4_5_1_r119 which was using cime2.0.7

billsacks commented 9 years ago

Reopening, since this appears to still be a problem.

jedwards4b commented 9 years ago

There is a problem in the shared build when you are running a test suite, but why do you link this to the clean_build script?
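When several tests in a suite build at the same time, a shared build can also race: two builds writing into the same shared directory can leave one of them compiling against another's partial or mismatched output. A minimal Python sketch of one way to serialize this, assuming threads stand in for concurrent test builds (the class `SharedLibBuilder` and the `build_fn` callback are hypothetical, not CIME code): a per-key lock lets exactly one caller build each shared library while the others wait and reuse the result.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


class SharedLibBuilder:
    """Serialize builds of each shared library key across concurrent callers.

    The first caller to take a key's lock performs the build; later callers
    block until it finishes and then reuse the recorded result instead of
    building into the same directory at the same time.
    """

    def __init__(self, build_fn: Callable[[str], str]) -> None:
        self._build_fn = build_fn
        self._guard = threading.Lock()  # protects creation of per-key locks
        self._locks: Dict[str, threading.Lock] = {}
        self._done: Dict[str, str] = {}  # key -> path of the finished build

    def _lock_for(self, key: str) -> threading.Lock:
        with self._guard:
            return self._locks.setdefault(key, threading.Lock())

    def get(self, key: str) -> str:
        with self._lock_for(key):
            if key not in self._done:  # only the first caller builds
                self._done[key] = self._build_fn(key)
            return self._done[key]


if __name__ == "__main__":
    builds: List[str] = []

    def fake_build(key: str) -> str:
        builds.append(key)  # record each real build
        return f"/sharedlibroot/{key}"  # hypothetical output path

    builder = SharedLibBuilder(fake_build)
    with ThreadPoolExecutor(max_workers=8) as pool:
        list(pool.map(builder.get, ["intel/esmf"] * 10 + ["intel/noesmf"] * 10))
    print(len(builds))  # 2: one build per key
```

Because the tests in a real suite run as separate processes, the same idea would need an inter-process lock (for example, `fcntl.flock` on a lock file in the shared directory) rather than `threading.Lock`.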

billsacks commented 9 years ago

@jedwards4b is correct that the title of this bug report is likely wrong, so I'm renaming it from "Recent clean_build changes still break some CME tests" to "CME tests fail intermittently". I think it's worth keeping the bug open, though, until the problem is resolved.

jedwards4b commented 9 years ago

I think this is fixed by #148, but we will need a full round of beta testing to confirm.

billsacks commented 9 years ago

This appears to truly be fixed; closing it.