E3SM-Project / E3SM

Energy Exascale Earth System Model source code. NOTE: use "maint" branches for your work. Head of master is not validated.
https://docs.e3sm.org/E3SM

slow compilation for MPAS-O and MPAS-CICE on Cetus and Mira #579

Closed: worleyph closed this 7 years ago

worleyph commented 8 years ago

Since this occurs on both Cetus and Mira (today, at least), it may be more than a transient issue: MPAS-CICE takes 1 hour to compile. Looking at the timestamps of the module files:

 ....
 -rw-r--r-- 1 worley users 2521424 Dec 22 18:04 cice_core_interface.mod
 -rw-r--r-- 1 worley users  658562 Dec 22 18:56 mpas_cice_cpl_indices.mod
 -rw-r--r-- 1 worley users  644886 Dec 22 18:56 mpas_cice_mct_vars.mod
 -rw-r--r-- 1 worley users  252454 Dec 22 18:56 ice_comp_mct.mod

Most of the time is spent compiling mpas_cice_cpl_indices.mod. I'm not sure who is best positioned to look at this, but I'll assign it to Doug to start with.

worleyph commented 8 years ago

Further information: MPAS-CICE takes only 12 minutes to build as part of an MPI-only ACME build. However, it takes 1 hour if other components are built with threading enabled (whether or not MPAS-CICE is). I have observed 1-hour compilations for:

(a)

 <entry id="NTHRDS_ICE"   value="8"  />

(which I would have assumed would do nothing for the current version of MPAS-CICE), or

(b)

 <entry id="NTHRDS_ICE"   value="1"  />

and

 <entry id="BUILD_THREADED"   value="TRUE"  />

or

(c)

 <entry id="NTHRDS_ICE"   value="1"  />

and

 <entry id="BUILD_THREADED"   value="FALSE"  />

In the last case (in which other components are built with OpenMP, just not MPAS-CICE), the compiler commands include '-qsmp=omp' for MPAS-CICE routines, so is the BUILD_THREADED setting being ignored? It isn't always ignored; I'm not sure what determines this. In any case, '-qsmp=omp' appears to cause mpas_cice_cpl_indices.F90 to build very slowly. I'll keep poking, as time permits.

rljacob commented 8 years ago

@worleyph is this the only issue with MPAS-CICE on cetus that you know of?

worleyph commented 8 years ago

No. I also see a runtime failure (see issue #584). The other issue occurs with or without threading.

amametjanov commented 8 years ago

The problem with BUILD_THREADED was fixed in #617.

Compiling components/mpas-cice/model/src/core_cice/model_forward/mpas_cice_core_interface.F with default flags -qsmp=omp -O3 -qstrict -Q takes 48 minutes:

"mpas_cice_core_interface.f90", 1520-031 (W) Option DLINES is ignored within Fortran 90 free form and IBM free form.
FORTRAN   - Phase Ends;  14.490/ 15.030
** cice_core_interface   === End of Compilation 1 ===
HOT       - Phase Ends;  14.080/ 14.370
W-TRANS   - Phase Ends;  30.740/ 31.330
OPTIMIZ   - Phase Ends; 160.340/160.630
REGALLO   - Phase Ends;  46.520/ 46.560
AS        - Phase Ends;   0.590/  0.590
IPA       - Phase Ends; 2602.360/2620.220
1501-510  Compilation successful for file mpas_cice_core_interface.f90.

Trying out other flags to reduce this file's compilation time to less than 1 minute.
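The phase report above can be used to see where the 48 minutes go. The following minimal Python sketch is illustrative and is not part of the E3SM build tooling. It parses the `Phase Ends` lines copied from the -O3 log and sums the second timing column on each line. Treating that column as wall-clock seconds is an assumption, and `parse_phases` is a hypothetical helper name.

```python
import re

# Phase lines copied from the -O3 report for mpas_cice_core_interface.f90 above.
LOG_O3 = """\
FORTRAN   - Phase Ends;  14.490/ 15.030
HOT       - Phase Ends;  14.080/ 14.370
W-TRANS   - Phase Ends;  30.740/ 31.330
OPTIMIZ   - Phase Ends; 160.340/160.630
REGALLO   - Phase Ends;  46.520/ 46.560
AS        - Phase Ends;   0.590/  0.590
IPA       - Phase Ends; 2602.360/2620.220
"""

# Captures: phase name, first timing column, second timing column.
PHASE_RE = re.compile(r"^(\S+)\s+- Phase Ends;\s*([\d.]+)/\s*([\d.]+)", re.MULTILINE)


def parse_phases(log: str) -> dict[str, float]:
    """Map phase name -> second timing column (assumed to be wall-clock seconds)."""
    return {name: float(wall) for name, _first, wall in PHASE_RE.findall(log)}


phases = parse_phases(LOG_O3)
total = sum(phases.values())
ipa_share = phases["IPA"] / total
print(f"total: {total / 60:.1f} min, IPA share: {ipa_share:.0%}")
```

With these numbers, the script reports about 48.1 minutes in total, and roughly 91% of that is spent in IPA. Flags that only affect the optimizer phase therefore have little time left to recover.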

amametjanov commented 8 years ago

Update: lowering the optimization flags does not help. With -qsmp=noauto:noopt:noomp -O0 -qstrict -Q:

FORTRAN   - Phase Ends;  15.880/ 16.450
** cice_core_interface   === End of Compilation 1 === 
HOT       - Phase Ends;  14.650/ 14.970
W-TRANS   - Phase Ends;  29.290/ 29.740
OPTIMIZ   - Phase Ends; 106.200/106.420
REGALLO   - Phase Ends;  44.660/ 44.720
AS        - Phase Ends;   0.550/  0.550
IPA       - Phase Ends; 2733.740/2740.670
1501-510  Compilation successful for file mpas_cice_core_interface.f90.
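Comparing the two phase reports phase by phase shows why lowering the optimization level does not help. The Python sketch below is illustrative. The dictionaries hold the second timing column, transcribed by hand from the -O3 and -O0 logs, and the code assumes that column is wall-clock seconds. `compare` is a hypothetical helper.

```python
# Second timing column (assumed wall-clock seconds) from the two phase reports
# for mpas_cice_core_interface.f90: default -O3 flags vs. -O0 with noopt/noomp.
O3 = {"FORTRAN": 15.030, "HOT": 14.370, "W-TRANS": 31.330, "OPTIMIZ": 160.630,
      "REGALLO": 46.560, "AS": 0.590, "IPA": 2620.220}
O0 = {"FORTRAN": 16.450, "HOT": 14.970, "W-TRANS": 29.740, "OPTIMIZ": 106.420,
      "REGALLO": 44.720, "AS": 0.550, "IPA": 2740.670}


def compare(base: dict[str, float], new: dict[str, float]) -> dict[str, float]:
    """Per-phase change in seconds (new - base); positive means the phase got slower."""
    return {phase: new[phase] - base[phase] for phase in base}


delta = compare(O3, O0)
for phase, d in delta.items():
    print(f"{phase:8s} {d:+8.2f} s")
print(f"total: {sum(O3.values()) / 60:.1f} min at -O3 vs "
      f"{sum(O0.values()) / 60:.1f} min at -O0")
```

At -O0, the OPTIMIZ phase shrinks by about 54 s. However, the dominant IPA phase grows by about 120 s, so the total stays near 48 to 49 minutes.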

worleyph commented 8 years ago

Since BUILD_THREADED now works, and since there is no threading to enable in MPAS-CICE, we can probably close this for now, reopening if/when MPAS-CICE starts to support threading.

rljacob commented 8 years ago

Over in #901, @amametjanov said "Just to note: compilation with threaded ice took 4hrs 26mins and threaded ocn took 4hrs 11mins for a total of ~9 hours. Note that lowering optimization flags does not help. The IPA (inter-procedural analysis) takes ~45 minutes for MPAS interface files with OpenMP flag."

How does that compare to CAM compilation time?

rljacob commented 8 years ago

And how long do threaded ocean and threaded sea-ice builds take on Edison or Titan?

worleyph commented 8 years ago

Changed the title to indicate slow compilation for both MPAS-O and MPAS-CICE when threading is enabled.

worleyph commented 8 years ago

Rebuilding with threading on Titan now - will report back when complete.

amametjanov commented 8 years ago

Updated the comment there: CAM compilation took 12 minutes. Building the same case on Edison now.

amametjanov commented 8 years ago

Compilation time on Edison was 5 mins for atm, 11 mins for ice, and 13 mins for ocn:

 .... determining environment variables from env_mach_specific
COMPILER=intel
MPILIB=mpt
DEBUG=FALSE
 .... building model executable (calling ./Buildconf/cesm_build.pl)
    .... checking namelists (calling ./preview_namelists)
    .... calling data prestaging
    .... calling cesm build checks
    .... calling cesm builds for utility libraries (compiler is intel)
      build libraries: mct gptl pio csm_share
      Wed Jun 15 09:23:45 2016 /scratch1/scratchdirs/azamat/acme_scratch/AWCYCL2000-ne30oEC-d1/bld/intel/mpt/nodebug/threads/mct.bldlog.160615-092319
      Wed Jun 15 09:26:56 2016 /scratch1/scratchdirs/azamat/acme_scratch/AWCYCL2000-ne30oEC-d1/bld/intel/mpt/nodebug/threads/gptl.bldlog.160615-092319
      Wed Jun 15 09:27:02 2016 /scratch1/scratchdirs/azamat/acme_scratch/AWCYCL2000-ne30oEC-d1/bld/intel/mpt/nodebug/threads/pio.bldlog.160615-092319
      Wed Jun 15 09:30:07 2016 /scratch1/scratchdirs/azamat/acme_scratch/AWCYCL2000-ne30oEC-d1/bld/intel/mpt/nodebug/threads/csm_share.bldlog.160615-092319
    .... calling cesm builds for component libraries
      Wed Jun 15 09:30:52 2016 /scratch1/scratchdirs/azamat/acme_scratch/AWCYCL2000-ne30oEC-d1/bld/atm.bldlog.160615-092319
         - Building clm4_5/clm5_0 shared library
       bldroot is /scratch1/scratchdirs/azamat/acme_scratch/AWCYCL2000-ne30oEC-d1/bld/intel/mpt/nodebug/threads/MCT/noesmf/
       objdir  is /scratch1/scratchdirs/azamat/acme_scratch/AWCYCL2000-ne30oEC-d1/bld/intel/mpt/nodebug/threads/MCT/noesmf//clm/obj
       libdir  is /scratch1/scratchdirs/azamat/acme_scratch/AWCYCL2000-ne30oEC-d1/bld/intel/mpt/nodebug/threads/MCT/noesmf//lib
      Wed Jun 15 09:35:11 2016 /scratch1/scratchdirs/azamat/acme_scratch/AWCYCL2000-ne30oEC-d1/bld/lnd.bldlog.160615-092319
      Wed Jun 15 09:38:57 2016 /scratch1/scratchdirs/azamat/acme_scratch/AWCYCL2000-ne30oEC-d1/bld/ice.bldlog.160615-092319
      Wed Jun 15 09:49:24 2016 /scratch1/scratchdirs/azamat/acme_scratch/AWCYCL2000-ne30oEC-d1/bld/ocn.bldlog.160615-092319
connect localhost port 6011: Connection refused
connect localhost port 6011: Connection refused
      Wed Jun 15 10:02:29 2016 /scratch1/scratchdirs/azamat/acme_scratch/AWCYCL2000-ne30oEC-d1/bld/glc.bldlog.160615-092319
      Wed Jun 15 10:02:31 2016 /scratch1/scratchdirs/azamat/acme_scratch/AWCYCL2000-ne30oEC-d1/bld/wav.bldlog.160615-092319
      Wed Jun 15 10:02:32 2016 /scratch1/scratchdirs/azamat/acme_scratch/AWCYCL2000-ne30oEC-d1/bld/rof.bldlog.160615-092319
      Wed Jun 15 10:03:04 2016 /scratch1/scratchdirs/azamat/acme_scratch/AWCYCL2000-ne30oEC-d1/bld/cesm.bldlog.160615-092319
 .... locking file ./env_build.xml
 .... successfully built model executable

real    40m58.695s
user    57m46.721s
sys     9m42.944s

worleyph commented 8 years ago

And on Titan (-compset A_WCYCL2000 -res ne30_oEC): atm 12m 15s, ocn 15m 25s, ice 13m 58s.

 > date
 Wed Jun 15 13:01:43 EDT 2016
 > ./A_WCYCL2000.ne30_oEC_titan_pgi_ocean_openmp_test.build
 .... determining environment variables from env_mach_specific 
 COMPILER=pgi
 MPILIB=mpich
  .... building model executable (calling ./Buildconf/cesm_build.pl) 
     .... checking namelists (calling ./preview_namelists) 
     .... calling data prestaging  
     .... calling cesm build checks 
     .... calling cesm builds for utility libraries (compiler is pgi) 
       build libraries: mct gptl pio csm_share
       Wed Jun 15 13:02:17 2016 /ccs/home/worley/acme_scratch/cli112/A_WCYCL2000.ne30_oEC_titan_pgi_ocean_openmp_test/bld/pgi/mpich/nodebug/threads/mct.bldlog.160615-130153
       Wed Jun 15 13:03:42 2016 /ccs/home/worley/acme_scratch/cli112/A_WCYCL2000.ne30_oEC_titan_pgi_ocean_openmp_test/bld/pgi/mpich/nodebug/threads/gptl.bldlog.160615-130153
       Wed Jun 15 13:03:46 2016 /ccs/home/worley/acme_scratch/cli112/A_WCYCL2000.ne30_oEC_titan_pgi_ocean_openmp_test/bld/pgi/mpich/nodebug/threads/pio.bldlog.160615-130153
       Wed Jun 15 13:05:17 2016 /ccs/home/worley/acme_scratch/cli112/A_WCYCL2000.ne30_oEC_titan_pgi_ocean_openmp_test/bld/pgi/mpich/nodebug/threads/csm_share.bldlog.160615-130153
     .... calling cesm builds for component libraries  
       Wed Jun 15 13:05:59 2016 /ccs/home/worley/acme_scratch/cli112/A_WCYCL2000.ne30_oEC_titan_pgi_ocean_openmp_test/bld/atm.bldlog.160615-130153
          - Building clm4_5/clm5_0 shared library 
        bldroot is /ccs/home/worley/acme_scratch/cli112/A_WCYCL2000.ne30_oEC_titan_pgi_ocean_openmp_test/bld/pgi/mpich/nodebug/threads/MCT/noesmf/ 
        objdir  is /ccs/home/worley/acme_scratch/cli112/A_WCYCL2000.ne30_oEC_titan_pgi_ocean_openmp_test/bld/pgi/mpich/nodebug/threads/MCT/noesmf//clm/obj 
        libdir  is /ccs/home/worley/acme_scratch/cli112/A_WCYCL2000.ne30_oEC_titan_pgi_ocean_openmp_test/bld/pgi/mpich/nodebug/threads/MCT/noesmf//lib 
       Wed Jun 15 13:18:14 2016 /ccs/home/worley/acme_scratch/cli112/A_WCYCL2000.ne30_oEC_titan_pgi_ocean_openmp_test/bld/lnd.bldlog.160615-130153
       Wed Jun 15 13:25:49 2016 /ccs/home/worley/acme_scratch/cli112/A_WCYCL2000.ne30_oEC_titan_pgi_ocean_openmp_test/bld/ice.bldlog.160615-130153
       Wed Jun 15 13:39:47 2016 /ccs/home/worley/acme_scratch/cli112/A_WCYCL2000.ne30_oEC_titan_pgi_ocean_openmp_test/bld/ocn.bldlog.160615-130153
       Wed Jun 15 13:55:12 2016 /ccs/home/worley/acme_scratch/cli112/A_WCYCL2000.ne30_oEC_titan_pgi_ocean_openmp_test/bld/glc.bldlog.160615-130153
       Wed Jun 15 13:55:14 2016 /ccs/home/worley/acme_scratch/cli112/A_WCYCL2000.ne30_oEC_titan_pgi_ocean_openmp_test/bld/wav.bldlog.160615-130153
       Wed Jun 15 13:55:16 2016 /ccs/home/worley/acme_scratch/cli112/A_WCYCL2000.ne30_oEC_titan_pgi_ocean_openmp_test/bld/rof.bldlog.160615-130153
       Wed Jun 15 13:55:53 2016 /ccs/home/worley/acme_scratch/cli112/A_WCYCL2000.ne30_oEC_titan_pgi_ocean_openmp_test/bld/cesm.bldlog.160615-130153
  .... locking file ./env_build.xml
  .... successfully built model executable
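The per-component times quoted above match the gaps between consecutive timestamps in the build log. Each timestamp appears to mark the moment that component's build starts, so a component's build time is the gap to the next line. The Python sketch below relies on that start-time assumption. `component_durations` is a hypothetical helper, and the log lines are shortened to the log-file name. Under those assumptions, the sketch reproduces the atm, ice and ocn figures for Titan.

```python
import re
from datetime import datetime

# Component-build lines from the Titan log above, shortened to timestamp + log name.
LOG = """\
Wed Jun 15 13:05:59 2016 atm.bldlog.160615-130153
Wed Jun 15 13:18:14 2016 lnd.bldlog.160615-130153
Wed Jun 15 13:25:49 2016 ice.bldlog.160615-130153
Wed Jun 15 13:39:47 2016 ocn.bldlog.160615-130153
Wed Jun 15 13:55:12 2016 glc.bldlog.160615-130153
Wed Jun 15 13:55:14 2016 wav.bldlog.160615-130153
Wed Jun 15 13:55:16 2016 rof.bldlog.160615-130153
Wed Jun 15 13:55:53 2016 cesm.bldlog.160615-130153
"""

# Timestamp, then an optional path, then "<component>.bldlog".
LINE_RE = re.compile(r"(\w{3} \w{3} +\d+ [\d:]+ \d{4}) \S*?(\w+)\.bldlog")


def component_durations(log: str) -> dict[str, int]:
    """Seconds from each component's start stamp to the next one.

    Assumes each timestamp marks when that component's build starts, so the
    last entry (cesm, the final link) gets no duration.
    """
    stamps = [(datetime.strptime(ts, "%a %b %d %H:%M:%S %Y"), comp)
              for ts, comp in LINE_RE.findall(log)]
    return {comp: int((nxt - cur).total_seconds())
            for (cur, comp), (nxt, _) in zip(stamps, stamps[1:])}


for comp, secs in component_durations(LOG).items():
    print(f"{comp}: {secs // 60}m {secs % 60:02d}s")
```

The optional `\S*?` in the pattern also lets it match lines that still contain full log paths, so the same function can be applied to the Edison log lines in the previous comment.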

amametjanov commented 7 years ago

Fixed by #1257