E3SM-Project / E3SM

Energy Exascale Earth System Model source code. NOTE: use "maint" branches for your work. Head of master is not validated.
https://docs.e3sm.org/E3SM

TransportMod.F90 fails to compile on titan with pgi and DEBUG on #738

Closed susburrows closed 8 years ago

susburrows commented 8 years ago

There appears to be a bug in TransportMod.F90 that is preventing compilation of some model configurations on titan with the pgi compiler and DEBUG on.

I encountered this issue while attempting to build FC5AV1F on an ne30_ne30 grid, with a custom pe-layout with NTASKS=2700 and NTHRDS=2 for all model components.

Script to reproduce this issue on titan: /ccs/home/sburrows/ACME-project/runscripts/ACME/master_test/run_acme_FC5V1F_ntasks_2700_nthrds_2.csh

Logfile from runscript is at: /ccs/home/sburrows/ACME-project/runscripts/ACME/master_test/run_acme_FC5V1F_ntasks_2700_nthrds_2.csh.log

Logfiles from build are at: /lustre/atlas/proj-shared/cli112/sburrows/ACME_simulations/master_test_clm_compile_bug.FC5AV1F.ne30_ne30.titan.alpha-fam-ntasks-2700/build/

Here is a portion of the error messages in the log file:

PGF90-S-0285-Source line too long (/autofs/nccs-svm1_home1/sburrows/ACME-project/ACME-codes/alpha_fam_with_mom_bugfix/components/clm/src/betr/betr_core/TransportMod.F90: 379)
PGF90-S-0285-Source line too long (/autofs/nccs-svm1_home1/sburrows/ACME-project/ACME-codes/alpha_fam_with_mom_bugfix/components/clm/src/betr/betr_core/TransportMod.F90: 380)
PGF90-S-0026-Unmatched quote (/autofs/nccs-svm1_home1/sburrows/ACME-project/ACME-codes/alpha_fam_with_mom_bugfix/components/clm/src/betr/betr_core/TransportMod.F90: 379)
PGF90-S-0285-Source line too long (/autofs/nccs-svm1_home1/sburrows/ACME-project/ACME-codes/alpha_fam_with_mom_bugfix/components/clm/src/betr/betr_core/TransportMod.F90: 381)
PGF90-S-0026-Unmatched quote (/autofs/nccs-svm1_home1/sburrows/ACME-project/ACME-codes/alpha_fam_with_mom_bugfix/components/clm/src/betr/betr_core/TransportMod.F90: 380)
PGF90-S-0026-Unmatched quote (/autofs/nccs-svm1_home1/sburrows/ACME-project/ACME-codes/alpha_fam_with_mom_bugfix/components/clm/src/betr/betr_core/TransportMod.F90: 381)

PGF90-W-0006-Input file empty (/autofs/nccs-svm1_home1/sburrows/ACME-project/ACME-codes/alpha_fam_with_mom_bugfix/components/clm/src/biogeophys/vsfm/ConditionType.F90)
PGF90/x86-64 Linux 15.3-0: compilation completed with warnings
PGF90-W-0006-Input file empty (/autofs/nccs-svm1_home1/sburrows/ACME-project/ACME-codes/alpha_fam_with_mom_bugfix/components/clm/src/biogeophys/vsfm/RichardsODEPressureAuxType.F90)
PGF90/x86-64 Linux 15.3-0: compilation completed with warnings
Timing stats: Total time 0 millisecs
Timing stats: Total time 0 millisecs
gmake: *** [TransportMod.o] Error 2
gmake: *** Waiting for unfinished jobs....

... ERROR: clm.buildlib gmake complib -j 8 MODEL=clm COMPLIB=/lustre/atlas/proj-shared/cli112/sburrows/ACME_simulations/alpha_fam_with_mom_bugfix.FC5AV1F.ne30_ne30.titan.tuning_n2700x2_te/build/pgi/mpich/debug/threads/MCT/noesmf//lib/libclm.a USER_CPPDEFS=" -DMODAL_AER " -f /lustre/atlas/proj-shared/cli112/sburrows/ACME_simulations/alpha_fam_with_mom_bugfix.FC5AV1F.ne30_ne30.titan.tuning_n2700x2_te/case_scripts/Tools/Makefile failed: 512

susburrows commented 8 years ago

@bishtgautam , I am assigning this to you, but please feel free to reassign.

bishtgautam commented 8 years ago

Hi @susburrows : Can you test out the bishtguatam/lnd/fix-betr-build-failures branch?

susburrows commented 8 years ago

Thanks, I will try to test it later today.

worleyph commented 8 years ago

Is this the common problem with using

__FILE__,__LINE__

in that __FILE__ is replaced by the full path name? These lines are long to start with, though. I believe that @jayeshkrishna came up with a workaround in PIO, but there are issues like this in HOMME as well, which pop up only when your file is nested deeply in a number of subdirectories.
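
To make the failure mode concrete, here is a minimal Python sketch that mimics the preprocessor substituting `__FILE__` with a quoted path and then checks the result against the 132-character Fortran free-form line limit. It assumes that this limit is what the compiler enforces in this build; the function name `expand_file_macro` is hypothetical, and `__LINE__` is left unexpanded for simplicity. The source line and the full path are taken from this thread.

```python
# Fortran free-form source line limit (assumed to be what triggers
# "Source line too long" in this build).
FREE_FORM_LIMIT = 132


def expand_file_macro(line, path):
    """Mimic the preprocessor replacing __FILE__ with a quoted path string."""
    return line.replace("__FILE__", '"' + path + '"')


# Assertion line of the kind used in TransportMod.F90.
source_line = (
    "SHR_ASSERT_ALL((ubound(jtops)  == (/bounds%endc/)),   "
    "errMsg(__FILE__,__LINE__))"
)

# A bare file name versus the deeply nested path from the build log.
short_path = "TransportMod.F90"
full_path = (
    "/autofs/nccs-svm1_home1/sburrows/ACME-project/ACME-codes/"
    "alpha_fam_with_mom_bugfix/components/clm/src/betr/betr_core/"
    "TransportMod.F90"
)

for path in (short_path, full_path):
    expanded = expand_file_macro(source_line, path)
    status = "too long" if len(expanded) > FREE_FORM_LIMIT else "ok"
    print(len(expanded), status)
```

With the nested path, the quoted string alone is already longer than the limit. If the compiler cuts the line off at that point, the closing quote is lost, which would be consistent with the "Unmatched quote" errors reported alongside "Source line too long".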

bishtgautam commented 8 years ago

@worleyph was referring to issue https://github.com/ACME-Climate/ACME/issues/455, which was fixed by PR https://github.com/ACME-Climate/ACME/pull/460.

The SE team should come up with a standard solution for this. One such solution could be to define, in every F90 file that uses __FILE__, a macro that holds the file's path relative to the ACME directory.

So, components/clm/src/betr/betr_core/TransportMod.F90 would at the top have

#define __REL_FILE__ "components/clm/src/betr/betr_core/TransportMod.F90"

which would be used within the code:

...
...
SHR_ASSERT_ALL((ubound(jtops)  == (/bounds%endc/)),   errMsg(__REL_FILE__,__LINE__))
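
As a rough illustration of the relative-path idea, the Python sketch below derives the `#define __REL_FILE__` line from a full path and checks that the expanded assertion stays within the 132-character free-form limit. The helper names `rel_file_define` and `expand_rel_file` are hypothetical, the repository root is the checkout directory seen in the build log, and the 132-character limit is assumed to be what the compiler enforces. This is not the fix that was applied in the branch, only a way to see why a shorter path helps.

```python
import posixpath

# Fortran free-form source line limit (assumed to be enforced by the compiler).
FREE_FORM_LIMIT = 132


def rel_file_define(full_path, repo_root):
    """Build the proposed '#define __REL_FILE__ "..."' line for one file."""
    rel = posixpath.relpath(full_path, repo_root)
    return '#define __REL_FILE__ "' + rel + '"'


def expand_rel_file(line, define):
    """Mimic the preprocessor substituting __REL_FILE__ with its value."""
    value = define.split(" ", 2)[2]  # the quoted path after the macro name
    return line.replace("__REL_FILE__", value)


# Checkout directory and source file as they appear in the build log.
repo_root = (
    "/autofs/nccs-svm1_home1/sburrows/ACME-project/ACME-codes/"
    "alpha_fam_with_mom_bugfix"
)
full_path = repo_root + "/components/clm/src/betr/betr_core/TransportMod.F90"

source_line = (
    "SHR_ASSERT_ALL((ubound(jtops)  == (/bounds%endc/)),   "
    "errMsg(__REL_FILE__,__LINE__))"
)

define = rel_file_define(full_path, repo_root)
expanded = expand_rel_file(source_line, define)
print(define)
print(len(expanded) <= FREE_FORM_LIMIT)
```

The expanded line no longer depends on where the user checked out the code, only on how deeply the file is nested inside the repository.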

susburrows commented 8 years ago

Thanks for quickly working on / commenting on this, @bishtgautam and @worleyph . I tried again with your branch. For some reason, I am now running into an issue in the atmosphere model build step, so the build is not getting to the land model at all.

susburrows commented 8 years ago

@bishtgautam , did you try compiling this on titan with pgi?

susburrows commented 8 years ago

I checked out just the CLM modifications into the code version I was working on before and was able to build successfully. Looks like it works.

susburrows commented 8 years ago

@bishtgautam , I encountered this error again today when inadvertently building with pgi on titan. After merging your branch (bishtguatam/lnd/fix-betr-build-failures) into my code, I can compile. So it looks like your fix works, but it never got merged to master?

bishtgautam commented 8 years ago

@susburrows: Thanks for reminding me. The branch is now merged into next.

susburrows commented 8 years ago

Thanks, Gautam!