E3SM-Project / E3SM

Energy Exascale Earth System Model source code. NOTE: use "maint" branches for your work. Head of master is not validated.
https://docs.e3sm.org/E3SM

master (as of Apr 13) compile error on constance #1408

Closed kaizhangpnl closed 7 years ago

kaizhangpnl commented 7 years ago

When configuring the current master on constance (PNNL), I encountered the following error:

run_acme: ++++++++ run_acme starting (Thu Apr 13 21:50:26 PDT 2017), version 3.0.2 ++++++++

run_acme: ACME.F1850C5AV1C-04P2.ne30_ne30 = ACME.F1850C5AV1C-04P2.ne30_ne30

run_acme: -------- Starting create_newcase --------

run_acme: /people/zhan524/model/ACME/cime/scripts/create_newcase --case ACME.F1850C5AV1C-04P2.ne30_ne30 --compset F1850C5AV1C-04P2 --res ne30_ne30 --project CLIMATE --pecount S --mach constance
ERROR: Command: '/usr/bin/xmllint --noout --schema /people/zhan524/model/ACME_20170413/cime/cime_config/xml_schemas/entry_id.xsd /people/zhan524/model/ACME_20170413/cime/cime_config/acme/config_files.xml' failed with error ''

A core file was dumped. When I ran the failing command again by hand, I got:

/usr/bin/xmllint --noout --schema /people/zhan524/model/ACME_20170413/cime/cime_config/xml_schemas/entry_id.xsd /people/zhan524/model/ACME_20170413/cime/cime_config/acme/config_files.xml
Segmentation fault (core dumped)

As a result, the case directory wasn't even created. I got the same result using either the standard acme run script or my own script.

The above problem was caused by incorrect environment variable settings.

kaizhangpnl commented 7 years ago

Hi Balwinder, I see you recently fixed some problems on constance. Could you please have a look at this problem? Thanks!

singhbalwinder commented 7 years ago

@kaizhangpnl: I ran a test with the most recent master on Constance and it got past the point where your run crashed. I used the following: ./create_test SMS_Ln1.ne4_ne4.FC5AV1C-L --compiler intel --project climate -t tst1

I will try again with your ./create_newcase command.

kaizhangpnl commented 7 years ago

Hi Balwinder, thanks for the information. I just found that my environment variables were not set correctly. After correcting them, I can configure the model without problems, but it now fails to compile MCT (please see the error message below). Are you familiar with this error?

zhan524@constance01:~/model/TMP/run$ cat /pic/scratch/zhan524/bld/TEST_constance_F1850C5AV1C-04P2_ne30_ne30_ACME_20170413_256p/bld/mct.bldlog.170413-232128
cd /people/zhan524/model/ACME_20170413/cases/TEST_constance_F1850C5AV1C-04P2_ne30_ne30_ACME_20170413_256p
set CIMEROOT = ./xmlquery CIMEROOT -value
./xmlquery CIMEROOT -value
set CASETOOLS = ./xmlquery CASETOOLS -value
./xmlquery CASETOOLS -value
set GMAKE = ./xmlquery GMAKE -value
./xmlquery GMAKE -value
set GMAKE_J = ./xmlquery GMAKE_J -value
./xmlquery GMAKE_J -value
set MACH = ./xmlquery MACH -value
./xmlquery MACH -value
set MPILIB = ./xmlquery MPILIB -value
./xmlquery MPILIB -value
set OS = ./xmlquery OS -value
./xmlquery OS -value
setenv MCT_DIR /people/zhan524/model/ACME_20170413/cime/externals/mct
setenv MCT_LIBDIR /pic/scratch/zhan524/bld/TEST_constance_F1850C5AV1C-04P2_ne30_ne30_ACME_20170413_256p/bld/intel/mvapich2/nodebug/nothreads/mct
setenv LIBDIR /pic/scratch/zhan524/bld/TEST_constance_F1850C5AV1C-04P2_ne30_ne30_ACME_20170413_256p/bld/intel/mvapich2/nodebug/nothreads
echo MCT_LIBDIR /pic/scratch/zhan524/bld/TEST_constance_F1850C5AV1C-04P2_ne30_ne30_ACME_20170413_256p/bld/intel/mvapich2/nodebug/nothreads/mct
MCT_LIBDIR /pic/scratch/zhan524/bld/TEST_constance_F1850C5AV1C-04P2_ne30_ne30_ACME_20170413_256p/bld/intel/mvapich2/nodebug/nothreads/mct
cd /pic/scratch/zhan524/bld/TEST_constance_F1850C5AV1C-04P2_ne30_ne30_ACME_20170413_256p/bld/intel/mvapich2/nodebug/nothreads/mct
echo Copying source to EXEROOT...
Copying source to EXEROOT...
if ( ( -M /people/zhan524/model/ACME_20170413/cime/externals/mct/Makefile ) > = ( -M Makefile ) ) then
cp /people/zhan524/model/ACME_20170413/cime/externals/mct/Makefile .
endif
if ( ! -d mct ) then
mkdir mct
endif
if ( ( -M /people/zhan524/model/ACME_20170413/cime/externals/mct/mct/Makefile ) > = ( -M mct/Makefile ) ) then
cp /people/zhan524/model/ACME_20170413/cime/externals/mct/mct/Makefile mct
endif
if ( ! -d mpeu ) then
mkdir mpeu
endif
if ( ( -M /people/zhan524/model/ACME_20170413/cime/externals/mct/mpeu/Makefile ) > = ( -M mpeu/Makefile ) ) then
cp /people/zhan524/model/ACME_20170413/cime/externals/mct/mpeu/Makefile mpeu
endif
set runconf = 0
set runclean = 0
echo Running configure...
Running configure...
echo for OS=LINUX MACH=constance
for OS=LINUX MACH=constance
gmake -f /people/zhan524/model/ACME_20170413/cases/TEST_constance_F1850C5AV1C-04P2_ne30_ne30_ACME_20170413_256p/Tools/Makefile /pic/scratch/zhan524/bld/TEST_constance_F1850C5AV1C-04P2_ne30_ne30_ACME_20170413_256p/bld/intel/mvapich2/nodebug/nothreads/mct/Makefile.conf MODEL=mct
cat: Filepath: No such file or directory
cat: Srcfiles: No such file or directory
/people/zhan524/model/ACME_20170413/cases/TEST_constance_F1850C5AV1C-04P2_ne30_ne30_ACME_20170413_256p/Tools/mkSrcfiles
cp -f /pic/scratch/zhan524/bld/TEST_constance_F1850C5AV1C-04P2_ne30_ne30_ACME_20170413_256p/bld/intel/mvapich2/nodebug/nothreads/mct/Filepath /pic/scratch/zhan524/bld/TEST_constance_F1850C5AV1C-04P2_ne30_ne30_ACME_20170413_256p/bld/intel/mvapich2/nodebug/nothreads/mct/Deppath
/people/zhan524/model/ACME_20170413/cases/TEST_constance_F1850C5AV1C-04P2_ne30_ne30_ACME_20170413_256p/Tools/mkDepends Deppath Srcfiles > /pic/scratch/zhan524/bld/TEST_constance_F1850C5AV1C-04P2_ne30_ne30_ACME_20170413_256p/bld/intel/mvapich2/nodebug/nothreads/mct/Depends
gmake: No rule to make target `/pic/scratch/zhan524/bld/TEST_constance_F1850C5AV1C-04P2_ne30_ne30_ACME_20170413_256p/bld/intel/mvapich2/nodebug/nothreads/mct/Makefile.conf'. Stop.
if ( 0 == 1 ) then
cp -p Makefile.conf Makefile.conf.old
cp: cannot stat `Makefile.conf': No such file or directory
gmake SRCDIR=/people/zhan524/model/ACME_20170413/cime/externals/mct
Makefile:4: Makefile.conf: No such file or directory
gmake: No rule to make target `Makefile.conf'. Stop.
exit 1

kaizhangpnl commented 7 years ago

Hi Balwinder, the complete compile log is available here: /people/zhan524/model/ACME_20170413/compile_log. Thanks!

singhbalwinder commented 7 years ago

I think there is still an issue with your environment settings. The model (hash: 06586e60efd) builds fine on my end using the following commands:

$ cd ACME/cime/scripts
$ ./create_newcase --case ACME.F1850C5AV1C-04P2.ne30_ne30 --compset F1850C5AV1C-04P2 --res ne30_ne30 --project CLIMATE --pecount S --mach constance
$ cd ACME.F1850C5AV1C-04P2.ne30_ne30
$ ./case.setup && ./case.build && ./case.submit

Have a look at my .cshrc file to see whether you are missing something. We can also meet sometime today to discuss this.
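One way to compare two setups is to dump each environment with `env` and list the variable names that exist in only one of them. The following is a minimal sketch for a Bourne-style shell, not something taken from the ACME scripts: the function name `vars_only_in` and the file names in the usage comment are hypothetical, and it assumes each dump has one NAME=value pair per line.

```shell
#!/bin/sh
# Hypothetical helper: print the names of variables that appear in the
# first environment dump but not in the second. Each dump is assumed to be
# the output of `env` (one NAME=value pair per line).
vars_only_in() {
    a=$(mktemp) && b=$(mktemp) || return 1
    # Keep only the variable names (text before the first '='), sorted.
    cut -d= -f1 "$1" | sort -u > "$a"
    cut -d= -f1 "$2" | sort -u > "$b"
    # comm -23 prints lines that are unique to the first sorted file.
    comm -23 "$a" "$b"
    rm -f "$a" "$b"
}

# Usage (file names are placeholders):
#   env > working_env.txt    # captured where the build works
#   env > failing_env.txt    # captured where the build fails
#   vars_only_in failing_env.txt working_env.txt
```

This only compares names, so a variable that is set in both environments with different values will not show up; a plain `diff` of the two sorted dumps would be needed for that.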

kaizhangpnl commented 7 years ago

Thanks Balwinder. Constance is very slow this morning, so I will have to try again later. I am going to close this issue now since it worked for you.

singhbalwinder commented 7 years ago

Hi Kai, I hit this issue on Cascade. After printing out a lot of variables, I found that the environment was setting COMPILER=INTEL (uppercase), while the path to the MCT libraries uses 'intel' in lowercase. There are two possible solutions:

  1. Unset COMPILER: this forces the Makefile to read COMPILER from the XML files, which have the correct (lowercase) value
  2. Reset COMPILER to lowercase

I tried the second option and it worked fine for me. Maybe you also have COMPILER defined somewhere in your environment.
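The two options can be sketched for a Bourne-style shell as follows. This is only an illustration under stated assumptions: the function name `normalize_compiler` is hypothetical and not part of ACME or CIME, and in csh (the thread refers to a .cshrc) the equivalents would be `unsetenv COMPILER` or `setenv COMPILER intel`.

```shell
#!/bin/sh
# Hypothetical helper (not part of ACME/CIME): lowercase COMPILER if it is
# set in the environment, so that it matches the lowercase 'intel' used in
# the build paths. The variable is left untouched if it is unset.
normalize_compiler() {
    if [ -n "${COMPILER+set}" ]; then
        COMPILER=$(printf '%s' "$COMPILER" | tr '[:upper:]' '[:lower:]')
        export COMPILER
    fi
}

# Option 1: remove the variable so the Makefile falls back to the value
# stored in the XML files.
#   unset COMPILER

# Option 2: keep the variable but force it to lowercase.
normalize_compiler
echo "COMPILER=${COMPILER-<unset>}"
```

Running `echo $COMPILER` before creating a case is a quick way to see whether the variable is defined at all.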

This was an annoying problem, and it took me some time to figure out. Hope this helps!

kaizhangpnl commented 7 years ago

Hi Balwinder, this is helpful. Thanks!