ExeClim / Isca

Idealized GCM from the University of Exeter
https://execlim.github.io/IscaWebsite
GNU General Public License v3.0

Compile Errors when running test (held_suarez_test_case.py) #259

CicadaDennis closed this issue 11 months ago

CicadaDennis commented 12 months ago

Description

This is a new installation of Isca on a RHEL8 HPC system. I believe the errors may relate to the versions of packages/modules installed by conda and pip. It is also possible that something in my own configuration is incorrect, but so far I have not found a way to get it to work.

First issue: if I use the compiler installed into the conda environment (gfortran), it does not seem to work with Isca at all. When I use the Intel compiler on our system, I get deprecated compiler option warnings: when the build runs mpiifort from the conda environment on our RHEL8 system, the following options are flagged as deprecated:

2023-11-06 17:21:52,121 - isca - WARNING - ifort: command line warning #10434: option '-stack_temps' use with underscore is deprecated; use '-stack-temps' instead
2023-11-06 17:21:52,121 - isca - WARNING - ifort: command line warning #10434: option '-safe_cray_ptr' use with underscore is deprecated; use '-safe-cray-ptr' instead
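
The two warnings name their own replacements, so the fix is a plain rename in the FFLAGS line of whichever mkmf template the build uses. A minimal Python sketch of that rename follows; the helper name `modernize_flags` is hypothetical, and the mapping deliberately contains only the two options ifort reported, not a full list of deprecated Intel flags.

```python
# Map each deprecated ifort option (warning #10434) to the spelling ifort suggests.
# Only the two options reported in the log are included.
DEPRECATED_IFORT_FLAGS = {
    "-stack_temps": "-stack-temps",
    "-safe_cray_ptr": "-safe-cray-ptr",
}


def modernize_flags(flags: str) -> str:
    """Return `flags` with deprecated spellings replaced.

    Tokens are split on whitespace, so runs of spaces collapse to one.
    """
    return " ".join(DEPRECATED_IFORT_FLAGS.get(tok, tok) for tok in flags.split())


if __name__ == "__main__":
    # Flags taken from the mpiifort compile line in the log.
    print(modernize_flags("-fpp -stack_temps -safe_cray_ptr -ftz -assume byterecl"))
    # prints: -fpp -stack-temps -safe-cray-ptr -ftz -assume byterecl
```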

The build does compile with those flags, but then the following errors occur. The build appears to use a file that does not exist on our system, /usr/local/include/netcdf.inc, rather than the netcdf.inc in the conda environment:

2023-11-06 17:28:55,179 - isca - INFO - /N/scratch/scttest/Quartz/gfdl_work/codebase/_N_scratch_scttest_Quartz_Isca/code/src/shared/diag_manager/diag_data.F90(179): error #7013: This module file was not generated by any release of this compiler.   [NETCDF]
2023-11-06 17:28:55,179 - isca - INFO - USE netcdf, ONLY: NF_FILL_REAL => NF90_FILL_REAL
2023-11-06 17:28:55,179 - isca - INFO - ------^
2023-11-06 17:28:55,179 - isca - INFO - /N/scratch/scttest/Quartz/gfdl_work/codebase/_N_scratch_scttest_Quartz_Isca/code/src/shared/diag_manager/diag_data.F90(726): error #6592: This symbol must be a defined parameter, an enumerator, or an argument of an inquiry function that evaluates to a compile-time constant.   [NF_FILL_REAL]
2023-11-06 17:28:55,179 - isca - INFO - REAL :: FILL_VALUE = NF_FILL_REAL  ! from file /usr/local/include/netcdf.inc
2023-11-06 17:28:55,179 - isca - INFO - -----------------------^
2023-11-06 17:28:55,179 - isca - INFO - /N/scratch/scttest/Quartz/gfdl_work/codebase/_N_scratch_scttest_Quartz_Isca/code/src/shared/diag_manager/diag_data.F90(726): error #6973: This is not a valid initialization expression.   [NF_FILL_REAL]
2023-11-06 17:28:55,179 - isca - INFO - REAL :: FILL_VALUE = NF_FILL_REAL  ! from file /usr/local/include/netcdf.inc
2023-11-06 17:28:55,179 - isca - INFO - -----------------------^
2023-11-06 17:28:55,179 - isca - INFO - /N/scratch/scttest/Quartz/gfdl_work/codebase/_N_scratch_scttest_Quartz_Isca/code/src/shared/diag_manager/diag_data.F90(179): error #6581: Unresolved rename.   [NF_FILL_REAL]
2023-11-06 17:28:55,179 - isca - INFO - USE netcdf, ONLY: NF_FILL_REAL => NF90_FILL_REAL
2023-11-06 17:28:55,179 - isca - INFO - --------------------^
2023-11-06 17:28:55,184 - isca - INFO - compilation aborted for /N/scratch/scttest/Quartz/gfdl_work/codebase/_N_scratch_scttest_Quartz_Isca/code/src/shared/diag_manager/diag_data.F90 (code 1)
2023-11-06 17:28:55,185 - isca - INFO - make: *** [Makefile:45: diag_data.o] Error 1
2023-11-06 17:28:55,185 - isca - INFO - ERROR: mkmf failed for held_suarez.x
Exception in thread background thread for pid 1286264:
Traceback (most recent call last):
  File "/N/scratch/scttest/Quartz/lsca_cenv/lib/python3.9/threading.py", line 980, in _bootstrap_inner
    self.run()
  File "/N/scratch/scttest/Quartz/lsca_cenv/lib/python3.9/threading.py", line 917, in run
    self._target(*self._args, **self._kwargs)
  File "/N/scratch/scttest/Quartz/lsca_cenv/lib/python3.9/site-packages/sh.py", line 1639, in wrap
    fn(*rgs, **kwargs)
  File "/N/scratch/scttest/Quartz/lsca_cenv/lib/python3.9/site-packages/sh.py", line 2641, in background_thread
    handle_exit_code(exit_code)
  File "/N/scratch/scttest/Quartz/lsca_cenv/lib/python3.9/site-packages/sh.py", line 2332, in fn
    return self.command.handle_command_exit_code(exit_code)
  File "/N/scratch/scttest/Quartz/lsca_cenv/lib/python3.9/site-packages/sh.py", line 826, in handle_command_exit_code
    raise exc
sh.ErrorReturnCode_1: 

  RAN: /usr/bin/bash /N/scratch/scttest/Quartz/gfdl_work/codebase/_N_scratch_scttest_Quartz_Isca/build/held_suarez/compile.sh

  STDOUT:
/N/scratch/scttest/Quartz/gfdl_work/codebase/_N_scratch_scttest_Quartz_Isca/build/held_suarez/path_names
................................................................................................................................ Makefile is ready.
mpiifort -Duse_libMPI -Duse_netCDF -Duse_LARGEFILE -DINTERNAL_FILE_NML -DOVERLOAD_C8 -DRRTM_NO_COMPILE -DSOC_NO_COMPILE -I/usr/local/include  -I/usr/local/include -fpp -stack_temps -safe_cray_ptr -ftz -assume byterecl -shared-intel -i4 -r8 -g -O2 -diag-disable 6843 -mcmodel large  -c -I/N/scratch/scttest/Quartz/gfdl_work/codebase/_N_scratch_scttest_Quartz_Isca/code/src/shared/field_manager    /N/scratch/scttest/Quartz/gfdl_work/codebase/_N_scratch_scttest_Quartz_Isca/code/src/shared/field_manage... (4488 more, please see e.stdout)

  STDERR:

Traceback (most recent call last):
  File "/N/scratch/scttest/Quartz/Isca/exp/test_cases/held_suarez/held_suarez_test_case.py", line 21, in <module>
    cb.compile()  # compile the source code to working directory $GFDL_WORK/codebase
  File "/N/scratch/scttest/Quartz/Isca/src/extra/python/isca/helpers.py", line 38, in _useworkdir
    return fn(*args, **kwargs)
  File "/N/scratch/scttest/Quartz/Isca/src/extra/python/isca/helpers.py", line 22, in _destructive
    return fn(*args, **kwargs)
  File "/N/scratch/scttest/Quartz/Isca/src/extra/python/isca/codebase.py", line 280, in compile
    for line in sh.bash(P(self.builddir, 'compile.sh'), _iter=True, _err_to_out=True):
  File "/N/scratch/scttest/Quartz/lsca_cenv/lib/python3.9/site-packages/sh.py", line 882, in __next__
    self.wait()
  File "/N/scratch/scttest/Quartz/lsca_cenv/lib/python3.9/site-packages/sh.py", line 799, in wait
    self.handle_command_exit_code(exit_code)
  File "/N/scratch/scttest/Quartz/lsca_cenv/lib/python3.9/site-packages/sh.py", line 826, in handle_command_exit_code
    raise exc
sh.ErrorReturnCode_1: 

  RAN: /usr/bin/bash /N/scratch/scttest/Quartz/gfdl_work/codebase/_N_scratch_scttest_Quartz_Isca/build/held_suarez/compile.sh

  STDOUT:
/N/scratch/scttest/Quartz/gfdl_work/codebase/_N_scratch_scttest_Quartz_Isca/build/held_suarez/path_names
................................................................................................................................ Makefile is ready.
mpiifort -Duse_libMPI -Duse_netCDF -Duse_LARGEFILE -DINTERNAL_FILE_NML -DOVERLOAD_C8 -DRRTM_NO_COMPILE -DSOC_NO_COMPILE -I/usr/local/include  -I/usr/local/include -fpp -stack_temps -safe_cray_ptr -ftz -assume byterecl -shared-intel -i4 -r8 -g -O2 -diag-disable 6843 -mcmodel large  -c -I/N/scratch/scttest/Quartz/gfdl_work/codebase/_N_scratch_scttest_Quartz_Isca/code/src/shared/field_manager    /N/scratch/scttest/Quartz/gfdl_work/codebase/_N_scratch_scttest_Quartz_Isca/code/src/shared/field_manage... (4488 more, please see e.stdout)

  STDERR:

srun: error: c8: task 0: Exited with exit code 1
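
The visible part of the compile line passes -I/usr/local/include twice and no path inside the conda environment, and error #7013 says the netcdf.mod the compiler found was not generated by this compiler. A quick way to see which netCDF Fortran files each explicit -I directory provides is sketched below in Python. The function names are hypothetical, and the check only covers -I paths on the command line: compilers also search built-in default directories, so an empty result does not prove that no other netcdf.mod is visible.

```python
import os
import shlex
from typing import Dict, List, Optional

# Files a Fortran build picks up from netCDF-Fortran: the module (USE netcdf)
# and the legacy include file (include 'netcdf.inc').
NETCDF_FORTRAN_FILES = ("netcdf.mod", "netcdf.inc")


def include_dirs(compile_line: str) -> List[str]:
    """Collect -I directories ('-Idir' or '-I dir'), in order, without duplicates."""
    tokens = shlex.split(compile_line)
    dirs = []
    for i, tok in enumerate(tokens):
        if tok == "-I" and i + 1 < len(tokens):
            dirs.append(tokens[i + 1])
        elif tok.startswith("-I") and len(tok) > 2:
            dirs.append(tok[2:])
    return list(dict.fromkeys(dirs))


def netcdf_files_on_include_path(compile_line: str) -> Dict[str, Optional[List[str]]]:
    """Map each -I directory to the netCDF Fortran files it holds (None if the directory is missing)."""
    report: Dict[str, Optional[List[str]]] = {}
    for d in include_dirs(compile_line):
        if not os.path.isdir(d):
            report[d] = None
        else:
            report[d] = [f for f in NETCDF_FORTRAN_FILES if os.path.isfile(os.path.join(d, f))]
    return report


if __name__ == "__main__":
    # Abbreviated form of the mpiifort line from the log above.
    line = "mpiifort -I/usr/local/include -I/usr/local/include -fpp -c diag_data.F90"
    for directory, found in netcdf_files_on_include_path(line).items():
        print(directory, "missing" if found is None else (found or "no netCDF Fortran files"))
```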

Isca version: cloned with git on 2023/11/3.


System information: RHEL8 on Indiana University's Quartz HPC cluster, which has 64 AMD EPYC 7742 2.25 GHz CPUs and 512 GB of memory per node.

uname_result(system='Linux', node='h2.quartz.uits.iu.edu', release='4.18.0-477.15.1.el8_8.x86_64', version='#1 SMP Fri Jun 2 08:27:19 EDT 2023', machine='x86_64')

Minimal reproducible example

conda activate /N/scratch/scttest/Quartz/lsca_cenv
export GFDL_BASE=/N/scratch/scttest/Quartz/Isca
export GFDL_WORK=/N/scratch/scttest/Quartz/gfdl_work
export GFDL_DATA=/N/scratch/scttest/Quartz/gfdl_data
export GFDL_ENV=/N/scratch/scttest/Quartz/Isca/emps-gv
export F90=mpiifort
export CC=mpiicc

Currently Loaded Modules: 1) quota/1.8 2) xalt/2.10.34 3) StdEnv 4) intel/22.3 5) intel-mpi/2021.7.0 6) miniconda/4.12.0

We use Slurm; I ran the following:

salloc --ntasks-per-node=1 -A staff --mem=32G --time=1:00:00 srun --ntasks=1 python held_suarez_test_case.py
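
Since the run depends on the GFDL_* variables exported above, a short pre-flight check can catch a shell where one was forgotten before compilation starts. A minimal Python sketch; the variable list is just the GFDL_* names exported above (not an authoritative list of what Isca needs), and `missing_isca_env` is a hypothetical helper.

```python
import os
from typing import List, Mapping, Optional

# The GFDL_* variables exported in the reproduction steps above.
REQUIRED_VARS = ("GFDL_BASE", "GFDL_WORK", "GFDL_DATA", "GFDL_ENV")


def missing_isca_env(environ: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_isca_env()
    if missing:
        raise SystemExit("Not set: " + ", ".join(missing))
    print("All GFDL_* variables are set.")
```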

sit23 commented 11 months ago

Hi @CicadaDennis, thanks for raising an issue. I think your problem is linked to our mkmf template files, which tell the compiler where the netCDF installations are. By default, the emps-gv environment file uses the .ia64 mkmf template, which points to /usr/local/include: https://github.com/ExeClim/Isca/blob/master/src/extra/python/isca/templates/mkmf.template.ia64#L4

Instead, I recommend using the ubuntu_conda environment file as a template: https://github.com/ExeClim/Isca/blob/master/src/extra/env/ubuntu_conda

That environment file in turn uses the ubuntu_conda mkmf template: https://github.com/ExeClim/Isca/blob/master/src/extra/python/isca/templates/mkmf.template.ubuntu_conda

That template uses nc-config to find the location of your netCDF installation, which should hopefully fix your issue. I will try to update the installation instructions to reflect this.
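
To see what nc-config reports inside the activated conda environment before switching templates, a small Python sketch like the following may help. It assumes the conda environment's nc-config is first on PATH; `--includedir` is a standard nc-config option, but whether netcdf.mod sits in that same directory depends on how netCDF-Fortran was installed (in a conda environment both usually share one include directory, which is an assumption here). The helper name is hypothetical.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Optional


def netcdf_includedir(nc_config: str = "nc-config") -> Optional[Path]:
    """Return the include directory nc-config reports, or None if it is not on PATH."""
    exe = shutil.which(nc_config)
    if exe is None:
        return None
    result = subprocess.run(
        [exe, "--includedir"], capture_output=True, text=True, check=True
    )
    return Path(result.stdout.strip())


if __name__ == "__main__":
    inc = netcdf_includedir()
    if inc is None:
        print("nc-config not found; is the conda environment activated?")
    else:
        print(inc, "netcdf.mod present:", (inc / "netcdf.mod").is_file())
```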

CicadaDennis commented 11 months ago

That changed which error occurs. Now I get this error:

2023-11-22 11:42:30,341 - isca - WARNING - Environment variable GFDL_SOC not set, but this is only required if using SocratesCodebase. Setting to None
2023-11-22 11:42:31,385 - isca - INFO - RRTM compilation disabled.
2023-11-22 11:42:31,385 - isca - INFO - SOCRATES compilation disabled.
2023-11-22 11:42:31,391 - isca - INFO - Writing path_names to '/N/scratch/scttest/Quartz/gfdl_work/codebase/_N_scratch_scttest_Quartz_Isca/build/held_suarez/path_names'
2023-11-22 11:42:31,416 - isca - INFO - Running compiler
2023-11-22 11:42:31,454 - isca - INFO - /N/scratch/scttest/Quartz/gfdl_work/codebase/_N_scratch_scttest_Quartz_Isca/build/held_suarez/path_names
2023-11-22 11:42:32,805 - isca - INFO - ................................................................................................................................ Makefile is ready.
2023-11-22 11:42:33,086 - isca - INFO - mpifort -Duse_libMPI -Duse_netCDF -Duse_LARGEFILE -DINTERNAL_FILE_NML -DOVERLOAD_C8 -DRRTM_NO_COMPILE -DSOC_NO_COMPILE nc-config --cflags nc-config --cflags nc-config --flibs -cpp -fcray-pointer -O2 -ffree-line-length-none -fno-range-check -fdefault-real-8 -fdefault-double-8 -fallow-invalid-boz -fallow-argument-mismatch -c /N/scratch/scttest/Quartz/gfdl_work/codebase/_N_scratch_scttest_Quartz_Isca/code/src/atmos_spectral/model/global_integral.F90
2023-11-22 11:42:33,277 - isca - INFO - f951: Fatal Error: Reading module 'fms_mod.mod' at line 1 column 2: Unexpected EOF
2023-11-22 11:42:33,278 - isca - INFO - compilation terminated.
2023-11-22 11:42:33,293 - isca - INFO - make: *** [Makefile:87: global_integral.o] Error 1
2023-11-22 11:42:33,293 - isca - INFO - ERROR: mkmf failed for held_suarez.x

sit23 commented 11 months ago

OK - a couple of questions:

  1. Are you going to use the Intel compilers or gfortran? By default, the requirements file in our ci folder installs gfortran and a netCDF built with gfortran from conda-forge, but I see you have the Intel modules loaded. I'd start with gfortran for consistency with the conda netCDF, and then switch to the Intel compilers once that works. To use Intel, though, you'll need to build netCDF yourself with the Intel compiler, which in my experience is doable but takes some time because you first have to build HDF5. So if I were you, I'd start with gfortran and netCDF from conda and go from there.

  2. I don't see a reason there should be an unexpected EOF, other than an error on one of the previous compilation attempts. I would clear the build path first before trying the compilation again. So I'd remove everything that's in the /N/scratch/scttest/Quartz/gfdl_work/codebase/_N_scratch_scttest_Quartz_Isca/build/held_suarez folder before trying again.
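
The clean-up in point 2 can also be scripted. A minimal Python sketch; `clean_build_dir` is a hypothetical helper that deletes everything inside the given folder but keeps the folder itself, so double-check the path before running it.

```python
import shutil
from pathlib import Path


def clean_build_dir(build_dir) -> int:
    """Remove every file and subfolder inside build_dir; return how many entries were removed."""
    path = Path(build_dir)
    if not path.is_dir():
        return 0
    removed = 0
    # Take a snapshot of the listing before deleting anything.
    for entry in list(path.iterdir()):
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()  # plain files and symlinks
        removed += 1
    return removed


if __name__ == "__main__":
    # Build folder from this thread; adjust for your own GFDL_WORK.
    print(clean_build_dir(
        "/N/scratch/scttest/Quartz/gfdl_work/codebase/_N_scratch_scttest_Quartz_Isca/build/held_suarez"
    ))
```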

Hopefully that helps.

CicadaDennis commented 11 months ago

Thanks. I switched to the GNU Fortran compiler (gfortran) in the conda environment instead of the Fortran compiler from the Intel module. After removing that directory and rerunning, the code compiled successfully and started running. I am getting the following messages, though, and I do not know whether they indicate a problem or are just informational:

2023-11-22 14:11:47,023 - isca - DEBUG - [c1.quartz.uits.iu.edu:1919024] 15 more processes have sent help message help-mpi-btl-openib-cpc-base.txt / no cpcs for port
2023-11-22 14:11:47,023 - isca - DEBUG - [c1.quartz.uits.iu.edu:1919024] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
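
The second log line names the setting that shows every message instead of an aggregated count. Open MPI reads MCA parameters from environment variables named OMPI_MCA_<parameter>, so one option is to set that variable for the processes the run launches. A minimal Python sketch with a hypothetical helper `env_with_mca`; that the MPI processes started by Isca inherit this environment is an assumption here, not something verified.

```python
import os
from typing import Dict, Mapping, Optional


def env_with_mca(params: Mapping[str, object],
                 base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with Open MPI MCA parameters set as OMPI_MCA_<name>."""
    env = dict(os.environ if base is None else base)
    for name, value in params.items():
        env[f"OMPI_MCA_{name}"] = str(value)
    return env


if __name__ == "__main__":
    env = env_with_mca({"orte_base_help_aggregate": 0})
    print(env["OMPI_MCA_orte_base_help_aggregate"])  # prints: 0
```

The returned dictionary can be passed as `env=` to `subprocess.run` for a command launched from Python; exporting OMPI_MCA_orte_base_help_aggregate=0 in the shell beforehand has the same effect for child processes that inherit the environment.
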
sit23 commented 11 months ago

That's great that it works now. Glad to have been able to help. In terms of those error messages, these are not ones I've seen before, so I can't comment specifically. If the code runs without issue then I wouldn't worry about them too much!