E3SM-Project / scorpio

A high-level Parallel I/O Library for structured grid applications

build fails with PIO_ENABLE_TIMING=off #494

Closed: philipwjones closed this issue 1 year ago

philipwjones commented 1 year ago

When building with PIO_ENABLE_TIMING=OFF, the build fails when the spio_finfo tool tries to include the GPTL header (gptl.h). Since that include is properly guarded by the timing #ifdefs in the tool, it seems CMake may not be propagating the timing defines into that subdirectory correctly.

The build succeeds with the internal GPTL interface, but when I later build a GPTL-enabled code against it, the compiler/linker picks up the wrong includes/libraries from the SCORPIO build.

jayeshkrishna commented 1 year ago

@philipwjones : How are you building SCORPIO (can you copy-paste the cmake configure line)? Is it a standalone build of SCORPIO or SCORPIO being built as part of an E3SM build?

philipwjones commented 1 year ago

It's a standalone build (to be used for a standalone MPAS build). The configure line is pretty generic; I tried it on a few machines to make sure it wasn't a local issue. The Crusher version was:

CC=cc CXX=CC FC=ftn cmake -DCMAKE_Fortran_FLAGS=" -ef " -DPIO_ENABLE_TIMING=OFF -DCMAKE_INSTALL_PREFIX=$PROJWORK/cli115/pwjones/crusherlibs-cray/scorpio -DNetCDF_Fortran_PATH=${NETCDF_DIR} ~/scorpio

This one used the Cray compiler wrappers, and the -ef flag was necessary to make the Cray compiler use a lower-case .mod suffix for Fortran module files. On Summit, the version was:

CC=mpicc FC=mpif90 cmake -DCMAKE_INSTALL_PREFIX=~/summitlibs/pio -DPIO_ENABLE_TIMING=OFF -DNetCDF_Fortran_PATH=${OLCF_NETCDF_FORTRAN_ROOT} ~/scorpio

philipwjones commented 1 year ago

Sorry about the formatting; it looks like a character turned on strikethrough.

jayeshkrishna commented 1 year ago

I am still not able to recreate the issue. The build script below builds SCORPIO (latest master) successfully on Summit:

#!/bin/tcsh
source /sw/summit/lmod/8.4/init/csh

module purge
module load DefApps
module load python/3.7-anaconda3
module load subversion/1.14.0
module load git/2.31.1
module load cmake/3.20.2
module load essl/6.3.0
module load netlib-lapack/3.8.0

module load gcc/9.1.0

module load spectrum-mpi/10.4.0.3-20210112
module load hdf5/1.10.7
module load netcdf-c/4.8.0
module load netcdf-fortran/4.4.5
module load parallel-netcdf/1.12.2

setenv NETCDF_C_PATH ${OLCF_NETCDF_C_ROOT}
setenv NETCDF_FORTRAN_PATH ${OLCF_NETCDF_FORTRAN_ROOT}
setenv ESSL_PATH ${OLCF_ESSL_ROOT}
setenv HDF5_PATH ${OLCF_HDF5_ROOT}
setenv PNETCDF_PATH ${OLCF_PARALLEL_NETCDF_ROOT}

setenv CC mpicc
setenv CXX mpiCC
setenv FC mpif90

rm -rf /ccs/home/jayesh/acme/scorpio_install/*

cmake \
-DPIO_BUILD_STATIC_LIBS=TRUE \
-DWITH_PNETCDF:BOOL=TRUE \
-DCMAKE_INSTALL_PREFIX:PATH=/ccs/home/jayesh/acme/scorpio_install \
-DBUILD_SHARED_LIBS:BOOL=OFF \
-DPIO_ENABLE_FORTRAN:BOOL=ON \
-DCMAKE_BUILD_TYPE=Debug \
-DPIO_ENABLE_TIMING:BOOL=OFF \
-DPnetCDF_PATH=$PNETCDF_PATH \
-DNetCDF_C_PATH=$NETCDF_C_PATH \
-DNetCDF_Fortran_PATH=$NETCDF_FORTRAN_PATH \
-DHDF5_PATH=$HDF5_PATH \
-LH \
/ccs/home/jayesh/acme/scorpio |& tee configure.log

make VERBOSE=1 |& tee make.log
make VERBOSE=1 install |& tee install.log

Can you try out the script above after modifying the paths to the source/install (/ccs/home/jayesh/acme/scorpio*)?

jayeshkrishna commented 1 year ago

A shorter cmake configure line should also work:

cmake \
-DCMAKE_INSTALL_PREFIX:PATH=/ccs/home/jayesh/acme/scorpio_install \
-DPIO_ENABLE_TIMING:BOOL=OFF \
-DPnetCDF_PATH=$PNETCDF_PATH \
-DNetCDF_C_PATH=$NETCDF_C_PATH \
-DNetCDF_Fortran_PATH=$NETCDF_FORTRAN_PATH \
-DHDF5_PATH=$HDF5_PATH \
-LH \
/ccs/home/jayesh/acme/scorpio |& tee configure.log

jayeshkrishna commented 1 year ago

The following script worked for me on Crusher:

#!/bin/tcsh
source /usr/share/lmod/lmod/init/csh

module reset
module switch PrgEnv-cray PrgEnv-gnu/8.3.3
module load cray-python/3.9.12.1
module load subversion/1.14.1
module load git/2.36.1
module load cmake/3.21.3
module load cray-hdf5-parallel/1.12.1.1
module load cray-netcdf-hdf5parallel/4.8.1.1
module load cray-parallel-netcdf/1.12.1.7

setenv NETCDF_C_PATH ${NETCDF_DIR}
setenv NETCDF_FORTRAN_PATH ${NETCDF_DIR}
setenv HDF5_PATH ${HDF5_DIR}
setenv PNETCDF_PATH ${PNETCDF_DIR}

setenv CC cc
setenv CXX CC
setenv FC ftn

rm -rf /ccs/home/jayesh/acme/scorpio_install/*

cmake \
-DCMAKE_INSTALL_PREFIX:PATH=/ccs/home/jayesh/acme/scorpio_install \
-DPIO_ENABLE_TIMING:BOOL=OFF \
-DPnetCDF_PATH=$PNETCDF_PATH \
-DNetCDF_C_PATH=$NETCDF_C_PATH \
-DNetCDF_Fortran_PATH=$NETCDF_FORTRAN_PATH \
-DHDF5_PATH=$HDF5_PATH \
-LH \
/ccs/home/jayesh/acme/scorpio |& tee configure.log
make VERBOSE=1 |& tee make.log
make VERBOSE=1 install |& tee install.log

dqwu commented 1 year ago

@philipwjones In your description, I only see NetCDF_Fortran_PATH being used to configure SCORPIO. Do you use PnetCDF and/or NetCDF C?

I tried the following commands on Summit:

module load cmake

git clone https://github.com/E3SM-Project/scorpio.git
cd scorpio

mkdir build
cd build

CC=mpicc CXX=mpiCC FC=mpif90 cmake -Wno-dev \
-DPIO_ENABLE_TIMING=OFF \
-DWITH_PNETCDF=OFF \
-DNetCDF_Fortran_PATH=/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/xl-16.1.1-10/netcdf-fortran-4.4.5-wmedvlkliwpllcswp3dmqz5wq2ntahuc \
-DPIO_USE_MALLOC=ON \
-DPIO_ENABLE_TESTS=ON \
-DPIO_ENABLE_EXAMPLES=ON \
..

make

CMake configuration failed with messages like the ones below:

CMake Error at src/clib/CMakeLists.txt:164 (message):
  Could not find PnetCDF and NetCDF libraries.  SCORPIO requires PnetCDF
  and/or NetCDF C libraries

philipwjones commented 1 year ago

I usually have those paths set in my overall environment, so they are not needed in the specific invocation here. They just point to the module-installed environments on each machine.

dqwu commented 1 year ago

> I usually have those paths set in my overall environment so not needed on the specific invocation here. They just point to the module installed environments on each machine.

Could you please share your CMake output on Summit?

Also, you can try a shorter CMake configure line like the one below (with explicit paths) to see whether your issue is still reproducible on Summit:

CC=mpicc CXX=mpiCC FC=mpif90 cmake -Wno-dev \
-DPIO_ENABLE_TIMING=OFF \
-DWITH_PNETCDF=OFF \
-DNetCDF_C_PATH=/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/xl-16.1.1-10/netcdf-c-4.8.0-dkuu4se5zxpdmm6ah4ckfgdvp2anserh \
-DNetCDF_Fortran_PATH=/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/xl-16.1.1-10/netcdf-fortran-4.4.5-wmedvlkliwpllcswp3dmqz5wq2ntahuc \
-DPIO_USE_MALLOC=ON \
-DPIO_ENABLE_TESTS=ON \
-DPIO_ENABLE_EXAMPLES=ON \
..

dqwu commented 1 year ago

@philipwjones FYI, my CMake output (configured without PnetCDF) is:

-- The C compiler identification is XLClang 16.1.1.10
-- The CXX compiler identification is XLClang 16.1.1.10
-- The Fortran compiler identification is XL 16.1.1
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/xl-16.1.1-10/spectrum-mpi-10.4.0.3-20210112-dzedzfvocsuzkm4jkqe7o64x53yhq7nm/bin/mpicc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/xl-16.1.1-10/spectrum-mpi-10.4.0.3-20210112-dzedzfvocsuzkm4jkqe7o64x53yhq7nm/bin/mpiCC - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Detecting Fortran compiler ABI info
-- Detecting Fortran compiler ABI info - done
-- Check for working Fortran compiler: /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/xl-16.1.1-10/spectrum-mpi-10.4.0.3-20210112-dzedzfvocsuzkm4jkqe7o64x53yhq7nm/bin/mpif90 - skipped
-- ===== Configuring SCORPIO... =====
-- Enabling SCORPIO I/O performance statistics collection (default)
-- Using malloc to allocate memory for caching data in SCORPIO (default)
-- Disabling debug logging in SCORPIO (default)
-- Disabling use/check of the MPI serial library (default)
-- Disabling saving I/O decompositions (default)
-- No limit on the number of cached I/O regions (default)
-- Limit on the number of Lustre OSTs, PIO_MAX_LUSTRE_OSTS, is not set (default)
-- Using filesystem striping unit, PIO_STRIPING_UNIT = 16777216 (default for OLCF machines)
-- Using PnetCDF independent data mode to read variables in SCORPIO (default)
-- Reserving some extra space in the header when creating NetCDF files, requested bytes = 10240 (default)
-- Configurable parameters used by ADIOS type are not applicable (default)
CMake Warning at CMakeLists.txt:292 (message):
  C++11 regex support is disabled since some versions of the compiler(XL) on
  Summit/Compy do not support it

-- Disabling code coverage... (use -DPIO_ENABLE_COVERAGE:BOOL=ON to enable coverage, only GNU is supported for now)
-- ===== Configuring SCORPIO C library... =====
-- Found MPI_C: /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/xl-16.1.1-10/spectrum-mpi-10.4.0.3-20210112-dzedzfvocsuzkm4jkqe7o64x53yhq7nm/bin/mpicc (found version "3.1") 
-- Found MPI: TRUE (found version "3.1") found components: C 
-- Found NetCDF_C: /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/xl-16.1.1-10/netcdf-c-4.8.0-dkuu4se5zxpdmm6ah4ckfgdvp2anserh/lib/libnetcdf.so  
-- Checking NetCDF version
-- Checking NetCDF version - 4.8.0./*!<
-- Found NetCDF: /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/xl-16.1.1-10/netcdf-c-4.8.0-dkuu4se5zxpdmm6ah4ckfgdvp2anserh/lib/libnetcdf.so (found suitable version "4.8.0./*!<", minimum required is "4.3.3") 
-- Checking whether NetCDF has parallel support
-- Checking whether NetCDF has parallel support - yes
-- Looking for nc_set_log_level
-- Looking for nc_set_log_level - found
-- Looking for nc__enddef
-- Looking for nc__enddef - found
-- NetCDF C library dependencies: /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/xl-16.1.1-10/netcdf-c-4.8.0-dkuu4se5zxpdmm6ah4ckfgdvp2anserh/lib/libnetcdf.so
-- Disabling support for PnetCDF
-- Disabling support for ADIOS (default)
-- Disabling support for HDF5
-- Looking for sys/types.h
-- Looking for sys/types.h - found
-- Looking for stdint.h
-- Looking for stdint.h - found
-- Looking for stddef.h
-- Looking for stddef.h - found
-- Check size of MPI_Offset
-- Check size of MPI_Offset - done
-- Checking whether Fortran type INTEGER(KIND=MPI_OFFSET_KIND) has the same size as INTEGER(KIND=C_LONG_LONG)
-- Using MPI_Offset for PIO Offset. sizeof(PIO_Offset) = 8 bytes
-- Check size of size_t
-- Check size of size_t - done
-- sizeof(size_t) = 8 bytes
-- Check size of long long
-- Check size of long long - done
-- sizeof(long long) = 8 bytes
-- Check size of MPI_Offset
-- Check size of MPI_Offset - done
-- sizeof(MPI_Offset) = 8 bytes
-- Disabling compiler optimization for pioc_support.c (to prevent internal compiler error with the XL compiler)
-- Disabling Micro timers... (default, use -DPIO_MICRO_TIMING:BOOL=ON to enable micro timers)
-- Enabling the Fortran interface...
-- ===== Configuring SCORPIO Fortran interface... =====
-- Checking whether the Fortran compiler supports c_sizeof
-- Checking whether the Fortran compiler supports c_sizeof - yes
-- Using internal version of genf90 tool
-- Found MPI_Fortran: /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/xl-16.1.1-10/spectrum-mpi-10.4.0.3-20210112-dzedzfvocsuzkm4jkqe7o64x53yhq7nm/bin/mpif90 (found version "3.1") 
-- Found MPI: TRUE (found version "3.1") found components: Fortran 
-- Checking whether MPI Fortran module is supported
-- Checking whether MPI Fortran module is supported - yes
-- MPI Fortran module verified and enabled.
-- Found NetCDF_Fortran: /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/xl-16.1.1-10/netcdf-fortran-4.4.5-wmedvlkliwpllcswp3dmqz5wq2ntahuc/lib/libnetcdff.so  
-- NetCDF Fortran library dependencies: /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/xl-16.1.1-10/netcdf-fortran-4.4.5-wmedvlkliwpllcswp3dmqz5wq2ntahuc/lib/libnetcdff.so
-- Disabling support for PnetCDF
-- Enabling SCORPIO tools... (default, use -DPIO_ENABLE_TOOLS:BOOL=OFF to disable tools)
-- ===== Configuring SCORPIO File info tool... =====
-- Found MPI_CXX: /sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/xl-16.1.1-10/spectrum-mpi-10.4.0.3-20210112-dzedzfvocsuzkm4jkqe7o64x53yhq7nm/bin/mpiCC (found version "3.1") 
-- Found MPI: TRUE (found version "3.1")  
-- Enabling SCORPIO tests... 
-- ===== Configuring SCORPIO Fortran tests... =====
-- MPI Fortran module verified and enabled.
-- Adding Fortran tests with [1...4] MPI processes
-- ===== Configuring SCORPIO legacy Fortran tests... =====
-- MPI Fortran module verified and enabled.
-- ===== Configuring SCORPIO Performance tests/tools ... =====
-- ===== Configuring SCORPIO C/C++ tests... =====
-- Enabling SCORPIO Examples...
-- ===== Configuring SCORPIO Examples... =====
-- MPI Fortran module verified and enabled.
-- Could NOT find MPE_C (missing: MPE_C_LIBRARY MPE_C_INCLUDE_DIR) 
-- MPI Fortran module verified and enabled.
-- Enabling SCORPIO Documentation...
-- ===== Configuring SCORPIO Documentation... =====
-- Could NOT find Doxygen (missing: DOXYGEN_EXECUTABLE) 
-- Configuring done
-- Generating done

philipwjones commented 1 year ago

@dqwu The problem shows up during the actual build, not during CMake configuration. The CMake invocation is typically successful, and the build proceeds until the very end, when it tries to build the spio_finfo tool. I was just speculating that CMake wasn't propagating the environment variables down to the local makefile for spio_finfo.

philipwjones commented 1 year ago

So on Summit (with an Nvidia build), I get:

[ 98%] Building CXX object tools/spio_finfo/CMakeFiles/spio_finfo.exe.dir/spio_finfo_tool.cxx.o
"/ccs/home/pwjones/scorpio/tools/spio_finfo/spio_finfo_tool.cxx", line 4: catastrophic error: cannot open source file "gptl.h"
  #include
           ^

But this include is inside an #ifdef TIMING block, so it should never be processed when PIO_ENABLE_TIMING=OFF and GPTL doesn't exist.
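One way to test the theory that CMake is handing a timing define to this subdirectory is to inspect the compile definitions recorded for the spio_finfo target in the build tree. The sketch below is a minimal, hypothetical helper (check_timing_define is not part of SCORPIO). It assumes the Unix Makefiles generator, which writes a flags.make file with a CXX_DEFINES line into each target's CMakeFiles directory, the target directory seen in the build log above (tools/spio_finfo/CMakeFiles/spio_finfo.exe.dir), and that the guard macro is literally TIMING as described in this thread.

```shell
# Hypothetical helper: report whether CMake passed a TIMING define to spio_finfo.
# Assumes the Unix Makefiles generator, which stores per-target definitions on a
# "CXX_DEFINES = ..." line in flags.make.
# Usage: check_timing_define BUILD_DIR
# Exit status: 0 = TIMING defined, 1 = not defined, 2 = flags.make not found.
check_timing_define() {
  flags="$1/tools/spio_finfo/CMakeFiles/spio_finfo.exe.dir/flags.make"
  if [ ! -f "$flags" ]; then
    echo "not found: $flags (has the build been configured?)" >&2
    return 2
  fi
  # Match -DTIMING or -DTIMING=<value>, but not longer names like -DTIMING_FOO.
  if grep '^CXX_DEFINES' "$flags" | grep -Eq -- '-DTIMING(=|[[:space:]]|$)'; then
    echo "TIMING is defined for spio_finfo"
    return 0
  fi
  echo "TIMING is not defined for spio_finfo"
  return 1
}
```

For example, check_timing_define build (run from the directory containing the build tree) reports whether -DTIMING reached the tool. If it is defined despite PIO_ENABLE_TIMING=OFF, the define is leaking into the tool; if it is not, it is worth checking which source tree is actually being compiled.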

dqwu commented 1 year ago

> So on Summit (with an Nvidia build), I get: [ 98%] Building CXX object tools/spio_finfo/CMakeFiles/spio_finfo.exe.dir/spio_finfo_tool.cxx.o "/ccs/home/pwjones/scorpio/tools/spio_finfo/spio_finfo_tool.cxx", line 4: catastrophic error: cannot open source file "gptl.h"
>
> But this include is inside an ifdef TIMING so it should never be invoked if PIO_ENABLE_TIMING=OFF and gptl doesn't exist

Is it reproducible with the IBM XL compiler or the GNU compiler?

philipwjones commented 1 year ago

It happens the same way on Crusher with the Cray compiler and with Intel on some local LANL machines. This appears to be in the build (Make) system and not specific to a compiler.

dqwu commented 1 year ago

@philipwjones Jayesh and I have not been able to reproduce this build issue on Summit or Crusher so far (we do not get build errors in tools/spio_finfo).

Based on your description, your CMake configure line on Crusher is something like the one below. Is that correct?

CC=cc CXX=CC FC=ftn cmake \
-DCMAKE_Fortran_FLAGS=" -ef " \
-DPIO_ENABLE_TIMING=OFF \
-DCMAKE_INSTALL_PREFIX=$PROJWORK/cli115/pwjones/crusherlibs-cray/scorpio \
-DNetCDF_Fortran_PATH=${NETCDF_DIR}

Could you please provide your CMake configure line on Summit as well?

dqwu commented 1 year ago

@philipwjones What confuses me is that, with your CMake configure line on Crusher (which sets neither -DNetCDF_C_PATH=XXXX nor -DPnetCDF_PATH=XXXX on the cmake command line), CMake configuration is expected to fail with the following message:

CMake Error at src/clib/CMakeLists.txt:164 (message):
  Could not find PnetCDF and NetCDF libraries.  SCORPIO requires PnetCDF
  and/or NetCDF C libraries

However, the CMake invocation is typically successful for you. That is why I would like to see your CMake output on Crusher, to verify whether it still finds PnetCDF and/or NetCDF C.

It would help debugging if we could reproduce this build issue with your CMake configure line on Crusher or Summit.

dqwu commented 1 year ago

@philipwjones Maybe you can try to reproduce the build issue with a CMake configure command line (followed by a make command) that uses explicit -DXXXX_PATH=XXXX paths rather than implicit paths set in your overall environment. Note that Jayesh's example scripts all pass explicit -DXXXX_PATH=XXXX paths on the cmake command line.
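A minimal POSIX sh sketch of one way to follow this suggestion while keeping the environment as the single source of truth: it turns exported path variables into explicit -D<Package>_PATH options. The function name explicit_path_args is hypothetical; the environment variable names (NETCDF_C_PATH, NETCDF_FORTRAN_PATH, PNETCDF_PATH, HDF5_PATH) and the CMake options are the ones used in the build scripts in this thread. As a conservative check, it fails when neither NetCDF C nor PnetCDF is given, since SCORPIO requires at least one of them.

```shell
# Hypothetical helper: turn exported library paths into explicit CMake options,
# one option per line. Variable names follow the build scripts in this thread.
explicit_path_args() {
  found=no  # becomes "yes" once NetCDF C or PnetCDF has been given
  for pair in NetCDF_C:NETCDF_C_PATH NetCDF_Fortran:NETCDF_FORTRAN_PATH \
              PnetCDF:PNETCDF_PATH HDF5:HDF5_PATH; do
    pkg=${pair%%:*}   # CMake option prefix, e.g. NetCDF_C
    var=${pair#*:}    # environment variable, e.g. NETCDF_C_PATH
    val=$(printenv "$var") || val=""
    if [ -n "$val" ]; then
      printf '%s\n' "-D${pkg}_PATH=$val"
      case $pkg in NetCDF_C|PnetCDF) found=yes ;; esac
    else
      echo "note: $var is not set; -D${pkg}_PATH will not be passed" >&2
    fi
  done
  if [ "$found" = no ]; then
    echo "error: set NETCDF_C_PATH and/or PNETCDF_PATH (SCORPIO needs at least one)" >&2
    return 1
  fi
}
```

For example, args=$(explicit_path_args) || exit 1 followed by cmake $args -DPIO_ENABLE_TIMING=OFF <source dir>. This relies on word splitting of $args, so it assumes the paths contain no spaces. Printing the generated options also yields a copy-pasteable configure line for bug reports.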

philipwjones commented 1 year ago

Aargh. At some point (I think while debugging an unrelated GPTL issue?) I had apparently switched to the latest tagged release. A fresh checkout does not exhibit this problem, so apparently it's fixed on the latest master. I also didn't need the Fortran path at all, so it's picking up my environment just fine.

Sorry for wasting your time...everything's good now.
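Since the root cause was the checked-out revision rather than the build system, a quick look at the source tree before configuring can rule this out early. A minimal sketch using standard git commands (git -C, rev-parse --abbrev-ref, describe --tags --always); the function name show_checkout is hypothetical.

```shell
# Hypothetical helper: show which revision of a source checkout will be built.
# Usage: show_checkout [SOURCE_DIR]   (defaults to the current directory)
show_checkout() {
  src=${1:-.}
  # Prints the branch name, or "HEAD" when a tag/commit is checked out directly.
  branch=$(git -C "$src" rev-parse --abbrev-ref HEAD) || return 1
  # Tag name if HEAD is exactly on a tag, <tag>-<n>-g<hash> if it is past one,
  # or an abbreviated commit hash if no tag is reachable.
  desc=$(git -C "$src" describe --tags --always) || return 1
  echo "branch: $branch"
  echo "version: $desc"
}
```

For example, show_checkout ~/scorpio. A branch of HEAD means a tag or commit is checked out directly (a detached HEAD), which is what happens after checking out a tagged release.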

jayeshkrishna commented 1 year ago

Np, glad to know it's working for you now.