E3SM-Project / scorpio

A high-level Parallel I/O Library for structured grid applications

Hanging issue with ADIOS conversion tool #569

Closed dqwu closed 4 months ago

dqwu commented 4 months ago

This issue has been reproduced on ANL CELS GCE nodes running Ubuntu 20.04.

Steps to reproduce:

module load cmake/3.20.5-zyz2eld
module load gcc/11.1.0-qsjmpcg
export PATH=/nfs/gce/projects/climate/software/linux-ubuntu20.04-x86_64/mpich/4.0/gcc-11.1.0/bin:$PATH

git clone https://github.com/E3SM-Project/scorpio.git
cd scorpio

mkdir build
cd build

ADIOS2_DIR=/nfs/gce/projects/climate/software/linux-ubuntu20.04-x86_64/adios2/2.9.1/mpich-4.0/gcc-11.1.0 \
CC=mpicc CXX=mpicxx FC=mpifort cmake -Wno-dev \
-DWITH_ADIOS2=ON \
-DWITH_NETCDF=OFF \
-DPnetCDF_PATH=/nfs/gce/projects/climate/software/linux-ubuntu20.04-x86_64/pnetcdf/1.12.2/mpich-4.0/gcc-11.1.0 \
-DPIO_USE_MALLOC=ON \
..

make

cd tools/adios2pio-nm

cp -r /nfs/gce/projects/climate/scratch/F2010-SCREAMv1_ne4pg2_ne4pg2.elm.r.0001-01-01-07200.nc.bp ./

timeout 60 mpiexec -n 4 ./adios2pio-nm.exe --bp-file=F2010-SCREAMv1_ne4pg2_ne4pg2.elm.r.0001-01-01-07200.nc.bp
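To tell a hang apart from an ordinary failure, the exit status of the timeout command can be checked: GNU timeout exits with 124 when it had to kill the command after the time limit. The following is only a minimal sketch; classify_timeout_status is a hypothetical helper name and is not part of SCORPIO.

```shell
# Hypothetical helper (not part of SCORPIO): map the exit status of
# `timeout <secs> <cmd>` to a short label.
classify_timeout_status() {
    case "$1" in
        0)   echo "completed" ;;                 # command finished normally
        124) echo "hang (killed by timeout)" ;;  # GNU timeout: time limit reached
        *)   echo "failed (exit status $1)" ;;   # any other non-zero status
    esac
}

# Usage, from tools/adios2pio-nm, with the same command as in the steps above:
#   timeout 60 mpiexec -n 4 ./adios2pio-nm.exe --bp-file=F2010-SCREAMv1_ne4pg2_ne4pg2.elm.r.0001-01-01-07200.nc.bp
#   classify_timeout_status $?
```

With the behavior described in this issue, the label would be expected to be the hang case, since the tool does not return within 60 seconds.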

Besides hanging, the following warning message is also printed:

WARNING: Skipping BP file (could not find/stat file) :"F2010-SCREAMv1_ne4pg2_ne4pg2.elm.r.0001-01-01-07200.nc.bp"
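Since the warning says the file could not be found or stat'ed, it may help to confirm what the copied path looks like on disk before running the tool. This is a rough sketch that assumes the BP output is a directory (which the cp -r in the steps above suggests); describe_bp_path is a hypothetical helper, not something provided by SCORPIO or ADIOS2.

```shell
# Hypothetical helper (not part of SCORPIO or ADIOS2): report what kind of
# filesystem object a given BP path is.
describe_bp_path() {
    if [ -d "$1" ]; then
        echo "directory"
    elif [ -f "$1" ]; then
        echo "regular file"
    elif [ -e "$1" ]; then
        echo "other"
    else
        echo "missing"
    fi
}

# Usage, from tools/adios2pio-nm after the cp -r step:
#   describe_bp_path F2010-SCREAMv1_ne4pg2_ne4pg2.elm.r.0001-01-01-07200.nc.bp
```

If this reports anything other than "missing", the path exists in the working directory and the warning points at the tool's own lookup rather than at the copy step.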

Issue 1: The ADIOS conversion tool should not skip the BP file in this case (conversion is expected).

Issue 2: Whether or not the conversion is performed, the tool should not hang.

This issue seems to be a regression caused by PR #564.