I've edited the original ticket to fix a typo in the paths and make the output references a bit easier to read. Here I'll include snippets of the actual error messages.
The first error log shows
mpirun -np $NP $UMD_LETKFUTILS/ocean_sponge.py $yyyy $mm $dd > ocean_sponge.out
rm temp_salt_sponge.nc
rm: cannot remove `temp_salt_sponge.nc': No such file or directory
$UMD_LETKFUTILS/ocean_iau.x -DO_SPONGE ${ODAS_dt_restore_sst}
/gpfsm/dnb42/projects/p17/ehackert/geos5/exp/eh018/ocean_das/UMD_Etc/UMD_utils//ocean_iau.x: symbol lookup error: /gpfsm/dnb42/projects/p17/ehackert/geos5/exp/eh018/ocean_das/UMD_Etc/UMD_utils//ocean_iau.x: undefined symbol: mpi_sgi_inplace
cp temp_salt_sponge.nc $SCRDIR/INPUT/
cp: cannot stat `temp_salt_sponge.nc': No such file or directory
ln -s temp_salt_sponge.nc $SCRDIR/INPUT/temp_sponge_coeff.nc
ln -s temp_salt_sponge.nc $SCRDIR/INPUT/temp_sponge.nc
Further on in the second output, this type of error shows up:
@ NPES = $NX * $NY
$RUN_CMD $NPES ./GEOSgcm.x | tee geos.out
./GEOSgcm.x: error while loading shared libraries: libmpi++abi1002.so: cannot open shared object file: No such file or directory
./GEOSgcm.x: error while loading shared libraries: libmpi++abi1002.so: cannot open shared object file: No such file or directory
First things first, yes, I believe some/all of this is related to g5_modules. The first experiment you pointed me to was built with Intel MPI according to this:
/gpfsm/dnb42/projects/p17/ehackert/geos5/sandbox_try4/GEOSodas/src/g5_modules
You definitely can't use an Intel MPI g5_modules with MPT executables and vice versa.
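If it helps, one quick way to double-check which MPI stack a binary was actually linked against (just a diagnostic sketch, not part of the original scripts) is to inspect its shared-library dependencies:
# Hedged diagnostic: list the MPI libraries each executable links against.
# An MPT build pulls in the SGI/HPE MPT libmpi; an Intel MPI build references the Intel MPI runtime.
ldd ./GEOSgcm.x | grep -i mpi
ldd $UMD_LETKFUTILS/ocean_iau.x | grep -i mpi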
Second, you'll want to look over your scripts for references to mpirun. With MPT, mpirun does Very Weird Things™. The easiest solution is to use esma_mpirun from the installation binary directory, as it tries to auto-detect your MPI stack and use the right command. This is how things are done now, but your jobs seem to be from Heracles(?) days; at that point we hadn't quite gotten as general. You might have (in gcm_run.j):
setenv RUN_CMD "mpirun -np"
Now we do:
setenv RUN_CMD "$GEOSBIN/esma_mpirun -np"
In your testing, if your code was compiled with MPT, you'd at least need to use:
setenv RUN_CMD "mpiexec_mpt -np"
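To find every place that still needs updating, a quick search along these lines should do it (a sketch only; this assumes your run scripts follow the usual .j naming in the experiment directory):
# Locate hard-coded mpirun calls and old-style RUN_CMD definitions in the run scripts.
grep -n "mpirun" *.j
grep -n "RUN_CMD" *.j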
I do see one other possible excitement:
mpirun -np $NP $UMD_LETKFUTILS/ocean_sponge.py $yyyy $mm $dd > ocean_sponge.out
Are you using mpi4py? Because support for that is tricky.
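If so, mpi4py has to be built against the same MPI stack you run with. As a hedged sanity check (assuming your Python environment is loaded and the underlying MPI library supports MPI-3):
# Report which MPI library mpi4py was compiled/linked against.
python -c "from mpi4py import MPI; print(MPI.Get_library_version())"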
Hi Matt, thanks for taking a look at this. I made your suggested corrections, fixing all instances of the old g5_modules, and I also put in the suggested mpirun correction in all the scripts. I am submitting this now. Thanks.
eric
Hi Matt,
I tried replacing all instances of g5_modules with the Git version. In addition, I replaced all of the MPI commands with your suggestions. Finally, I replaced read_merra2_bcs.so with the Git version since it looked like the code was choking there. Now the code is complaining about a plotting call in ocean_sponge.py (see eh018.e33728536 for details), and the model is also bombing immediately. Any help you can suggest would be appreciated. Thanks.
Eric
@ehackert
Looking at /gpfsm/dnb42/projects/p17/ehackert/geos5/exp/eh018/eh018.o33728536, my guess is that it's because whatever CAP.rc.tmpl (or the like) you are using still has GCS as the root:
MAPLROOT_COMPNAME: GCS
ROOT_NAME: GCS
This was changed when we moved to GitHub to be GCM:
MAPLROOT_COMPNAME: GCM
ROOT_NAME: GCM
Try making that change and things might go farther.
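For reference, a minimal sketch of that edit (this assumes GCS only appears in those two settings of your CAP.rc.tmpl; grep first to be sure):
# Check where GCS is referenced, then switch the root component name to GCM.
grep -n "GCS" CAP.rc.tmpl
sed -i 's/: GCS/: GCM/' CAP.rc.tmpl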
Tom asked me to test out the GitHub version of the ODAS code. So far I have set up a run where I pulled the GitHub versions into the directory structure that the old CVS version followed. This experiment is located in
For the first pass through, I reused the model (GEOSgcm.x) from the CVS version. The error file is located in
More recently I copied the GEOSgcm.x from the GitHub version and reran. These results are located in
Tom suggested that I change out the g5_modules in the shell, so I will run this test to see whether some (or all) of the errors are gone. I'll let you know the outcome.
Thanks.
eric