Closed by xylar 1 month ago
This is just a draft so far. I'm having no luck with either gnu or intel on Chicoma-CPU so far. I haven't tried anything else yet.
I've contacted LANL IC about the trouble I'm having with gnu:
/lustre/scratch5/xylar/E3SM/scratch/chicoma-cpu/SMS_D.TL319_oQU240wLI_ais8to30.MPAS_LISIO_JRA1p5.chicoma-cpu_gnu.20241011_152619_r3zumf/bld/e3sm.exe: /opt/cray/pe/gcc-libs/libstdc++.so.6: version `GLIBCXX_3.4.29' not found (required by /lustre/scratch5/xylar/E3SM/scratch/chicoma-cpu/SMS_D.TL319_oQU240wLI_ais8to30.MPAS_LISIO_JRA1p5.chicoma-cpu_gnu.20241011_152619_r3zumf/bld/e3sm.exe)
While it seems clear that there's an RPATH being set to /opt/cray/pe/gcc-libs, I haven't been able to track down where that's coming from. Setting the LD_LIBRARY_PATH didn't help.
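One possible explanation for LD_LIBRARY_PATH having no effect: on Linux, a DT_RPATH entry in the executable is searched before LD_LIBRARY_PATH, whereas a DT_RUNPATH entry is searched after it. The sketch below is one way to check which GLIBCXX version tags a given libstdc++ provides. The helper names glibcxx_versions and has_glibcxx are hypothetical (they are not part of E3SM or CIME), and the sketch assumes GNU grep and GNU sort are available.

```shell
# List the GLIBCXX_* version tags embedded in a libstdc++ shared library.
# grep -a treats the binary as text; -o prints only the matching tags.
glibcxx_versions() {
    grep -a -o 'GLIBCXX_[0-9][0-9.]*' "$1" | sort -u -V
}

# Succeed if the library provides exactly the given tag, e.g. GLIBCXX_3.4.29.
has_glibcxx() {
    glibcxx_versions "$1" | grep -q -x -F "$2"
}

# Example usage (library path taken from the error message above;
# the executable path is a placeholder):
#   has_glibcxx /opt/cray/pe/gcc-libs/libstdc++.so.6 GLIBCXX_3.4.29 || echo "tag not provided"
#   readelf -d /path/to/bld/e3sm.exe | grep -E 'RPATH|RUNPATH'
```

If readelf reports RPATH rather than RUNPATH for the executable, that would be consistent with the dynamic loader picking up /opt/cray/pe/gcc-libs regardless of LD_LIBRARY_PATH.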
On the intel side, it's not finding NetCDF-C or -Fortran, even though we're passing a NETCDF_PATH environment variable that seems correct.
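A quick sanity check on what NETCDF_PATH actually contains can help narrow this kind of failure down. The sketch below is only that: check_netcdf_prefix is a hypothetical helper, and the file layout it looks for (netcdf.h and netcdf.mod under include, libnetcdf and libnetcdff under lib or lib64) is an assumption about a typical combined NetCDF-C and NetCDF-Fortran install, not the logic the E3SM build uses.

```shell
# Report which commonly expected NetCDF files are missing under a prefix.
# The layout checked here is an assumption, not the E3SM build logic.
check_netcdf_prefix() {
    _prefix="$1"
    _missing=0
    # C header and Fortran module file.
    for _f in include/netcdf.h include/netcdf.mod; do
        if [ ! -e "$_prefix/$_f" ]; then
            echo "missing: $_prefix/$_f"
            _missing=1
        fi
    done
    # Libraries may live in lib or lib64 depending on the install.
    for _lib in libnetcdf libnetcdff; do
        if ! ls "$_prefix"/lib*/"$_lib".* >/dev/null 2>&1; then
            echo "missing: $_lib under $_prefix/lib*"
            _missing=1
        fi
    done
    return "$_missing"
}

# Example usage:
#   check_netcdf_prefix "$NETCDF_PATH" && echo "layout looks complete"
```

Note that Fortran .mod files are compiler-specific, so a netcdf.mod that exists but was built with a different compiler would pass this check and still fail at build time.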
The gnu issue seems similar to https://github.com/E3SM-Project/E3SM/issues/6677
With the commits I just pushed, I was able to successfully build and run:
So I think at this point we can say we support gnu on chicoma. I'll poke around at intel as well.
@jonbob, I'm trying to run a test:
./create_test SMS_D.TL319_oQU240wLI_ais8to30.MPAS_LISIO_JRA1p5.chicoma-cpu_gnu --walltime 00:30:00 --wait -p w23_freddy
This looks to be what you ran successfully. But for me it just seems to be hanging. It hasn't got to ocean time stepping yet and there's very little output in the e3sm log file.
Could you have a quick look and let me know if you see anything obvious?
/users/xylar/scratch5/E3SM/scratch/chicoma-cpu/SMS_D.TL319_oQU240wLI_ais8to30.MPAS_LISIO_JRA1p5.chicoma-cpu_gnu.20241017_103208_7tp6zr
@xylar -- let me take a peek
@xylar - it seems to be struggling with the atm data? That doesn't make much sense
In the meantime, I'm trying an optimized run to see how that goes.
SMS_D.TL319_oQU240wLI_ais8to30.MPAS_LISIO_JRA1p5.chicoma-cpu_gnu passed for me in the end. It just took 25 minutes and didn't get to time stepping for a long time. It seems like it might be a file system issue with /usr/projects/e3sm.
SMS.TL319_oQU240wLI_ais8to30.MPAS_LISIO_JRA1p5.chicoma-cpu_gnu passed for me as well.
I tested SMS_D.TL319_oQU240wLI_ais8to30.MPAS_LISIO_JRA1p5.chicoma-cpu_nvidia and it built fine and appeared to be running, but it hit the 30-minute walltime I gave it before finishing (same file system issues as above). I'm waiting in the queue with a longer test.
I realize it's not a high priority for us, but SMS_D.TL319_oQU240wLI_ais8to30.MPAS_LISIO_JRA1p5.chicoma-cpu_nvidia passed for me with a longer job runtime.
@jonbob, at the risk of delaying this further, I think we probably want to follow what Noel is doing on Perlmutter (https://github.com/E3SM-Project/E3SM/pull/6702/files). That should at least save us from having to make yet another PR in the near future.
I have gnu and nvidia tests in the queue with the latest updates.
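For reference, the per-compiler test commands can be generated rather than typed out. This is a minimal sketch that only prints the commands; print_test_commands is a hypothetical helper, the flags are the ones used in the create_test command earlier in this thread, and the walltime in the usage example is an assumed value (the thread does not say how long the longer nvidia run was given).

```shell
# Print (do not run) one create_test command per compiler.
# Arguments: walltime, project, then one or more compiler names.
print_test_commands() {
    _walltime="$1"
    _project="$2"
    shift 2
    _base="SMS_D.TL319_oQU240wLI_ais8to30.MPAS_LISIO_JRA1p5.chicoma-cpu"
    for _compiler in "$@"; do
        echo "./create_test ${_base}_${_compiler} --walltime ${_walltime} --wait -p ${_project}"
    done
}

# Example usage (the 01:00:00 walltime is an assumed value):
#   print_test_commands 01:00:00 w23_freddy gnu nvidia
```

Printing instead of executing keeps the sketch safe to try outside an E3SM checkout; the output can be reviewed and then pasted into a shell in the cime/scripts directory.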
The following both passed:
SMS_D.TL319_oQU240wLI_ais8to30.MPAS_LISIO_JRA1p5.chicoma-cpu_gnu
SMS_D.TL319_oQU240wLI_ais8to30.MPAS_LISIO_JRA1p5.chicoma-cpu_nvidia
Closed in favor of https://github.com/E3SM-Project/E3SM/pull/6705
Following the recent DST, this merge updates the module files and environment variables on Chicoma-CPU. We note that these updates work well for the gnu and nvidia compilers but not yet for intel, which we are continuing to work on. A separate update will be needed to address Chicoma-GPU as well.