Closed edwardhartnett closed 3 years ago
This is entirely on us in HPC-STACK. Make decisions on how we will build the various dependencies and what we will set in their modules, stick to them and document and publish them. I've mentioned this before in many other discussion threads.
Not sure where HDF5_LDFLAGS came from.
The test version, it appears, was using shared libraries. On NOAA HPC systems we build static libraries so just linking to NetCDF isn't sufficient.
I would do -L$HDF5_ROOT/lib -lhdf5_hl -lhdf5 and that will work.
HDF5_LDFLAGS is an environment variable set for the hdf5 installed at the WCOSS operational library site. It is up to EIB to decide whether or not to set this variable. It is important to our downstream users that the hpc-stack has a stable and consistent installation configuration, so that we can reduce the number of modifications needed with each hpc-stack upgrade.
-L$HDF5_ROOT/lib -lhdf5_hl -lhdf5 will work now and forever.
It's unfortunate that the test version was using shared libs and it broke your build because we specifically use static libraries on NOAA systems.
@WenMeng-NOAA is the problem you are having with the legacy build system for UPP? In other words, not the CMake build?
Yes, that's the problem for legacy build system. I may test with cmake build later.
My understanding is that the legacy build system will be retired. When will that occur?
Please advise the fix. Thanks!
@edwardhartnett Retiring GNC build capacity is on my to-do list after we complete switching the UPP dependency libraries to the hpc-stack. A lot of UPP developers are relying on GNC build capacity right now. I would like to give them a smooth transition.
@WenMeng-NOAA You also need to link to zlib: -L$ZLIB_ROOT -lz after HDF5.
I set option as: -L$(NETCDF)/lib -lnetcdff -lnetcdf -L${HDF5_ROOT}/lib -lhdf5_hl -lhdf5 -L${ZLIB_ROOT} -lz
Now the executable was successfully built.
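The link order matters here because the libraries are static: the linker resolves symbols left to right, so each archive has to come before the archives it depends on (NetCDF Fortran, then NetCDF C, then HDF5 high-level, then HDF5, then zlib). A minimal sketch of assembling the same flags in a build script follows. The function name is hypothetical, and NETCDF, HDF5_ROOT and ZLIB_ROOT are assumed to be set by the loaded modules, as in the option above.

```shell
# Hypothetical helper: print the static link flags in dependency order.
# NETCDF, HDF5_ROOT and ZLIB_ROOT are assumed to come from the loaded modules;
# the ${VAR:?message} form aborts with an error if one of them is unset.
netcdf_static_ldflags() {
  printf '%s %s %s\n' \
    "-L${NETCDF:?NETCDF is not set}/lib -lnetcdff -lnetcdf" \
    "-L${HDF5_ROOT:?HDF5_ROOT is not set}/lib -lhdf5_hl -lhdf5" \
    "-L${ZLIB_ROOT:?ZLIB_ROOT is not set} -lz"  # zlib path used as given in this thread
}
```

A makefile or build script could then use the output of netcdf_static_ldflags instead of repeating the flags in several places.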
Another issue has come up. The environment variable CRTM_FIX in the crtm/2.3.0 module is required at runtime. @Hang-Lei-NOAA Can you add it?
What does CRTM_FIX point to?
CRTM_FIX should point to the fix files directory of crtm. CRTM_SRC, which points to the source code directory, is also needed. These two environment variables are important for debugging UPP issues with simulated satellite radiance. If you look at the crtm module installed at the WCOSS operational site or at the non-hpc-stack libraries on hera, these two variables are set.
@WenMeng-NOAA are the issues you are having with using hpc-stack resolved? I am assuming these are problems only when trying to build UPP as a stand alone code and the in line post library itself is building ok since it uses cmake?
@WenMeng-NOAA The crtm_fix is not included in the hpc-stack installation. It is not a standard hpc-stack solution. Therefore, I cannot set up the variable. You can include the crtm_fix files as a part of your code, as a solution.
@arunchawla-NOAA Yes, the issues I reported are from the UPP standalone tests. I would assume the in-line post is fine.
@Hang-Lei-NOAA A crtm library installed under the hpc-stack without the fix files and source code path doesn't make sense to me. As more NCEP applications adopt the hpc-stack, they will send the same requests as the UPP. @arunchawla-NOAA In the future, will the hpc-stack be installed at the WCOSS NCO operational library site?
@WenMeng-NOAA The fix files (crtm coefficients) are also used in the GSI. NCEPlibs does not install these, because they are not part of the emc_crtm repository. In order to add that to the module file, the path which is currently machine specific, needs to be made generic.
Why does the UPP need crtm source code path? Is it required for compiling? Then why not use the compiled crtm library?
@aerorahul UPP compiling and runtime don't need the crtm source code. The crtm source path would be helpful for debugging issues in the UPP process that generates simulated satellite radiance. We have had several cases that required tracing back into the crtm code.
CRTM has been an issue for years. The problem with it is that the binary files are large, 4 GB, and source repositories cannot handle that. The github limit is 2 GB and it also gave vlab indigestion. Because of CRTM I never turned my old tarball NCEPLIBS into a repository object. I kept a bunch of CRTM versions and the total distribution added up to 20+ GB. Instead the tarball is on HPSS. But I do have source and fix. They are in $PKG/src and $PKG/fix where $PKG is $NCEPLIBS/crtm/crtm$VERSION. For example
/gpfs/dell2/emc/modeling/noscrub/cases/l0701/lc/lib/crtm/v2.3.0/fix /gpfs/dell2/emc/modeling/noscrub/cases/l0701/lc/lib/crtm/v2.3.0/src
Due to NCO conventions in place when I first snagged crtm and tried to make it portable, the library is /gpfs/dell2/emc/modeling/noscrub/cases/l0701/lc/lib/crtm/v2.3.0/intel/libcrtm_v2.3.0.a and includes are /gpfs/dell2/emc/modeling/noscrub/cases/l0701/lc/lib/crtm/v2.3.0/intel/include/crtm_v2.3.0
That's for a tarball snapshot I built for luna/surge in July prior to HPC-STACK.
Basically we need a way to maintain the huge database of crtm binary files since they won't fit in github and a crtm installation is indeed not complete without them. My way of keeping them in a tarball is not really satisfactory either
For source, we need to preserve the source directory in $PKG/build or wherever cmake puts it, after build. This is generally a good idea anyway if you ever expect to need to follow a stack trace back to source. Why do we need to explicitly specify a source directory when running a CRTM code if we are debugging through stack traces?
Or does CRTM have its own diagnostics that detect runtime issues and point to source lines?
The notation two comments up of crtm$VERSION is incorrect. It's just $VERSION without a crtm prefix
Technically, it is not a problem, since we can use ftp to store large files. What matters is deciding whether hpc-stack will handle it or whether each model will collect and distribute its own fix files.
See github issue for comments https://github.com/NOAA-EMC/hpc-stack/issues/119#
@WenMeng-NOAA If you are using the source code for stack trace and debugging, why can you not use this reference: https://github.com/NOAA-EMC/EMC_crtm/tree/v2.3.0 From what I understand, you are simply trying to identify which line had an error (if debugging a failure). This is the CRTM source code that is being built and installed. No source code is preserved, unless they are being used to compile. It is not a standard practice.
I would argue that since crtm is a centralized stack installation used by multiple modeling systems, the binary data should be part of the stack and the installation should be complete. Otherwise each modeling system will have to maintain the binary data and supply environment pointers to wherever it is, leading to both duplication and confusion. We should use ftp or some other large file API to make it so.
@GeorgeVandenberghe-NOAA We are manually making snapshots and maintaining https://github.com/NOAA-EMC/EMC_crtm We should instead use https://github.com/NOAA-EMC/crtm. This is the authoritative CRTM repository with a script to get the binary files from a UCAR hosted FTP site. We should work with the CRTM developers
The original source code in the repository is no good for debugging if the compilation process, in particular cpp, changes it before the compiler sees it. Line numbers will not be the same in the stack trace and the original source code. For a valid stack trace back to the real failing line of code, we need the final files that the compiler actually sees. For an F90 repository file, this means we need to preserve the .f90 made by cpp prior to compilation.
I agree we need to work with the authoritative crtm repository which now exists. It didn't in 2016 when I tried to make CRTM generally available on NOAA platforms and there was no authoritative site I could get to. At that time I worked from what NCO had already installed and reverse engineered it to work elsewhere. The idea that authoritative sites are inconsistent with what NCO wants, or connectivity to them is broken, comes from previous bitter experience with NOAA systems and yeah, we are a lot better now than we were in 2016
For hpc-stack, part of the crtm install should be getting the binary files from that site and working out any administrative barriers to doing so. There is also a reorganization of the directory structure, from how the crtm repository expects things to how NCO wants them, which was confusing on hera and orion when NCEPLIBS first took this on in 2018.
@GeorgeVandenberghe-NOAA
Good point. But there is only one file that cpp acts on, and that is CRTM_Module.F90, where it enables a version number in the compiled library. So the argument that we need to save the build directory for crtm for stack tracing is debunked.
There are also ESMF debuggers in our group. What do they do? Perhaps the better solution would be to build CRTM with debug flags if they want to debug CRTM and place it alongside the libcrtm.a, e.g. libcrtm.debug.a or something like that.
One file is true for crtm. It is not true for the general issue of installing a general software package. ESMF leaves its source code lying around post build so stack traces can point to it, and I know UFS developers are exploiting that because they keep asking us for debug versions of the beta release of the week.
A case can be made though, our dependency libraries should be reliable enough we don't need to follow stack traces back into the source code in the first place. My gripes about using beta dependency libraries are off topic.. they don't belong in hpc-stack in the first place!
@GeorgeVandenberghe-NOAA Build tree and source code do not belong in the central stack install location. Developers who want that level of access/scrutiny should build their own version of the software and link against it. But that is my opinion, FWIW.
If the hpc-stack developers decide to remove library source code from hpc-stack, then from the downstream user perspective we would like guidance on accessing the source code for troubleshooting real-time issues.
We (Hang and I) maintain the source code (and logs) for each installation we do. There's no environment variable, but it's there if someone wants to look.
@WenMeng-NOAA Here are your instructions. The compiler module depends on the machine.
git clone https://github.com/noaa-emc/emc_crtm -b v2.3.0
mkdir build
cd build
module load compiler
cmake -DCMAKE_INSTALL_PREFIX=../install ../emc_crtm
make -j 6
make install
This is an interesting discussion. I want to highlight that we want generalized modules for libraries with relative paths. This is so that we can do lift and replace without breaking anything. I have a question as to why we are using emc_crtm as opposed to crtm if that is the authoritative repository. Is it an access issue?
"Lift and replace without breaking anything"
That ship has sailed. Modern software packages require absolute hard paths for configuration when they are being used and so must be reconfigured and reinstalled if moved. This is invisible to users but critical for package maintainers.
It has already bitten us hard on WCOSS2 where two filesystem moves required rebuilding of a large chunk of our stacks.
We are moving into discussions that are going a little off topic. I want to get back to the discussion at hand. The paths that are in the modules for the hpc-stack are relative for easy installation, so there are no paths for things that are not part of the installation. That is why CRTM_FIX and CRTM_SRC are not part of these module files.
@WenMeng-NOAA can you define these in your script level?
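One way to define CRTM_FIX at the script level is sketched below. It assumes nothing about hpc-stack itself: the function name is hypothetical, and the fallback path passed in is a site-specific location that the script author would have to know. An existing CRTM_FIX (for example from a module that does set it) takes precedence over the fallback.

```shell
# Hypothetical helper for a run script: use CRTM_FIX if it is already set,
# otherwise fall back to the site-specific directory given as $1.
# Prints the chosen directory, or fails if it does not exist.
resolve_crtm_fix() {
  crtm_fix_candidate="${CRTM_FIX:-${1:-}}"
  if [ -z "$crtm_fix_candidate" ] || [ ! -d "$crtm_fix_candidate" ]; then
    echo "CRTM fix directory not found: '${crtm_fix_candidate}'" >&2
    return 1
  fi
  printf '%s\n' "$crtm_fix_candidate"
}
```

In a job script this might be used as `CRTM_FIX=$(resolve_crtm_fix /path/to/site/crtm/fix) && export CRTM_FIX`, where the path is a placeholder. Failing early on a missing directory avoids a harder-to-diagnose error later at runtime.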
A little off topic but I would like to know if we should move to the crtm authoritative repository
CRTM_FIX could be installed and defined as part of hpc-stack. There's a script in the authoritative repository that does just that, but our fork doesn't include it.
In order for Wen to define CRTM_FIX she needs to know where it is.
Faced with this problem I would log in and start hunting for where NCO put it which defeats the purpose of our own stack.
Can we just go after the definitive crtm repository rather than our own when building CRTM for hpc-stack and include the fix files?
I will add an issue for going to definitive CRTM.
@kgerheiser Setting CRTM_FIX in crtm module would be helpful. If CRTM_SRC is not set, I would like to know the path of crtm source for trouble-shooting runtime issues. I usually use "module show" to find out library information.
The crtm repository script to get the fix files simply hangs on hera.
This kind of stuff is why we need to deal with this at the stack build level rather than having users deal with it and it's also why I maintain a stable tarball rather than trusting NOAA to support access to a repository in a stable reliable way. We have intermittent issues accessing hdf5 this way too.
The repository script to get the fix files works on Jet. At least this week :-( !!!!!
But assuming access is available, we should run this after compiling the library:
filename="fix_REL-2.4.0.tgz" # REL-2.4.0 files
if test -f "$filename"; then
  if [ -d "fix/" ]; then # fix directory exists
    echo "fix/ already exists, doing nothing."
  else
    tar -zxvf $filename
    mv fix_crtm-internal_develop fix
    echo "fix/ directory created from existing $filename file."
  fi
else
  echo "downloading $filename, please wait about 5 minutes (3.2 GB tar file)"
  wget -q ftp://ftp.ucar.edu/pub/cpaess/bjohns/$filename # jedi set of CRTM binary files
  tar -zxvf $filename
  mv fix_crtm-internal_develop fix
  echo "fix/ directory created from downloaded $filename."
fi
echo "Completed."
And modify the last few lines to put "fix" where we want it.
Also 2.4.0 is NOT our current level so do this for 2.3.1 which is
Point is we should do it once and set the environment variable pointing to it in our module.
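A sketch of the "do it once" step, separated from the download so it can run against a tarball that is already on disk (for example one pulled from HPSS when the FTP site is unreachable). The function name and its arguments are hypothetical. It assumes the tarball has a single top-level directory, as fix_crtm-internal_develop is in the script above.

```shell
# Hypothetical helper: unpack a fix tarball once into a chosen directory.
# $1 = path to an already-downloaded tarball, $2 = destination directory.
install_crtm_fix() {
  fix_tarball=$1
  fix_dest=$2
  if [ -d "$fix_dest" ]; then
    echo "$fix_dest already exists, doing nothing."
    return 0
  fi
  # Unpack into a scratch directory first so a failed extract
  # does not leave a partial fix directory behind.
  fix_tmp=$(mktemp -d) || return 1
  if ! tar -xzf "$fix_tarball" -C "$fix_tmp"; then
    rm -rf "$fix_tmp"
    return 1
  fi
  # Assumes one top-level directory inside the tarball; rename it to the destination.
  mkdir -p "$(dirname "$fix_dest")" || return 1
  mv "$fix_tmp"/* "$fix_dest" || return 1
  rmdir "$fix_tmp"
  echo "$fix_dest created from $fix_tarball."
}
```

The stack build could call this once per CRTM version and then have the module file set CRTM_FIX to the same destination. How the module file derives that path is left open here.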
Is this still an issue?
no. It's fine now.
@WenMeng-NOAA reports: It works in UPP building. Closing.