NOAA-EMC / hpc-stack

Create a software stack for HPC's
GNU Lesser General Public License v2.1

NCO feedback on the hpc-stack installations on wcoss 2 #151

Closed Hang-Lei-NOAA closed 1 year ago

Hang-Lei-NOAA commented 3 years ago

Kyle and I have installed hpc-stack v1.1.0 on WCOSS2 for testing. NCO requests the following changes:

Change the installation path from libs/hpc-stack/ips-18.0.1.163/ to libs/hpc-stack/ips/18.0.1.163/. The version should be separated from the compiler name for a complete vertical structure.

Default to relative paths instead of true (absolute) paths so that the hpc-stack installation is relocatable.

wgrib2:
- We'd like to add a $WGRIB2 variable so that the executable can be accessed by invoking the variable (just like the grib_util executables); this would look like setenv("WGRIB2", pathJoin(base,"bin","wgrib2"))
- The wgrib2 executable uses libjasper and libjpeg, so these modules should be prerequisites. In the case of WCOSS2, this will mean that "jasper" and "libjpeg" should be treated in the same way netcdf is in the wgrib2 modulefile.

grib_util: In the modulefile, for each executable's setenv/pathJoin statement, the name of the executable should be in quotes. For example, setenv("COPYGB", pathJoin(base, "bin", copygb)) should be setenv("COPYGB", pathJoin(base, "bin", "copygb"))

I also have a question about how the default base path gets set in hpc-stack modules. The fallback in all the modulefiles is /opt/modules. Is there some way that we could set this to something else, like a variable that we can set when we build hpc-stack that will let us choose a different fallback directory? In our case, I think we would want local opt = os.getenv("HPC_OPT") or os.getenv("OPT") or "/opt/modules" to look something like local opt = os.getenv("HPC_OPT") or "/lfs/h1/ops/prod/libs/"
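For reference, the wgrib2 requests above could be sketched as a modulefile fragment along the following lines. This is only an illustration, not the actual hpc-stack modulefile: the module names "jasper" and "libjpeg", the use of prereq (hpc-stack may handle netcdf through a different mechanism), and the way base is derived are all assumptions.

```lua
-- Hypothetical wgrib2 modulefile fragment (Lmod, Lua syntax).
-- It is meant to be read by Lmod, not run as a standalone Lua script.

-- Assumption: jasper and libjpeg exist as modules under these names.
-- prereq() refuses to load this module unless both are already loaded.
prereq("jasper", "libjpeg")

-- Fallback chain as quoted in the thread.
local opt = os.getenv("HPC_OPT") or os.getenv("OPT") or "/opt/modules"

-- Assumption: simplified layout; the real modulefiles also encode the
-- compiler (and MPI) directories in the install prefix.
local base = pathJoin(opt, "wgrib2", myModuleVersion())

prepend_path("PATH", pathJoin(base, "bin"))

-- Requested variable pointing at the executable; note the quoted "wgrib2".
setenv("WGRIB2", pathJoin(base, "bin", "wgrib2"))
```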

aerorahul commented 3 years ago

I don't understand the comment regarding "total vertical structure". Are they suggesting that the libraries installed under <prefix>/ips-18.0.1.163/ should be under <prefix>/ips/18.0.1.163? If so, that can be done, but it will need changes in all the modulefiles to adapt to this layout. The software packages can be stored in many ways. We followed the recommendation from the LMod site here. The relevant bit is reproduced here:

The software packages themselves can be stored in many ways. For software packages, but not the modulefiles, we store them in another software hierarchy as follows:

Core packages: /opt/apps/pkgName/version
Compiler dependent packages: /opt/apps/compilerName-version/pkgName/version
MPI-Compiler dependent packages: /opt/apps/compilerName-version/mpiName-version/pkgName/version
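To make the two layouts under discussion concrete, here is a small standalone Lua sketch that builds both forms from the same compiler directory name. The function names are hypothetical, and splitting at the first hyphen is an assumption that would break for compiler names that themselves contain a hyphen.

```lua
-- Split "compilerName-version" at the first hyphen (assumption: the
-- compiler name itself contains no hyphen).
local function split_compiler(dir)
  local name, version = dir:match("^(.-)%-(.+)$")
  return name, version
end

-- Current layout: <prefix>/compilerName-version
local function flat_layout(prefix, name, version)
  return prefix .. "/" .. name .. "-" .. version
end

-- Requested layout: <prefix>/compilerName/version
local function nested_layout(prefix, name, version)
  return prefix .. "/" .. name .. "/" .. version
end

local name, version = split_compiler("ips-18.0.1.163")
local flat = flat_layout("libs/hpc-stack", name, version)
local nested = nested_layout("libs/hpc-stack", name, version)

print(flat)    -- libs/hpc-stack/ips-18.0.1.163
print(nested)  -- libs/hpc-stack/ips/18.0.1.163
```

Either form carries the same information; the difference only matters to whatever code (here, the modulefiles) reconstructs the path.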

The grib_util comment is actually a bug. Without quotes, copygb is evaluated as a variable rather than as the string "copygb". I believe it is fixed in develop here.
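The difference between the quoted and unquoted forms can be seen in plain Lua. In the sketch below, path_join is a simplified stand-in written for this illustration, not Lmod's pathJoin, and the base prefix is hypothetical; how Lmod itself reacts to a nil argument may differ.

```lua
-- Simplified stand-in for a path-joining helper (illustration only).
-- It skips nil arguments so the effect of an undefined variable is visible.
local function path_join(...)
  local parts = {}
  for i = 1, select("#", ...) do
    local p = select(i, ...)
    if p ~= nil then
      parts[#parts + 1] = tostring(p)
    end
  end
  return table.concat(parts, "/")
end

local base = "/opt/modules/grib_util"  -- hypothetical prefix

-- Unquoted: copygb is looked up as a global variable, which is nil here.
local unquoted = path_join(base, "bin", copygb)

-- Quoted: "copygb" is a string literal.
local quoted = path_join(base, "bin", "copygb")

print(unquoted)  -- /opt/modules/grib_util/bin
print(quoted)    -- /opt/modules/grib_util/bin/copygb
```

With the unquoted form the executable name silently drops out of the path, so the variable ends up pointing at the bin directory instead of the executable.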

During building of the software stack, the installation prefix is provided by the user via the -p prefix argument to setup_modules.sh and build_stack.sh.
The default is chosen to be /opt/modules. This is only relevant when using modules. And when modules are used, the first module one loads is module load hpc/1.0.0. This module sets the environment variable HPC_OPT and can be anything the user defines via the -p option. As seen, with the or, the local opt is set by looking for HPC_OPT and OPT environment variables before it falls back to /opt/modules. What is wrong with /opt/modules?
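How the or chain resolves can be checked with a short standalone Lua sketch. The resolve_opt and fake_env helpers are hypothetical names introduced here so the three cases can be shown without touching the real environment; only the fallback expression itself comes from the modulefiles.

```lua
-- Same fallback expression as in the modulefiles, with the environment
-- lookup passed in as a function.
local function resolve_opt(getenv)
  return getenv("HPC_OPT") or getenv("OPT") or "/opt/modules"
end

-- Build a fake getenv from a table (illustration only).
local function fake_env(vars)
  return function(name)
    return vars[name]
  end
end

-- HPC_OPT set (for example by the hpc module): it wins.
local with_hpc_opt = resolve_opt(fake_env({ HPC_OPT = "/lfs/h1/ops/prod/libs/" }))
-- Only OPT set: second choice.
local with_opt = resolve_opt(fake_env({ OPT = "/usr/local/opt" }))
-- Neither set: hard-coded fallback.
local with_nothing = resolve_opt(fake_env({}))

print(with_hpc_opt)  -- /lfs/h1/ops/prod/libs/
print(with_opt)      -- /usr/local/opt
print(with_nothing)  -- /opt/modules

-- In a real modulefile the lookup is os.getenv:
print(resolve_opt(os.getenv))
```

One detail worth keeping in mind: in Lua an empty string is truthy, so a variable that is set but empty would be used as-is rather than triggering the fallback.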

arunchawla-NOAA commented 3 years ago

Have the issues highlighted by NCO been taken care of here?

arunchawla-NOAA commented 3 years ago

I really would like this taken care of, as we do not want a non-hpc-stack solution on WCOSS2. Please make this a high priority.

Hang-Lei-NOAA commented 3 years ago

We will discuss this at tomorrow's nceplibs meeting.


edwardhartnett commented 3 years ago

We did not discuss this yet, and @Hang-Lei-NOAA and @kgerheiser are out this week, so we will get an update on Monday about this issue.

kgerheiser commented 3 years ago

There's not really a lot to do.

The grib_util thing is a bug that has been fixed.

Do we really have to add a WGRIB2 environment variable? It's an executable in your path: WGRIB2=$(which wgrib2). Not every executable needs a variable.

Same thing for the installation structure. We could do it any number of ways, but what's the benefit of changing it? Rahul pointed out that this is the recommended way by LMod.

And I don't really understand the issue with the environment variable.

arunchawla-NOAA commented 3 years ago

NCO is not on GitHub yet. So I will move this discussion to an email thread

Hang-Lei-NOAA commented 3 years ago

My understanding is that the request about the environment variable is still about the copy-and-work issue.

edwardhartnett commented 3 years ago

Also, even if NCO is not on GitHub, each comment here generates an email which they can reply to, and the reply will get posted here. So even if they don't routinely use GitHub, we can still communicate via this GitHub issue. They will see the emails as long as they have GitHub accounts and are mentioned.

aerorahul commented 1 year ago

Is this still relevant? Closing. Reopen if relevant.