EESSI / software-layer

Software layer of the EESSI project
https://eessi.github.io/docs/software_layer
GNU General Public License v2.0

provide module file to set up EESSI environment #68

Open boegel opened 3 years ago

boegel commented 3 years ago

Idea by @akesandgren: a module file that sites can install on their system to provide easy access to EESSI, instead of an init script that has to be sourced.

Reasons:

@akesandgren Anything to add?

akesandgren commented 3 years ago

That's the basic idea, at least. I can't come up with anything else right now.

Just being able to hop between software stacks (EESSI, CC, site-provided software) would be nice, and even somewhat fancy.

akesandgren commented 3 years ago

Here is a sketch of how to do things:

local posix = require("posix")
require("capture")

whatis([[Description: EESSI Software environment]])

help([[
Sets up the environment to use the EESSI software stack.
]])

conflict("EESSI")

local eessi_pilot_version = "2020.12"
local eessi_prefix = "/cvmfs/pilot.eessi-hpc.org/"..eessi_pilot_version

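-- Detect the OS type and CPU family once and keep them in locals,
-- so later lines do not need to read them back from the environment.
local osname = posix.uname("%s")
local eessi_os_type = "macos"
if osname == "Linux" then
    eessi_os_type = "linux"
end
local eessi_cpu_family = posix.uname("%m")
setenv("EESSI_OS_TYPE", eessi_os_type)
setenv("EESSI_CPU_FAMILY", eessi_cpu_family)

local eprefix = eessi_prefix.."/compat/"..eessi_os_type.."/"..eessi_cpu_family
setenv("EPREFIX", eprefix)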

prepend_path("PATH", eprefix.."/usr/bin")

local eessi_eprefix = eprefix
setenv("EESSI_EPREFIX", eessi_eprefix)

local eessi_eprefix_python = eessi_eprefix.."/usr/bin/python3"
setenv("EESSI_EPREFIX_PYTHON", eessi_eprefix_python)
-- Ask the EESSI helper script for the best software subdirectory and strip all whitespace from its output.
local eessi_software_subdir_for_host = capture(eessi_eprefix_python.." "..eessi_prefix.."/init/eessi_software_subdir_for_host.py "..eessi_prefix):gsub("%s+","")
setenv("EESSI_SOFTWARE_SUBDIR", eessi_software_subdir_for_host)

local eessi_sw_path = eessi_prefix.."/software/"..eessi_software_subdir_for_host
setenv("EESSI_SOFTWARE_PATH", eessi_sw_path)

local eessi_mod_subdir = "modules/all"

setenv("EESSI_MODULEPATH", eessi_sw_path.."/"..eessi_mod_subdir)

setenv("LMOD_RC", eessi_sw_path.."/.lmod/lmodrc.lua")

-- Here comes the corresponding "source $EESSI_EPREFIX/usr/lmod/*/init/bash"

prepend_path("MODULEPATH", eessi_sw_path.."/"..eessi_mod_subdir)

The only "difficult" part that remains is probably the "source $EESSI_EPREFIX/usr/lmod/*/init/bash" step from the original init/bash.

It seems to work on my laptop, at least as far as getting ml av to show the EESSI modules from the correct tree.

klust commented 3 years ago

Here is my current take on a module. Currently it requires the system manager to set a few things by hand: basically the OS (though I don't think it is anywhere near ready for macOS) and the processor as it would be returned by archspec cpu.

local eessi_version = "2020.12"
local eessi_root = "/cvmfs/pilot.eessi-hpc.org/"

-- Discovering which software directories to use:
-- - Option 1: Detect the cpu architecture as returned by archspec cpu 
--   and use a mapping in the module file to map that onto the triple 
--   for the host type. That mapping table could always be generated
--   offline during the building of a new EESSI stack.
-- - Option 1bis: Simply ask system managers who want to provide EESSI
--   through a module to set a certain environment variable which this
--   module then picks up. This can avoid calling external programs
--   that may or may not slow down the loading and unloading of the
--   module.
-- - Option 2:
--     + First discover the CPU family, which is enough to initialize
--       the correct version of the compatibility layer. (uname -m)
--     + Then use the compatibility layer to run the Python script
--       that determines the version of the software layer.
-- When EESSI starts supporting macOS we also need to add code to
-- detect the OS, for example by testing some typical environment
-- variables or directories, e.g., assume that a system that has
-- /System/Library/AppleUSBDevice is a Mac and otherwise it's a Linux
-- machine.

-- Now if we could get the info in the following line elsewhere...
local archspec_cpu = 'skylake'
-- Alternatively, it might be enough if we could only determine the CPU family
-- as that is enough to initialize the compatibility layer. We may then try to
-- use that one to discover further properties.
-- Having lscpu and awk or grep and sed might be enough.

-- Table mapping possible values for archspec_cpu onto the host triple
-- (family, vendor, arch) where vendor is omitted in some cases.
-- The following can't be a local variable. I'm not sure how dangerous that
-- is for Lmod.
arch_mapping = {
    x86_64 =         { family = 'x86_64',                    arch = 'generic' },
    nocona =         { family = 'x86_64',                    arch = 'generic' },
    core2 =          { family = 'x86_64',                    arch = 'generic' },
    nehalem =        { family = 'x86_64',                    arch = 'generic' },
    westmere =       { family = 'x86_64',                    arch = 'generic' },
    sandybridge =    { family = 'x86_64',                    arch = 'generic' },
    ivybridge =      { family = 'x86_64',                    arch = 'generic' },
    haswell =        { family = 'x86_64',  vendor = 'intel', arch = 'haswell' },
    broadwell =      { family = 'x86_64',  vendor = 'intel', arch = 'haswell' },
    skylake =        { family = 'x86_64',  vendor = 'intel', arch = 'haswell' },
    skylake_avx512 = { family = 'x86_64',  vendor = 'intel', arch = 'skylake_avx512' },
    cascadelake =    { family = 'x86_64',  vendor = 'intel', arch = 'skylake_avx512' },
    icelake =        { family = 'x86_64',  vendor = 'intel', arch = 'skylake_avx512' },
    bulldozer =      { family = 'x86_64',                    arch = 'generic' },
    piledriver =     { family = 'x86_64',                    arch = 'generic' },
    steamroller =    { family = 'x86_64',                    arch = 'generic' },
    excavator =      { family = 'x86_64',                    arch = 'generic' },
    zen =            { family = 'x86_64',                    arch = 'generic' },
    zen2 =           { family = 'x86_64',  vendor = 'amd',   arch = 'zen2' },
    zen3 =           { family = 'x86_64',  vendor = 'amd',   arch = 'zen3' },
    power9le =       { family = 'ppc64le',                   arch = 'power9le' },
    aarch64 =        { family = 'aarch64',                   arch = 'generic' },
    thunderx2 =      { family = 'aarch64',                   arch = 'thunderx2' },
    a64fx =          { family = 'aarch64',                   arch = 'a64fx' },
    graviton =       { family = 'aarch64',                   arch = 'generic' },
    graviton2 =      { family = 'aarch64',                   arch = 'graviton2' }
}

local eessi_os_type =    "linux"
local eessi_cpu_family = arch_mapping[archspec_cpu].family
local eessi_cpu_vendor = arch_mapping[archspec_cpu].vendor
local eessi_arch =       arch_mapping[archspec_cpu].arch

local helpstring = string.gsub( [[
This module enables the EESSI EESSI_VERSION pilot. It is not needed to 
execute the initialisation scripts after loading this module.
]], "EESSI_VERSION", eessi_version )
help( helpstring )

local whatisstring = string.gsub(
"Enables the EESSI EESSI_VERSION pilot.",
"EESSI_VERSION", eessi_version )
whatis( whatisstring )

family( "BaseSoftwareStack" )

-- The problem with overwriting PS1 is that the variable is not
-- correctly restored when unloading the module and no prompt appears.
-- It may be due to a bug in Lmod because pushenv should restore the value
-- when unloading the module.
-- pushenv( "PS1", "[EESSI pilot " .. eessi_version .. "] $ " )

setenv( "EESSI_PILOT_VERSION",   eessi_version )
setenv( "EESSI_PREFIX",          pathJoin( eessi_root, eessi_version ) )
setenv( "EESSI_OS_TYPE",         eessi_os_type )
setenv( "EESSI_CPU_FAMILY",      eessi_cpu_family )

-- Set EPREFIX since that is basically a standard in Gentoo Prefix
local eprefix = pathJoin( eessi_root, eessi_version, 'compat', eessi_os_type, eessi_cpu_family )
if not isDir( eprefix ) then
    LmodError( 'EESSI compatibility layer at ' .. eprefix .. ' not found. Maybe check the CVMFS mounts?' )
end
setenv( "EPREFIX",               eprefix )
prepend_path( "PATH",            pathJoin( eprefix, "/usr/bin" ) )
setenv( "EESSI_EPREFIX",         eprefix )
setenv( "EESSI_EPREFIX_PYTHON",  pathJoin( eprefix, "/usr/bin/python" ) )

-- TODO
-- The detection script lives in the init directory of the EESSI prefix, not in the compatibility layer.
local detect_command = pathJoin( eprefix, "/usr/bin/python" ) .. ' ' ..
                       pathJoin( eessi_root, eessi_version, 'init/eessi_software_subdir_for_host.py' ) .. ' ' ..
                       pathJoin( eessi_root, eessi_version )
LmodMessage( 'TODO: Can we now run ' .. detect_command .. ' and capture the output instead of using the full mapping?')

-- Set variables for the software layer.
local eessi_software_subdir
if ( eessi_cpu_vendor == nil ) then
    eessi_software_subdir = pathJoin( eessi_cpu_family, eessi_arch )
else
    eessi_software_subdir = pathJoin( eessi_cpu_family, eessi_cpu_vendor, eessi_arch )
end
local eessi_software_path = eessi_root .. eessi_version .. "/software/" .. eessi_software_subdir
setenv( "EESSI_SOFTWARE_SUBDIR", eessi_software_subdir )
setenv( "EESSI_SOFTWARE_PATH",   eessi_software_path )

setenv( "EESSI_MODULE_PATH",     pathJoin( eessi_software_path, "/modules/all" ) )
prepend_path( "MODULEPATH",      pathJoin( eessi_software_path, "/modules/all" ) )
-- Set LMOD_RC. This may be a problem if it is already set in the system for other reasons!
-- We use pushenv to ensure that it is set back to the original value when unloading the module.
pushenv( "LMOD_RC",               pathJoin( eessi_software_path, "/.lmod/lmodrc.lua" ) )

I've put some remarks about possible ways to proceed with this in the code. This is basically something I wrote rather quickly to test EESSI on Fedora on top of WSL2/Windows 10, and Kenneth liked the idea which challenged me to make it a bit more general already.
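One fragile spot in the module above is the direct indexing of arch_mapping[archspec_cpu], which raises a Lua error for any CPU name that is missing from the table. Below is a minimal sketch of a guarded lookup, assuming that falling back to the generic build of the CPU family is acceptable. The helper names lookup_arch and software_subdir, the family_fallback parameter and the shortened example_mapping table are hypothetical and not part of EESSI or Lmod.

```lua
-- Hypothetical helper: look up a CPU name in the mapping table, falling back
-- to the generic target of a known CPU family when the name is not listed.
local function lookup_arch(mapping, cpu_name, family_fallback)
    local entry = mapping[cpu_name]
    if entry ~= nil then
        return entry
    end
    if family_fallback ~= nil then
        return { family = family_fallback, arch = 'generic' }
    end
    return nil
end

-- Hypothetical helper: build the software subdirectory from an entry,
-- omitting the vendor component when it is absent.
local function software_subdir(entry)
    local parts = { entry.family }
    if entry.vendor ~= nil then
        parts[#parts + 1] = entry.vendor
    end
    parts[#parts + 1] = entry.arch
    return table.concat(parts, '/')
end

-- Small excerpt of the mapping, for illustration only.
local example_mapping = {
    haswell = { family = 'x86_64', vendor = 'intel', arch = 'haswell' },
    zen     = { family = 'x86_64',                   arch = 'generic' },
}

print(software_subdir(lookup_arch(example_mapping, 'haswell', 'x86_64')))
print(software_subdir(lookup_arch(example_mapping, 'unknown_cpu', 'x86_64')))
```

Inside a module file, a nil result from lookup_arch could be turned into an LmodError with a readable message instead of a raw Lua error.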

klust commented 3 years ago

I've continued working on the idea. The result is in my GitHub repository MiscExperiments. The README file linked to explains the concepts that I've tried. I made three variants: one that uses only functions that I found in the Lmod manual and needs an additional environment variable; one that uses a function that I did not find in the Lmod manual but that is allowed in the Lmod Lua sandbox (at least in Lmod 8.4.1, which I tried) and that can use an additional environment variable to give you an optimized software stack; and one that calls the Python script from EESSI to determine the best version of the software layer. The latter could be done using functionality provided by Lmod. The commit that I just tested is commit c6b20c8. Later commits may or may not work, as I sometimes commit before things are fully tested.
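The variant that calls the EESSI Python script could look roughly like the sketch below. It is not the code from the repository mentioned above. The helper detect_software_subdir and the stand-in fake_run are hypothetical names, and the paths follow the layout used in the first sketch in this thread. The runner is passed in as a function so that the logic can be tried outside Lmod; in a module file one would pass Lmod's capture function instead.

```lua
-- Hypothetical helper: run the detection script through `run` (a function
-- that takes a command string and returns its output) and clean up the result.
local function detect_software_subdir(run, python, script, prefix)
    local output = run(python .. ' ' .. script .. ' ' .. prefix) or ''
    local subdir = output:gsub('%s+', '')   -- strip newlines and other whitespace
    if subdir == '' then
        return nil                          -- detection failed or printed nothing
    end
    return subdir
end

-- Stand-in runner so the sketch can be run outside Lmod; it ignores the
-- command and returns a fixed answer.
local function fake_run(command)
    return 'x86_64/intel/haswell\n'
end

local subdir = detect_software_subdir(
    fake_run,
    '/cvmfs/pilot.eessi-hpc.org/2020.12/compat/linux/x86_64/usr/bin/python3',
    '/cvmfs/pilot.eessi-hpc.org/2020.12/init/eessi_software_subdir_for_host.py',
    '/cvmfs/pilot.eessi-hpc.org/2020.12')
print(subdir)
```

Returning nil on empty output leaves it to the module file to decide whether to fall back to a static mapping or to stop with an error.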

wpoely86 commented 3 years ago

I've also given this idea a spin, using @klust's files as input. It works really nicely. I use pushenv for the MODULEPATH so you can't mix local and EESSI modules (Lmod even handles reloads if you have loaded a default version or the same version exists in EESSI and locally).

I think this is a really nice way of providing EESSI on an HPC cluster where we control the environment. The archspec part is not needed, as I know where it's running and what hardware it is (we already have variables to determine the local module paths). This also makes the module file very fast, as we don't have to do any autodetection, just set a bunch of variables.

I would keep the bash script, as it's infinitely more flexible and really well suited for unknown environments, but also provide a template module file for HPC sites. Every site can then customize it as they see fit (or just use the bash script).

wpoely86 commented 3 years ago

For reference, I've used the module file below. It's tailored to our HPC cluster so you have to adjust at least two things:

local eessi_version = myModuleVersion()
local eessi_root = "/cvmfs/pilot.eessi-hpc.org/"

local helpstring = string.gsub([[
This module enables the EESSI EESSI_VERSION pilot. It is not needed to 
execute the initialisation scripts after loading this module.
]], "EESSI_VERSION", eessi_version)
help(helpstring)

whatis('Enables the EESSI ' .. eessi_version .. ' pilot.')

family("eessi")
add_property("lmod", "sticky")

-- manually defined mapping for now
arch_mapping = {
    x86_64 =         { family = 'x86_64',                    arch = 'generic' },
    ivybridge =      { family = 'x86_64',                    arch = 'generic' },
    haswell =        { family = 'x86_64',  vendor = 'intel', arch = 'haswell' },
    broadwell =      { family = 'x86_64',  vendor = 'intel', arch = 'haswell' },
    skylake =        { family = 'x86_64',  vendor = 'intel', arch = 'haswell' },
    skylake_avx512 = { family = 'x86_64',  vendor = 'intel', arch = 'skylake_avx512' },
}

local archspec_cpu = os.getenv("VSC_ARCH_LOCAL") or "x86_64"

local eessi_os_type =    "linux"
local eessi_cpu_family = arch_mapping[archspec_cpu].family
local eessi_cpu_vendor = arch_mapping[archspec_cpu].vendor
local eessi_arch =       arch_mapping[archspec_cpu].arch

setenv("EESSI_PILOT_VERSION",   eessi_version)
setenv("EESSI_PREFIX",          pathJoin(eessi_root, eessi_version))
setenv("EESSI_OS_TYPE",         eessi_os_type)
setenv("EESSI_CPU_FAMILY",      eessi_cpu_family)

-- Set EPREFIX since that is basically a standard in Gentoo Prefix
local eprefix = pathJoin(eessi_root, eessi_version, 'compat', eessi_os_type, eessi_cpu_family)
setenv("EPREFIX",               eprefix)
prepend_path("PATH",            pathJoin(eprefix, "/usr/bin"))
setenv("EESSI_EPREFIX",         eprefix)
setenv("EESSI_EPREFIX_PYTHON",  pathJoin(eprefix, "/usr/bin/python3"))

-- Set variables for the software layer.
local eessi_software_subdir
if (eessi_cpu_vendor == nil) then
    eessi_software_subdir = pathJoin(eessi_cpu_family, eessi_arch)
else
    eessi_software_subdir = pathJoin(eessi_cpu_family, eessi_cpu_vendor, eessi_arch)
end

local eessi_software_path = pathJoin(eessi_root, eessi_version, "software", eessi_os_type, eessi_software_subdir)
local eessi_module_subdir = "modules/all"
setenv("EESSI_SOFTWARE_SUBDIR", eessi_software_subdir)
setenv("EESSI_SOFTWARE_PATH", eessi_software_path)
setenv("EESSI_MODULE_SUBDIR", eessi_module_subdir)
setenv("EESSI_MODULE_PATH", pathJoin(eessi_software_path, eessi_module_subdir))

-- overwrite the current module path to avoid mixing modules
-- first pushenv the default module path, then add the eessi part
pushenv("MODULEPATH", '/etc/modulefiles/vsc')
prepend_path("MODULEPATH", pathJoin(eessi_software_path, eessi_module_subdir))

-- For the Lmod cache of the EESSI stack.
pushenv("LMOD_RC", pathJoin(eessi_software_path, "/.lmod/lmodrc.lua"))

wpoely86 commented 3 years ago

A small caveat of using EESSI as a module like above: every run of spider will trigger a mount of CVMFS, as Lmod will 'dive' into every directory set in MODULEPATH.

bedroge commented 2 weeks ago

> Small caveats of using EESSI as a module like above: every run of spider will trigger a mount of cvmfs as Lmod will 'dive' into the MODULEPATHs set

Related to this, I also saw weird issues with Lmod regenerating the EESSI cache whenever we would update the cache of our local stack (which has a module file to provide access to EESSI), see https://github.com/TACC/Lmod/issues/708. This can be prevented using the approach from https://lmod.readthedocs.io/en/latest/350_community.html, i.e. by using:

if ( mode() ~= "spider" ) then
    prepend_path("MODULEPATH", ......)
end

ocaisa commented 2 weeks ago

That approach also requires you to update the LMOD_RC variable so that it finds the spider cache of EESSI.
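For context, the file that LMOD_RC points to tells Lmod where a prebuilt spider cache can be found. Below is a minimal sketch of such a file, assuming the scDescriptT table format described in the Lmod documentation on system spider caches. Both paths are hypothetical placeholders and not the actual EESSI locations.

```lua
-- Sketch of an lmodrc.lua file; both paths below are hypothetical placeholders.
scDescriptT = {
    {
        ["dir"]       = "/path/to/software/subdir/.lmod/cache",
        ["timestamp"] = "/path/to/software/subdir/.lmod/cache/timestamp",
    },
}
```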

bedroge commented 2 weeks ago

We use the following module file to provide access to the EESSI production repo (version 2023.06):

local eessi_version = myModuleVersion()
local eessi_root = "/cvmfs/software.eessi.io/versions/"

local helpstring = string.gsub([[
This module enables the EESSI EESSI_VERSION software stack.
It is not needed to execute the initialisation scripts after loading this module.
For more information about EESSI, see https://eessi.io.
]], "EESSI_VERSION", eessi_version)
help(helpstring)

whatis('Enables the EESSI ' .. eessi_version .. ' software stack.')

family("EESSI")
add_property("lmod", "sticky")

-- manually defined mapping for now
arch_mapping = {
    ['x86_64/amd/zen3'] =             { family = 'x86_64',  vendor = 'amd',   arch = 'zen3' },
    ['x86_64/intel/icelake'] =        { family = 'x86_64',  vendor = 'intel', arch = 'skylake_avx512' },
    ['x86_64/intel/skylake_avx512'] = { family = 'x86_64',  vendor = 'intel', arch = 'skylake_avx512' },
}

function trim (s)
    return (string.gsub(s, "^%s*(.-)%s*$", "%1"))
end

local archspec_cpu = trim(capture("/cvmfs/hpc.rug.nl/tools/eessi_archdetect.sh cpupath 2> /dev/null"))

local eessi_os_type =    "linux"
local eessi_cpu_family = arch_mapping[archspec_cpu].family
local eessi_cpu_vendor = arch_mapping[archspec_cpu].vendor
local eessi_arch =       arch_mapping[archspec_cpu].arch

setenv("EESSI_VERSION",    eessi_version)
setenv("EESSI_PREFIX",     pathJoin(eessi_root, eessi_version))
setenv("EESSI_OS_TYPE",    eessi_os_type)
setenv("EESSI_CPU_FAMILY", eessi_cpu_family)

-- Set EPREFIX since that is basically a standard in Gentoo Prefix
local eprefix = pathJoin(eessi_root, eessi_version, 'compat', eessi_os_type, eessi_cpu_family)
setenv("EPREFIX",               eprefix)
prepend_path("PATH",            pathJoin(eprefix, "/usr/bin"))
setenv("EESSI_EPREFIX",         eprefix)
setenv("EESSI_EPREFIX_PYTHON",  pathJoin(eprefix, "/usr/bin/python3"))

-- Set variables for the software layer.
local eessi_software_subdir
if (eessi_cpu_vendor == nil) then
    eessi_software_subdir = pathJoin(eessi_cpu_family, eessi_arch)
else
    eessi_software_subdir = pathJoin(eessi_cpu_family, eessi_cpu_vendor, eessi_arch)
end

local eessi_software_path = pathJoin(eessi_root, eessi_version, "software", eessi_os_type, eessi_software_subdir)
local eessi_module_subdir = "modules"
setenv("EESSI_SOFTWARE_SUBDIR", eessi_software_subdir)
setenv("EESSI_SOFTWARE_PATH", eessi_software_path)
setenv("EESSI_MODULE_SUBDIR", eessi_module_subdir)
setenv("EESSI_MODULE_PATH", pathJoin(eessi_software_path, eessi_module_subdir))

-- overwrite the current module path to avoid mixing modules
-- first pushenv the default module path, then add the eessi part
-- pushenv("MODULEPATH", '/cvmfs/hpc.rug.nl/versions/modules')
-- prepend_path("MODULEPATH", pathJoin(eessi_software_path, eessi_module_subdir))
local classes = {
    "ai", "astro", "bio", "cae", "chem", "compiler", "data", "debugger", "devel", "geo", "ide", "lang",
    "lib", "math", "mpi", "numlib", "perf", "phys", "quantum", "system", "toolchain", "tools", "vis"
}
if ( mode() ~= "spider" ) then
    for i = #classes, 1, -1 do
        prepend_path("MODULEPATH", pathJoin(eessi_software_path, eessi_module_subdir, classes[i]))
    end
end

-- Set path to the Lmod cache of the EESSI stack.
--pushenv("LMOD_RC", pathJoin(eessi_software_path, "/.lmod/lmodrc.lua"))
prepend_path("LMOD_RC", pathJoin(eessi_software_path, "/.lmod/lmodrc.lua"))

if (mode() == "load") then
    LmodMessage("Loaded version " .. eessi_version .. " of the EESSI software stack.")
    LmodMessage("For more information, see: https://eessi.io/docs/")
end

Note that we use the module classes (as we do the same for our local stack), and it's using archdetect instead of archspec. Also, it's using prepend_path for LMOD_RC, which allows you to have multiple RC files, and hence multiple (levels of) caches.
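The loop over the classes runs from the last entry to the first because each prepend puts its directory in front of the ones added before it. A minimal sketch of that ordering in plain Lua, with a list standing in for MODULEPATH; the prepend helper, the shortened class list and the base path are hypothetical and only serve as illustration:

```lua
-- Simulate prepend_path on a plain Lua list: each new entry goes to the front.
local function prepend(list, entry)
    table.insert(list, 1, entry)
end

local classes = { "ai", "bio", "chem" }   -- shortened list, for illustration
local base = "/hypothetical/modules"      -- placeholder, not a real EESSI path

local modulepath = {}
-- Walk the classes from last to first so the first class ends up in front.
for i = #classes, 1, -1 do
    prepend(modulepath, base .. "/" .. classes[i])
end

print(table.concat(modulepath, ":"))
```

With a forward loop the directories would end up in reverse order, which would only matter if the same module name appeared in more than one class directory.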