MSO4SC / cloudify-hpc-plugin

Plugin to allow Cloudify to deploy and orchestrate HPC resources
Apache License 2.0

extend requirements for HPC: use of nix instead of modules for instance #16

Open Trophime opened 6 years ago

Trophime commented 6 years ago

So far the orchestrator plugin is designed for an HPC with modules and slurm. Would it be possible to extend it to other requirements like "nix" for instance?

emepetres commented 6 years ago

@Trophime what do you mean by "nix"? NixOS? How would it work?

Trophime commented 6 years ago

At the Grenoble Meso center they use Nix instead of modules:

NIX is a Linux distribution that allows local package installation. It is independent of the host's distribution and allows an ordinary user, with no privileges, to install packages of their choice. It is especially interesting in an HPC context, where users need many different libraries, sometimes in different versions. Nix also offers a way to develop in an isolated and reproducible environment. Those environments or the resulting packages can be shared easily and efficiently between users.

Currently, NIX is only available on Froggy and Luke, but the goal is to have it installed on every Ciment cluster. Also note that it's very easy to use Nix on your own desktop computer, where you'll find a very similar environment. Check this quickstart to set up Nix on your own Linux computer. It's also possible on OS X.

See also: https://gricad.github.io/calcul/nix/hpc/2017/05/15/nix-on-hpc-platforms.html

Trophime commented 6 years ago

Here is an example of a script to launch a calculation on 256 cores (16 nodes) with OAR and Nix:

#!/bin/bash
#OAR -n HL-31
##OAR -l /nodes=16/core=16
#OAR -l /nodes=16,walltime=2:00:00
#OAR -t devel
##OAR --stdout HL-31_%jobid%.out
##OAR --stderr HL-31_%jobid%.err
#OAR --project hpcfeelpp
#OAR --notify exec:/usr/local/bin/sendmail.sh

# Ensure Nix is loaded. The following line should be in your ~/.bashrc file.
source /applis/site/nix.sh

# Run the program
# Number of cores
nbcores=$(wc -l < "$OAR_NODE_FILE")
# Number of nodes
nbnodes=$(sort -u "$OAR_NODE_FILE" | wc -l)
# Name of the first node
firstnode=$(head -n 1 "$OAR_NODE_FILE")
# Number of cores allocated on the first node (it is the same on all the nodes)
pernode=$(grep -c "$firstnode$" "$OAR_NODE_FILE")
echo "nbcores=" $nbcores
echo "nbnodes=" $nbnodes

HIFIMAGNET_APPSDIR=/scratch/trophime/feelpp_build/clang-3.7/research/hifimagnet/applications
mpirun -np 256 \
  -machinefile $OAR_NODEFILE -mca plm_rsh_agent "oarsh" \
  $HIFIMAGNET_APPSDIR/MagnetModels/feelpp_magnetmodels3DP1N1_linear_reg \
  --config-file HL-31-H1H8-Leads-air_singular_256_json.cfg

Trophime commented 6 years ago

This is typically the kind of script I would like the orchestrator to generate for use at the Meso center. Note that the Meso center in Strasbourg is also considering moving to Nix instead of Modules.

OAR is specific to Grenoble.
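To make the request concrete, here is a minimal sketch of how a generated header could differ between the two schedulers. The function name `batch_header` and its arguments are hypothetical and not part of cloudify-hpc-plugin; only the directive prefixes (`#SBATCH`, `#OAR`) and the OAR options used in the script above are taken as given.

```bash
#!/bin/bash
# Hypothetical helper (NOT part of cloudify-hpc-plugin): print the header of a
# batch script for either Slurm or OAR.
# Usage: batch_header <slurm|oar> <job_name> <nodes> <walltime H:MM:SS>
batch_header() {
  local scheduler="$1" name="$2" nodes="$3" walltime="$4"
  case "$scheduler" in
    slurm)
      echo '#!/bin/bash'
      echo "#SBATCH -J $name"     # job name
      echo "#SBATCH -N $nodes"    # number of nodes
      echo "#SBATCH -t $walltime" # wall-clock limit
      ;;
    oar)
      echo '#!/bin/bash'
      echo "#OAR -n $name"        # job name
      # OAR expresses nodes and walltime in a single resource request
      echo "#OAR -l /nodes=$nodes,walltime=$walltime"
      ;;
    *)
      echo "unsupported scheduler: $scheduler" >&2
      return 1
      ;;
  esac
}
```

For example, `batch_header oar HL-31 16 2:00:00` would print the first OAR directives of the script above, while the same call with `slurm` would print the Slurm equivalents.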

Trophime commented 6 years ago

More info on the Grenoble Meso center and Nix can be found here.

victorsndvg commented 6 years ago

Hi @Trophime ,

If I understood correctly, Nix is an alternative to LMOD, right?

If so, I cannot see from your example how it is used. Maybe I'm missing something. Can you extend the usage example? For instance, a table comparing its usage against LMOD would be awesome.

On the other hand, you mention OAR, which seems to be a resource manager. Can you describe it in more detail? Again, a table comparing it against Slurm usage would be fantastic!

Trophime commented 6 years ago

For OAR, I found this guide from your Luxembourg colleagues.

Trophime commented 6 years ago

Nix is indeed an alternative to LMOD. I will try to find some docs to illustrate the use of Nix in an HPC context. There is an article from the IT guy in Grenoble.
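In the meantime, a rough sketch of the difference as seen from a job script: with modules, each package is loaded by name; with Nix as set up in Grenoble, the script only sources the site profile and the user's Nix environment provides the packages. The helper `env_setup` is hypothetical, and `/applis/site/nix.sh` is the site-specific path from the example script earlier in this thread.

```bash
#!/bin/bash
# Hypothetical helper (NOT part of cloudify-hpc-plugin): print the lines that
# prepare the software environment inside a generated job script.
# Usage: env_setup <modules|nix> [module names...]
env_setup() {
  local kind="$1" m
  shift
  case "$kind" in
    modules)
      # One "module load" line per requested module
      for m in "$@"; do
        echo "module load $m"
      done
      ;;
    nix)
      # Site-specific path taken from the Grenoble example; other sites differ
      echo "source /applis/site/nix.sh"
      ;;
    *)
      echo "unsupported environment type: $kind" >&2
      return 1
      ;;
  esac
}
```

Called as `env_setup modules gcc openmpi` it emits two `module load` lines; called as `env_setup nix` it emits the single `source` line, ignoring any package names.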