[Feature request] - Allow for more customization of benchmarks batch submission

JADE-V-V / JADE

JADE, a novel nuclear data libraries V&V tool

GNU General Public License v3.0

Is your feature request related to a problem? Please describe.

If JADE is run in batch mode on a cluster, it is quite inconvenient to launch multiple benchmarks at once. This is because, at the moment, each JADE session can use only one value for the number of MPI tasks (taken from the config file) and one request for number of nodes and wall time (taken from the .sh template). In a normal production run, needs may differ widely across benchmarks. For instance, I may request only one hour on one node for ITER 1D, but 24 h on multiple nodes for C-model (or E-lite).

Describe the solution you'd like

I suggest that we increase the level of customization for submitting benchmarks in batch mode. I would get rid of the single scheduler template and instead ask users who want to submit calculations in batch mode to provide a filled-out scheduler template for each benchmark they want to run. This template would be part of each benchmark's configuration. This would bring the following advantages:

full customization of run options for the different benchmarks (which have different CPU power needs)
easier maintenance, since developers do not have to foresee every config parameter that a user may want to set or every scheduler format. It is up to users to customize their own scheduler templates according to their needs.
easier for users, since they can just copy and paste a scheduler script that they probably already have set up for other purposes.

A few env variables would still be controlled by the config file/jade session, such as the working directory.
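As a rough illustration of the proposal, the sketch below fills a user-provided, per-benchmark scheduler template with the few values that would remain under the control of the JADE session. Everything named here is an assumption made for the example and not part of JADE: the function `fill_scheduler_template`, the placeholders `$WORKDIR` and `$COMMAND`, the SLURM directives and the MCNP command line.

```python
from string import Template


def fill_scheduler_template(template_text: str, session_vars: dict) -> str:
    """Fill only the placeholders owned by the JADE session.

    Everything else (nodes, wall time, MPI tasks) stays exactly as the
    user wrote it in the per-benchmark template.
    """
    # safe_substitute leaves unknown $placeholders (e.g. shell variables)
    # untouched instead of raising an error
    return Template(template_text).safe_substitute(session_vars)


# Hypothetical user-provided template for one benchmark (SLURM syntax assumed)
ITER_1D_TEMPLATE = """#!/bin/bash
#SBATCH --nodes=1
#SBATCH --time=01:00:00
cd $WORKDIR
$COMMAND
"""

# Hypothetical values that the session would supply at submission time
script = fill_scheduler_template(
    ITER_1D_TEMPLATE,
    {"WORKDIR": "/path/to/run/dir", "COMMAND": "mpirun -n 48 mcnp6 i=input"},
)
print(script)
```

With this kind of split, a heavier benchmark would simply ship its own template with a different node count and wall time, and the session code would not need to know about either.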

Describe alternatives you've considered

The alternative is to keep extending the configuration file so that users can customize more options. However, the file already contains a large number of parameters, and maintaining it would be painful in terms of documentation and tests.

To add to this: while using the new JADE batch submission system, it often happened that, for instance when running the Sphere benchmark, a number of isotopes completed in the allocated time while many others did not. I then wrote a simple bash script that goes through all the folders and, if the mctal file is not found, runs the simulation again. This requires first running a "cleaning script" that removes the runtpe and output files (both scripts are attached below). I guess this solution could be generalized. For instance, one could create only the inputs of the benchmarks to be run and then issue this single batch command, which would execute all of them in series, thereby making the most of the allocated HPC budget.

the bash script

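maindir="R:\AC_ResultsDB\Jade\04_JADE_latest_root\Tests\Simulations\33c\Sphere"

# Cycle through each directory and run MCNP where no mctal file exists
for dirpath in "$maindir"/*/
do
    dir=$(basename "$dirpath")
    path="$maindir/$dir/mcnp" # Build the path to the MCNP folder
    cd "$path" || continue    # Skip this directory if it cannot be entered
    flag=false
    for file in *
    do
        # The mctal file is the one whose name ends in '_m'
        if [[ ${file: -2} == '_m' ]]; then
            flag=true
            break
        fi
    done

    # No mctal file found: the run did not complete, so launch it again
    if ! $flag; then
        mpirun /root/D1SUNED411_OpenMP/d1suned411_openmp i="${dir}_" n="${dir}_" xs=root/MCNP/MCNP_DATA/xsdir_mcnp6_test1_fendl32c tasks 48
    fi
done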

python cleaning script

import os

root = r'R:\AC_ResultsDB\Jade\04_JADE_latest_root\Tests\Simulations\33c\TIARA-BC'

for folder in os.listdir(root):
    cp = os.path.join(root, folder, 'mcnp')
    # Leave the folder untouched if a mctal file (name ending in 'm') exists
    flag_remove = True
    for file in os.listdir(cp):
        if file.endswith('m'):
            flag_remove = False
            break

    if flag_remove:
        print(folder)
        # Remove everything except the input file (folder name plus one character)
        for file in os.listdir(cp):
            if file[:-1] != folder:
                filepath = os.path.join(cp, file)
                os.remove(filepath)
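
The generalization suggested in the comment could start from a single helper that lists the runs still missing a mctal file, so that both the cleaning step and the re-run step work from the same list. This is only a sketch under assumptions: the function name `find_incomplete_runs`, its parameters and the folder names in the demonstration are hypothetical, and the `_m` suffix for the mctal file is taken from the bash script above.

```python
import os
import tempfile


def find_incomplete_runs(root, subdir="mcnp", mctal_suffix="_m"):
    """Return the sorted names of run folders that contain no mctal file."""
    incomplete = []
    for folder in sorted(os.listdir(root)):
        run_dir = os.path.join(root, folder, subdir)
        if not os.path.isdir(run_dir):
            continue  # Not a run folder, ignore it
        # A run counts as complete when any file name ends with the mctal suffix
        if not any(name.endswith(mctal_suffix) for name in os.listdir(run_dir)):
            incomplete.append(folder)
    return incomplete


# Small demonstration on a throwaway directory tree (hypothetical folder names)
with tempfile.TemporaryDirectory() as root:
    layout = {"1001": ["1001_", "1001_m"], "2004": ["2004_", "2004_r"]}
    for folder, files in layout.items():
        run_dir = os.path.join(root, folder, "mcnp")
        os.makedirs(run_dir)
        for name in files:
            open(os.path.join(run_dir, name), "w").close()
    pending = find_incomplete_runs(root)
    print(pending)  # ['2004']
```

The returned list could then be looped over to clean each folder and launch the runs one after another inside a single scheduler job.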

[Feature request] - Allow for more customization of benchmarks batch submission #245