N8-CIR-Bede / documentation

Documentation for the N8CIR Bede Tier 2 HPC facility
https://bede-documentation.readthedocs.io/en/latest/

Scratch / Temp directories #114

Open ptheywood opened 2 years ago

ptheywood commented 2 years ago

Bede's usage documentation on filesystems currently only mentions the NFS/Lustre mounts /projects, /nobackup and /users.

Making users aware of scratch / per-node temporary storage can be beneficial to both job performance and NFS/Lustre performance.

For example, see the Sheffield HPC documentation on this topic.

It will be worth asking the sysadmins about this to ensure it is meaningful / correct for Bede.

On login nodes, $TMPDIR is not set, so /tmp is likely used by default. Within Slurm jobs, $TMPDIR is set to /tmp.
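A script that needs to run both on login nodes and inside jobs could fall back to /tmp when $TMPDIR is unset. This is a minimal sketch, and `scratch_root` is a hypothetical variable name used only for illustration:

```shell
#!/bin/bash
# Use $TMPDIR when it is set and non-empty, otherwise fall back to /tmp.
# `scratch_root` is a hypothetical name, not something Bede or Slurm defines.
scratch_root="${TMPDIR:-/tmp}"
echo "Temporary files will go under: ${scratch_root}"
```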

/tmp on GPU nodes is a 1.6T RAID 1 partition spanning the 2 SSDs in the node.
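To check how much node-local space is actually free on a given node, something like the following could be run inside a job. It is a sketch that assumes a POSIX `df` and `awk` are available; `avail_kb` is a hypothetical variable name:

```shell
#!/bin/bash
# Report the filesystem backing /tmp in human-readable units.
df -h /tmp

# Capture the available space in KiB for use in a script.
# -P forces the POSIX single-line output format; column 4 is "Available".
avail_kb="$(df -Pk /tmp | awk 'NR==2 {print $4}')"
echo "Available space under /tmp: ${avail_kb} KiB"
```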

$SLURM_JOB_ID contains the numeric Slurm job ID, which can be combined with $TMPDIR to create a per-job temporary directory.

If the Slurm configuration were later changed to provide a per-job $TMPDIR, this approach would still work without issues (it would just add an extra level of nesting).

I.e. something roughly like:

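#!/bin/bash
# Slurm options...
#SBATCH ...

# Create a per-job temporary directory.
mkdir -p "${TMPDIR}/${SLURM_JOB_ID}"
# Reset $TMPDIR to be more specific, remembering the original value.
OLD_TMPDIR="${TMPDIR}"
export TMPDIR="${TMPDIR}/${SLURM_JOB_ID}"

# Do stuff

# Move results out of node-local storage
mv "${TMPDIR}/path/to/source" "${HOME}/path/to/dest"

# Cleanup (":?" aborts instead of expanding to an empty path if TMPDIR is unset)
rm -rf "${TMPDIR:?}"
# Restore the original TMPDIR
export TMPDIR="${OLD_TMPDIR}"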
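
The script above only cleans up if it runs through to its final lines. One possible variant, assuming bash and a `mktemp` that accepts a full-path template, registers the cleanup with `trap ... EXIT` so it also runs when a command fails under `set -e`. The names `job_tmp` and `cleanup` are hypothetical, and this has not been checked against Bede's configuration:

```shell
#!/bin/bash
# Sketch only: job_tmp and cleanup are hypothetical names.
set -euo pipefail

# Create a unique per-job directory under $TMPDIR (or /tmp if it is unset).
# SLURM_JOB_ID is only set inside Slurm jobs, so fall back to "nojob" elsewhere.
job_tmp="$(mktemp -d "${TMPDIR:-/tmp}/job-${SLURM_JOB_ID:-nojob}-XXXXXX")"

cleanup() {
    # Remove the per-job directory and everything in it.
    rm -rf -- "${job_tmp}"
}
# Run cleanup whenever the script exits, on success or on error.
trap cleanup EXIT

# Point tools that honour $TMPDIR at the per-job directory.
export TMPDIR="${job_tmp}"

# Do stuff, writing temporary files under ${TMPDIR},
# and move any results out before the script exits.
```

Because the directory is removed on exit, any results have to be moved to /nobackup, /projects or /users before the script finishes.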