cadop opened 1 year ago
@cadop For now, the docs are written for both Lochness and Wulver, though we haven't made a clear distinction between them, and we should.
@absrocks Can you make sure that all example scripts specify whether they are for Lochness or Wulver? Better yet, create example scripts for both, so that when Lochness is decommissioned we can simply delete the Lochness scripts.
The script:

```bash
#!/bin/bash -l
#SBATCH --job-name=tf_test
#SBATCH --output=%x.%j.out       # %x.%j expands to JobName.JobID
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --partition=datasci
#SBATCH --gres=gpu:1             # request one GPU
#SBATCH --mem=4G

# Purge any modules loaded by default
module purge > /dev/null 2>&1
module load Anaconda3

# Activate the conda environment containing TensorFlow
source $HOME/conda.sh
conda activate tf

srun python tf.gpu.test.py
```
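As an aside, `tf.gpu.test.py` itself isn't shown in the docs. A minimal stand-in (my assumption, not the docs' actual file; any script name works) would just ask TensorFlow 2.x which GPUs it can see:

```bash
# Hypothetical one-line GPU check; prints an empty list if no GPU is visible to TensorFlow
python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
```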
The script also has `#SBATCH --mem=4G`, but support says it should be `#SBATCH --mem-per-cpu=4G`. If this is a Lochness thing, it would be helpful to clarify that as well.
@cadop `#SBATCH --mem-per-cpu=4G` is applicable on Wulver; on Lochness you can use `#SBATCH --mem=4G`. This clarification has been added in the new PR.
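For anyone landing here, a rough sketch of what the Wulver-side header of the same job might look like (the partition name `gpu` is a placeholder I'm assuming, not confirmed from the docs; check what exists on Wulver before using it):

```bash
#!/bin/bash -l
#SBATCH --job-name=tf_test
#SBATCH --output=%x.%j.out
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --partition=gpu          # placeholder; substitute a partition that exists on Wulver
#SBATCH --gres=gpu:1
#SBATCH --mem-per-cpu=4G         # Wulver: memory per CPU, not the per-node --mem used on Lochness
```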
The example SLURM script at https://arcs-njit-edu.github.io/Docs/Software/programming/python/conda/#install-tensorflow-with-gpu says `--partition=datasci`, but the docs don't list this as an available partition: https://arcs-njit-edu.github.io/Docs/Software/slurm/slurm/#using-slurm-on-cluster
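A quick way to confirm which partitions a cluster actually offers is `sinfo`, for example:

```bash
# List each partition with its availability state and time limit
sinfo -o "%P %a %l"
```

That would make it easy to verify whether `datasci` still exists before the docs reference it.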