arcs-njit-edu / Docs

Documentation for HPC at NJIT

Example script has different partition than docs #11

Open cadop opened 1 year ago

cadop commented 1 year ago

The example SLURM script here: https://arcs-njit-edu.github.io/Docs/Software/programming/python/conda/#install-tensorflow-with-gpu says `--partition=datasci`, but the docs don't list this as an available partition: https://arcs-njit-edu.github.io/Docs/Software/slurm/slurm/#using-slurm-on-cluster

alexpacheco commented 1 year ago

@cadop For now, the docs are written for both Lochness and Wulver, though we haven't made a clear distinction, and we should.

@absrocks Can you make sure that all example scripts specify whether they are for Lochness or Wulver? Better yet, create example scripts for both, so that when Lochness is decommissioned we can just delete the Lochness scripts.

cadop commented 1 year ago

The script:

```bash
#!/bin/bash -l
#SBATCH --job-name=tf_test
#SBATCH --output=%x.%j.out # %x.%j expands to JobName.JobID
#SBATCH --nodes=1
#SBATCH --tasks-per-node=1
#SBATCH --partition=datasci
#SBATCH --gres=gpu:1
#SBATCH --mem=4G

# Purge any module loaded by default
module purge > /dev/null 2>&1
module load Anaconda3
source $HOME/conda.sh
conda activate tf
srun python tf.gpu.test.py
```

It also has `#SBATCH --mem=4G`, but support says it should be `#SBATCH --mem-per-cpu=4G`. If this is a Lochness thing, it would be helpful to clarify that as well.

absrocks commented 1 year ago

@cadop `#SBATCH --mem-per-cpu=4G` is applicable on Wulver; on Lochness you can use `#SBATCH --mem=4G`. This clarification has been added in the new PR.
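For reference, a Wulver-style version of the script from this thread might look like the sketch below. The partition name used here is a placeholder assumption (the thread does not say which Wulver partition replaces `datasci`); substitute whatever partition the Wulver docs actually list. The memory directive is the per-CPU form discussed above.

```bash
#!/bin/bash -l
#SBATCH --job-name=tf_test
#SBATCH --output=%x.%j.out        # %x.%j expands to JobName.JobID
#SBATCH --nodes=1
#SBATCH --tasks-per-node=1
#SBATCH --partition=gpu           # PLACEHOLDER: use a GPU partition listed in the Wulver docs
#SBATCH --gres=gpu:1
#SBATCH --mem-per-cpu=4G          # Wulver uses per-CPU memory; on Lochness use --mem=4G instead

# Purge any module loaded by default
module purge > /dev/null 2>&1
module load Anaconda3
source $HOME/conda.sh
conda activate tf
srun python tf.gpu.test.py
```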