Open ndangtt opened 4 months ago
Thank you so much!
Gonna try soon!
Useful Slurm command:
List of currently running jobs:
squeue -u <username> -o "%.18i %.9P %.30j %.8u %.2t %.10M %.6D %R"
@ndangtt Could you please specify a useful summary plot for the experiments? You were talking about having either hitting-time or area-under-the-curve plots across all experiment settings. Could you elaborate a bit more in writing?
Thanks!
@dimitri-rusin: sure! I've made a note here: https://github.com/dimitri-rusin/oll_onemax/issues/11
This is an example .slurm script I use for my job array. Assume that I have a list of hundreds of command lines that can be run in parallel (i.e., they're independent of each other). All command lines are put in the acmds.txt file (each line is a command line). The following script makes use of Slurm's job array to launch those commands in parallel (the scheduling is done by Slurm); each command takes 1 core and a maximum of 15 hours. (In my experiments, each of those commands often corresponds to one RL agent training run.)
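For readers who want something concrete, here is a minimal sketch of what such a job-array script could look like. It is not the actual run.slurm from this thread. The job name, the partition and QoS values, the array range, the CMD_FILE variable and the nth_command helper are all assumptions made for illustration, so please check them against the Cirrus documentation before submitting anything.

```shell
#!/bin/bash
# Hypothetical job-array sketch (NOT the author's run.slurm).
# Assumed job name:
#SBATCH --job-name=cmd-array
# Project code to charge (see the notes in this thread):
#SBATCH --account=sc122-dimitri
# Assumed partition and QoS names; check the Cirrus docs:
#SBATCH --partition=standard
#SBATCH --qos=standard
# Assumed range: one array index per line of the command file:
#SBATCH --array=1-100
# One task on one core, at most 15 hours per command:
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --time=15:00:00
# %A is the array job ID, %a is the array task index:
#SBATCH --output=slurm-%A_%a.out
# Deliberately no --exclusive: we only want to be charged for 1 core per job.

# File with one command per line (name taken from the post above).
CMD_FILE="${CMD_FILE:-acmds.txt}"

# Hypothetical helper: print line number $2 (1-based) of file $1.
nth_command() {
    sed -n "${2}p" "$1"
}

# Only run a command when Slurm has assigned an array index to this task.
if [ -n "${SLURM_ARRAY_TASK_ID:-}" ]; then
    cmd="$(nth_command "$CMD_FILE" "$SLURM_ARRAY_TASK_ID")"
    echo "Task ${SLURM_ARRAY_TASK_ID}: ${cmd}"
    bash -c "$cmd"
fi
```

If submitted with something like `sbatch --array=1-$(wc -l < acmds.txt) run.slurm`, the range given on the command line takes precedence over the one in the script, so the number of array tasks can follow the number of lines in the file.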
Important notes:
Replace sc122-nguyen with your project code: sc122-dimitri (I currently allocate 40k CPU hours to yours; if you need more, please let me know).
Don't use the --exclusive flag in this context. It's a Slurm flag that reserves the whole compute node for each job, so we'd be charged for the whole node for each job even though we only use 1 core.
run.slurm
run.sh
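To make the point about --exclusive concrete, here is a rough back-of-the-envelope calculation. It assumes the worst case where every job runs for the full 15 hours, and it assumes 36 cores per compute node, which is a figure to verify in the Cirrus documentation. The jobs_within_budget helper is made up for this illustration.

```shell
#!/bin/bash
# Illustrative figures only.
BUDGET_CPU_HOURS=40000   # allocation mentioned in the notes above
HOURS_PER_JOB=15         # worst case: every job hits the 15-hour limit
CORES_PER_NODE=36        # ASSUMPTION: cores per compute node; check the Cirrus docs

# Hypothetical helper: worst-case number of jobs the budget covers
# when each job is charged for $1 cores (integer division).
jobs_within_budget() {
    local cores_charged="$1"
    echo $(( BUDGET_CPU_HOURS / (HOURS_PER_JOB * cores_charged) ))
}

echo "Charged for 1 core per job:     $(jobs_within_budget 1) jobs"
echo "Charged for whole node per job: $(jobs_within_budget "$CORES_PER_NODE") jobs"
```

Under these assumptions the same budget covers roughly 2666 single-core jobs but only about 74 jobs when each one reserves a whole node.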
Documentation of Cirrus can be found at: https://docs.cirrus.ac.uk/user-guide/introduction/