Closed: dylex closed this 6 years ago.
I am excited to try it, Dylan. Could you please provide a usage example for running on GPUs? My guess is that I should add the following switches:
-p gpu --gres=gpu:#
Question: does the # in --gres correspond to the number of GPUs per task or the total number of GPUs? Is the number of tasks limited by the total number of available GPUs?
Do you think this would work?
sbatch -n 16 -p ccb --qos ccb -c 5 -p gpu --gres=gpu:16 --exclusive --wrap'; %--ntasks-per-node 5 mybatch.sh
If you want to run on n nodes, with t tasks per node, each using c CPUs and 1 GPU (for a total of t*c CPUs and t GPUs per node, or n*t*c CPUs and n*t GPUs in total), you'd do the following; note that --gres counts GPUs per node, not per task or in total:
sbatch -N$n -c$c --ntasks-per-node=$t --gres=gpu:$t -p gpu --wrap "disBatch.py -g $taskfile"
Do not specify --exclusive.
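A minimal Python 3 sketch of the arithmetic above, assuming one GPU per task. The class name GpuJob and its fields are hypothetical helpers for illustration, not part of disBatch or Slurm. It computes the per-node and overall totals and assembles the same sbatch command, with --gres set to t because Slurm counts --gres per node.

```python
import shlex
from dataclasses import dataclass


@dataclass
class GpuJob:
    """Hypothetical helper: n nodes, t tasks per node, c CPUs and 1 GPU per task."""
    nodes: int           # n
    tasks_per_node: int  # t (also the GPU count per node)
    cpus_per_task: int   # c
    taskfile: str        # disBatch task file passed to -g

    def totals(self) -> dict:
        cpus_per_node = self.tasks_per_node * self.cpus_per_task  # t*c
        gpus_per_node = self.tasks_per_node                       # t
        return {
            "cpus_per_node": cpus_per_node,
            "gpus_per_node": gpus_per_node,
            "total_cpus": self.nodes * cpus_per_node,  # n*t*c
            "total_gpus": self.nodes * gpus_per_node,  # n*t
        }

    def sbatch_argv(self) -> list:
        # --gres is a per-node request in Slurm, so it equals t, not n*t.
        # No --exclusive, per the advice above.
        return [
            "sbatch",
            f"-N{self.nodes}",
            f"-c{self.cpus_per_task}",
            f"--ntasks-per-node={self.tasks_per_node}",
            f"--gres=gpu:{self.tasks_per_node}",
            "-p", "gpu",
            # Quote the task file path so it survives the shell that --wrap uses.
            "--wrap", f"disBatch.py -g {shlex.quote(self.taskfile)}",
        ]
```

For example, GpuJob(nodes=2, tasks_per_node=4, cpus_per_task=5, taskfile="tasks.txt") asks for 20 CPUs and 4 GPUs per node (40 CPUs and 8 GPUs in total), and shlex.join(job.sbatch_argv()) (Python 3.8+) produces a command line you can paste into a shell.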
Excellent. Can I try it now, or should I wait until Nick completes the review? Would it be okay to unload the disBatch module (v1.3) and add the path to your version of disBatch.py?
If you'd like. It's probably better not to use my version directly, in case I change things, but you can certainly clone this repo and run from there.
I got this error:
sbatch: error: Batch job submission failed: Requested node configuration is not available
when I ran the command below:
sbatch -N16 -c1 --ntasks-per-node=5 --gres=gpu:5 -p gpu --wrap 'disBatch.py -g /mnt/ceph/users/jjun/groundtruth_irc/bionet/bionet_static/irc_v4.2.6.disbatch'
I tried to run setup.py install after cloning the disBatch repository on the cluster, but it gave me the permission error below:
jjun@workergpu05:disBatch$ python setup.py install
running install
running build
running build_scripts
creating build
creating build/scripts-2.7
copying and adjusting disBatch.py -> build/scripts-2.7
changing mode of build/scripts-2.7/disBatch.py from 664 to 775
running install_scripts
copying build/scripts-2.7/disBatch.py -> /usr/bin
error: /usr/bin/disBatch.py: Permission denied
I made sure disBatch.py is called from the github clone:
jjun@workergpu05:src$ which disBatch.py
~/src/disBatch/disBatch.py
Any suggestion would be appreciated.
For installation: the default install location for Python packages requires root. You probably want --user (python setup.py install --user), or just run directly out of the clone.
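A small Python 3 sketch for the --user route, assuming a POSIX system: scripts installed with --user land in the bin directory under the user base (usually ~/.local/bin), and that directory must be on PATH for `which disBatch.py` to find them. The helper names user_script_dir and on_path are hypothetical.

```python
import os
import site


def user_script_dir() -> str:
    # On POSIX, `setup.py install --user` puts scripts in USER_BASE/bin
    # (typically ~/.local/bin). This sketch ignores the Windows layout.
    return os.path.join(site.getuserbase(), "bin")


def on_path(directory: str) -> bool:
    # True if `directory` is one of the PATH entries, so installed
    # scripts can be found by name.
    entries = os.environ.get("PATH", "").split(os.pathsep)
    target = os.path.abspath(directory)
    return any(os.path.abspath(e) == target for e in entries if e)
```

If on_path(user_script_dir()) is False, add that directory to PATH (for example in your shell startup file) before relying on the --user install.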
This is now specific to our cluster, so we should probably take it offline, but FI doesn't have 16 nodes with 5 GPUs each. See the cluster docs.
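A simplified Python 3 model of why that request was rejected: because --gres is per node, -N16 with --gres=gpu:5 needs 16 nodes that each have at least 5 GPUs, no matter how many GPUs the cluster has in total. The function name and the node inventory are hypothetical (made-up numbers, not FI's real layout), and Slurm's actual check also weighs CPUs, memory, GPU types, partitions, and limits.

```python
from typing import Mapping


def gres_request_fits(nodes: int, gpus_per_node: int,
                      inventory: Mapping[str, int]) -> bool:
    """Simplified check: every one of the requested nodes must individually
    offer at least `gpus_per_node` GPUs. Ignores CPUs, memory, GPU types,
    partitions, and current load."""
    eligible = sum(1 for gpus in inventory.values() if gpus >= gpus_per_node)
    return eligible >= nodes


# Hypothetical inventory (node name -> GPUs); not the real FI cluster layout.
inventory = {"gpu-a": 4, "gpu-b": 4, "gpu-c": 4, "gpu-d": 8}
```

With this inventory there are 20 GPUs in total, yet two nodes with 5 GPUs each cannot be satisfied, because only one node has that many.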
DS: Thanks for the updates.
Using generic support for splitting environment-specified resources across tasks.
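One plausible way to split environment-specified resources across tasks, sketched here under assumptions: the function name is hypothetical and this is not necessarily how disBatch implements it. The idea is to read a separator-delimited list, such as the GPU indices in CUDA_VISIBLE_DEVICES, and divide it into contiguous, near-equal groups, one per concurrent task slot.

```python
from typing import List


def split_resources(value: str, slots: int, sep: str = ",") -> List[str]:
    """Divide a separator-delimited resource list (e.g. GPU indices taken
    from an environment variable) into `slots` contiguous groups whose
    sizes differ by at most one."""
    items = [x for x in value.split(sep) if x]
    if slots <= 0:
        raise ValueError("slots must be positive")
    if len(items) < slots:
        # Refuse rather than silently letting two slots share a resource.
        raise ValueError("fewer resources than task slots")
    base, extra = divmod(len(items), slots)
    groups = []
    start = 0
    for i in range(slots):
        size = base + (1 if i < extra else 0)  # first `extra` groups get one more
        groups.append(sep.join(items[start:start + size]))
        start += size
    return groups
```

Each slot would then run its tasks with the variable set to its own group, e.g. CUDA_VISIBLE_DEVICES=0,1 for the first of two slots on a four-GPU node. Contiguous groups keep each task's devices adjacent; raising an error when there are fewer items than slots avoids oversubscribing a device.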