N8-CIR-Bede / documentation

Documentation for the N8 CIR Bede Tier 2 HPC facility
https://bede-documentation.readthedocs.io/en/latest/

Interactive jobs via `srun` #205

Open ptheywood opened 1 month ago

ptheywood commented 1 month ago

~It might be worth adding an FAQ entry to point people to about the lack of (allocated) interactive jobs on gpu/infer/gh nodes, which are available on other systems.~

~The only current mention of this is under the Usage/login heading:~

~The login nodes are shared between all users of the service and therefore should only be used for light interactive work, for example: downloading and compiling software, editing files, preparing jobs and examining job output. Short test runs using their CPUs and GPUs are also acceptable.~

~A dedicated FAQ entry stating this might better match people's search terms.~

bodgerer commented 1 month ago

Not sure I understand what you mean? You can use srun to obtain a job with cores, memory and gpu.

Cheers,

Mark

Mark Dixon, Advanced Research Computing (ARC), Durham University, UK. Tel: +44(0)191 33 41383

On Fri, 12 Jul 2024, Peter Heywood wrote:

It might be worth adding an FAQ entry to point people to about the lack of (allocated) interactive jobs on gpu/infer/gh nodes, which are available on other systems.

The only current mention of this is under the Usage/login heading (https://bede-documentation.readthedocs.io/en/latest/usage/index.html#login):

The login nodes are shared between all users of the service and therefore should only be used for light interactive work, for example: downloading and compiling software, editing files, preparing jobs and examining job output. Short test runs using their CPUs and GPUs are also acceptable.

A dedicated FAQ entry stating this might better match people's search terms.


ptheywood commented 1 month ago

I must have misremembered about it not being possible, and there are no `srun` examples in the docs.

In which case, adding an appropriate `srun` example to the usage page would be useful (with caveats that batch jobs are better for utilisation, and that interactive jobs might not always be possible due to resource constraints).

bodgerer commented 1 month ago

If we're adding this to the documentation, there should be a note that there is a bug with cpu allocation with srun, at least on the POWER9 nodes.

The workaround is that you need to specify the number of cores, but only for srun. So, for 1 GPU from the gpu queue you need something like (not tested):

srun -c 32 --gres=gpu:1 --pty bash

If you ask for 2 gpus per node, you'll need "-c 64", etc.

It's similar on the infer nodes, except it's the number of GPUs times 40, not 32 (as the infer nodes have 160 SMT threads & 4 GPUs, vs 128 SMT threads & 4 GPUs on the gpu nodes).
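The arithmetic above could be wrapped in a small helper so the `-c` value is not worked out by hand. This is only a sketch: the function names `cpus_for_gpus` and `srun_cmd` are hypothetical, the `--partition` flag is added on the assumption that the partitions are named `gpu` and `infer` as in the thread, and the per-GPU thread counts (32 and 40) are taken from the comment above. Like the original command it is untested on Bede, and it only prints the command rather than running it. Any project/account option your allocation needs is left out.

```shell
#!/bin/sh
# Sketch of the srun CPU-count workaround described in this thread.
# Per-GPU SMT thread counts come from the comment above:
#   gpu nodes:   128 threads / 4 GPUs = 32 per GPU
#   infer nodes: 160 threads / 4 GPUs = 40 per GPU

# cpus_for_gpus PARTITION NGPUS
# Prints the number of CPUs (SMT threads) to pass to srun -c.
cpus_for_gpus() {
    case "$1" in
        gpu)   per_gpu=32 ;;
        infer) per_gpu=40 ;;
        *)     echo "unknown partition: $1" >&2; return 1 ;;
    esac
    echo $(( per_gpu * $2 ))
}

# srun_cmd PARTITION NGPUS
# Prints (does not run) an interactive srun command line.
# Assumption: the partition is selected with --partition.
srun_cmd() {
    cpus=$(cpus_for_gpus "$1" "$2") || return 1
    echo "srun --partition=$1 -c $cpus --gres=gpu:$2 --pty bash"
}

srun_cmd gpu 1
srun_cmd infer 2
```

Printing the command instead of executing it makes it easy to check the numbers before requesting resources; paste the output into a login-node shell once it looks right.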

Best,

Mark
