ai4d-iasc / trixie

Scripts and documentation about trixie hpc

max node size on trixie #54

Closed. kryczko closed this issue 3 years ago

kryczko commented 3 years ago

What is the maximum node size when running jobs on TrixieMain and why is it being limited?

fieldsa commented 3 years ago

The maximum node size on the TrixieMain partition is the largest number of compute nodes that can be allocated to a single job when it is submitted to the SLURM queue.
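For anyone who wants to check the limit directly, here is a minimal sketch of reading a partition's `MaxNodes` field from the output of `scontrol show partition`. The helper names `parse_max_nodes` and `query_max_nodes` are hypothetical, and the sample text is illustrative only; it is not Trixie's actual configuration.

```python
import re
import subprocess
from typing import Optional

# Illustrative excerpt in the style of `scontrol show partition` output.
# The value 8 is made up for the example, not Trixie's real limit.
SAMPLE_OUTPUT = """PartitionName=TrixieMain
   AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
   MaxNodes=8 MaxTime=12:00:00 MinNodes=1
"""


def parse_max_nodes(scontrol_output: str) -> Optional[int]:
    """Return the MaxNodes limit as an int, or None if it is UNLIMITED or missing."""
    match = re.search(r"\bMaxNodes=(\S+)", scontrol_output)
    if match is None:
        return None
    value = match.group(1)
    # SLURM prints UNLIMITED when no per-job node cap is configured.
    return int(value) if value.isdigit() else None


def query_max_nodes(partition: str) -> Optional[int]:
    """Run scontrol for the given partition (requires a SLURM login node)."""
    result = subprocess.run(
        ["scontrol", "show", "partition", partition],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_max_nodes(result.stdout)
```

On a login node, `query_max_nodes("TrixieMain")` should return the configured per-job cap, or `None` if the partition has no limit set. A job that requests more nodes than this with `sbatch --nodes` will typically be rejected or held, depending on how the scheduler is configured.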

kryczko commented 3 years ago

Right, okay. I normally don't train across that many nodes, because models tend to train poorly with larger batch sizes (even with layer-wise adaptive learning rates). I do run inference, though, and if my system is large enough I could use more than 15 nodes, although I would not need them for very long.