jedwards4b opened 1 year ago
Could we add something like MAX_TASKS_PER_GPU_NODE and use it only when ngpus_per_node > 0?
Going along with @sjsprecious, we could change MAX_TASKS_PER_NODE to MAX_TASKS_PER_CPU_NODE.
@fischer-ncar I think your change would not be backward compatible, and it may cause confusion. I think MAX_CPUTASKS_PER_GPU_NODE could be the solution.
@jedwards4b, so we will add a new XML variable MAX_CPUTASKS_PER_GPU_NODE and only use it when ngpus_per_node > 0?
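A minimal sketch of the rule being proposed here, written in Python and assuming the XML values have already been read into integers. The function name and its arguments are hypothetical, not an existing CIME API. The 128/64 values in the usage lines are the CPU- and GPU-node limits quoted elsewhere in this thread, and the GPU count of 4 is purely illustrative.

```python
from typing import Optional


def select_max_tasks_per_node(
    ngpus_per_node: int,
    max_tasks_per_node: int,
    max_cputasks_per_gpu_node: Optional[int] = None,
) -> int:
    """Return the per-node MPI task limit to use for a case (hypothetical helper).

    MAX_CPUTASKS_PER_GPU_NODE is consulted only when the case requests GPUs
    (ngpus_per_node > 0). Otherwise, or when the machine does not define the
    new variable, the existing MAX_TASKS_PER_NODE is returned unchanged, so
    CPU-only machine definitions keep working as before.
    """
    if ngpus_per_node > 0 and max_cputasks_per_gpu_node is not None:
        return max_cputasks_per_gpu_node
    return max_tasks_per_node


# Usage with the node sizes mentioned in this thread (128 CPU, 64 GPU).
print(select_max_tasks_per_node(0, 128, 64))  # CPU-only case -> 128
print(select_max_tasks_per_node(4, 128, 64))  # GPU case -> 64
print(select_max_tasks_per_node(4, 128))      # GPU case, variable not set -> 128
```

Falling back to MAX_TASKS_PER_NODE when the new variable is absent is what would keep this approach backward compatible, which was the concern raised against renaming the existing variable.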
I want to figure out how to run on GPU nodes but also hybrid CPU/GPU before I decide.
Can you explain more about "hybrid CPU/GPU"? The GPU nodes are "hybrid" to some extent since there are CPU cores on the GPU nodes as well.
Sure. Only some components can currently use GPUs, so in some cases we may want to run, for example, the atmosphere on GPU nodes while the ocean component runs on CPU nodes. I have already successfully demonstrated this capability with a simple test program.
In E3SM's cime_config, CPU and GPU runs are configured with CIME compiler names such as gnu vs. gnugpu. For example: https://github.com/E3SM-Project/E3SM/blob/master/cime_config/machines/config_machines.xml#L263
The gnugpu compiler adds GPU-specific compile flags. I hope this is the case configuration you want to enable.
If a component has no GPU code, it simply compiles as it would for a CPU-only run. Then it's all about getting the layout right.
This is what I am trying to move away from. Thank you.
CPU nodes have a MAX_TASKS_PER_NODE of 128, while GPU nodes have 64. How do we handle each case independently, and how do we handle the hybrid case?
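To make the question concrete, here is a rough Python sketch that sizes the CPU-node pool and the GPU-node pool independently, each with its own per-node limit. The `ComponentLayout` class, the `count_nodes` function, and the task counts are illustrative assumptions, not CIME code. It also assumes components run concurrently on disjoint tasks and that tasks within each pool are packed contiguously, rather than modeling CIME's actual ROOTPE/NTASKS layout logic.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ComponentLayout:
    """Hypothetical per-component layout entry (not a CIME class)."""
    name: str
    ntasks: int
    on_gpu_nodes: bool


def count_nodes(
    layouts: List[ComponentLayout],
    max_tasks_per_cpu_node: int,
    max_tasks_per_gpu_node: int,
) -> Tuple[int, int]:
    """Return (cpu_nodes, gpu_nodes) for a layout.

    Assumes components run concurrently on disjoint tasks and that the tasks
    assigned to each node pool are packed contiguously, so each pool is sized
    independently with its own per-node limit.
    """
    cpu_tasks = sum(c.ntasks for c in layouts if not c.on_gpu_nodes)
    gpu_tasks = sum(c.ntasks for c in layouts if c.on_gpu_nodes)
    cpu_nodes = math.ceil(cpu_tasks / max_tasks_per_cpu_node)
    gpu_nodes = math.ceil(gpu_tasks / max_tasks_per_gpu_node)
    return cpu_nodes, gpu_nodes


# CPU-only case: every component on CPU nodes (128 tasks per node).
cpu_only = [
    ComponentLayout("atm", 256, on_gpu_nodes=False),
    ComponentLayout("ocn", 384, on_gpu_nodes=False),
]
print(count_nodes(cpu_only, 128, 64))  # -> (5, 0)

# Hybrid case: atmosphere on GPU nodes (64 tasks per node), ocean on CPU nodes.
hybrid = [
    ComponentLayout("atm", 256, on_gpu_nodes=True),
    ComponentLayout("ocn", 384, on_gpu_nodes=False),
]
print(count_nodes(hybrid, 128, 64))  # -> (3, 4)
```

Under these assumptions, the pure CPU, pure GPU, and hybrid cases differ only in which per-node limit each component's tasks are divided by. That is why a single MAX_TASKS_PER_NODE value cannot describe the hybrid case by itself.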