ESMCI / ccs_config_cesm

CESM CIME Case Control System configuration files

handle cpu vs gpu nodes on gust/derecho #89

Open jedwards4b opened 1 year ago

jedwards4b commented 1 year ago

CPU nodes have a MAX_TASKS_PER_NODE of 128, while GPU nodes have 64. How do we handle each case independently, and how do we handle the hybrid case?

sjsprecious commented 1 year ago

Could we add something like MAX_TASKS_PER_GPU_NODE and use it only when ngpus_per_node > 0?

fischer-ncar commented 1 year ago

Going along with @sjsprecious, we could change MAX_TASKS_PER_NODE to MAX_TASKS_PER_CPU_NODE.

jedwards4b commented 1 year ago

@fischer-ncar I think that your change would not be backward compatible and I think it may cause confusion. I think MAX_CPUTASKS_PER_GPU_NODE could be the solution.

sjsprecious commented 1 year ago

@jedwards4b so we will add a new XML variable MAX_CPUTASKS_PER_GPU_NODE and only use it when ngpus_per_node > 0?
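
A minimal sketch of the selection rule proposed here, in Python. The helper name `tasks_per_node` is hypothetical and not part of CIME; the values 128 and 64 are the ones quoted at the top of this issue.

```python
def tasks_per_node(max_tasks_per_node, max_cputasks_per_gpu_node, ngpus_per_node):
    """Pick the per-node task limit for a case (hypothetical helper, not CIME API)."""
    # Use the GPU-node limit only when the case actually requests GPUs.
    if ngpus_per_node > 0:
        return max_cputasks_per_gpu_node
    # Otherwise keep the existing variable, so current CPU-only cases are unaffected.
    return max_tasks_per_node


print(tasks_per_node(128, 64, ngpus_per_node=0))  # CPU-only case
print(tasks_per_node(128, 64, ngpus_per_node=4))  # GPU case
```

Because MAX_TASKS_PER_NODE keeps its name and meaning, this rule stays backward compatible, but it picks a single limit per case and so does not yet cover a job that spans both node types.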

jedwards4b commented 1 year ago

I want to figure out how to run on GPU nodes but also hybrid CPU/GPU before I decide.

sjsprecious commented 1 year ago

> I want to figure out how to run on GPU nodes but also hybrid CPU/GPU before I decide.

Can you explain more about "hybrid CPU/GPU"? The GPU nodes are "hybrid" to some extent since there are CPU cores on the GPU nodes as well.

jedwards4b commented 1 year ago

Sure. Only some components can currently use GPUs, so in some cases we may want to run one component (for example, the atmosphere) on GPU nodes while the ocean component runs on CPU nodes. I have already successfully demonstrated this capability with a simple test program.
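
To make the hybrid case concrete, here is a rough Python sketch of the node arithmetic, assuming the components run concurrently on disjoint sets of nodes. The function `nodes_needed` and the layout dictionary are hypothetical illustrations, not CIME code, and the task counts in the example are made up; only the per-node limits of 128 and 64 come from this issue.

```python
import math

MAX_TASKS_PER_NODE = 128        # CPU nodes (value from this issue)
MAX_CPUTASKS_PER_GPU_NODE = 64  # GPU nodes (proposed variable name)


def nodes_needed(layout):
    """Return (cpu_nodes, gpu_nodes) for a layout.

    layout maps a component name to (ntasks, runs_on_gpu).
    Assumes CPU and GPU components occupy separate nodes.
    """
    cpu_tasks = sum(n for n, on_gpu in layout.values() if not on_gpu)
    gpu_tasks = sum(n for n, on_gpu in layout.values() if on_gpu)
    cpu_nodes = math.ceil(cpu_tasks / MAX_TASKS_PER_NODE)
    gpu_nodes = math.ceil(gpu_tasks / MAX_CPUTASKS_PER_GPU_NODE)
    return cpu_nodes, gpu_nodes


# Illustrative layout: atmosphere on GPU nodes, ocean on CPU nodes.
layout = {"atm": (256, True), "ocn": (256, False)}
print(nodes_needed(layout))  # (2, 4)
```

The point of the sketch is that a hybrid job needs both limits at the same time, which is why a single per-case MAX_TASKS_PER_NODE value is not enough.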

amametjanov commented 1 year ago

In E3SM cime_config, CPU and GPU runs are distinguished by the CIME compiler name, such as gnu vs. gnugpu. For example: https://github.com/E3SM-Project/E3SM/blob/master/cime_config/machines/config_machines.xml#L263

The gnugpu compiler adds GPU-specific compile flags. I hope this is the case configuration that you want to enable.

rljacob commented 1 year ago

If the component has no GPU pieces, then it just compiles as if it's a CPU-only run. Then it's all about getting the layout right.

jedwards4b commented 1 year ago

This is what I am trying to move away from. Thank you.