CliMA / ClimaAtmos.jl

ClimaAtmos.jl is a library for building atmospheric circulation models that is designed from the outset to leverage data assimilation and machine learning tools. We welcome contributions!

Compile scaling results for `ClimaAtmos.jl` on `Derecho` supercomputer #3161

Open sriharshakandala opened 4 months ago

sriharshakandala commented 4 months ago

Compile scaling results for ClimaAtmos.jl on Derecho supercomputer

Sbozzolo commented 4 months ago

Here are more things we could explore to improve scaling:

https://ncar-hpc-docs.readthedocs.io/en/latest/compute-systems/derecho/starting-derecho-jobs/derecho-job-script-examples-content/?h=set_gpu

```sh
# (Optional: Enable GPU managed memory if required.)
#   From 'man mpi': This setting will allow MPI to properly
#   handle unified memory addresses. This setting has performance
#   penalties, as MPICH will perform a buffer query on each buffer
#   that is handled by MPI.
# If you see runtime errors like
#   (GTL DEBUG: 0) cuIpcGetMemHandle: invalid argument,
#   CUDA_ERROR_INVALID_VALUE
# make sure this variable is set:
export MPICH_GPU_MANAGED_MEMORY_SUPPORT_ENABLED=1
```

Maybe with this we don't need `JULIA_CUDA_MEMORY_POOL="none"`, which would allow us to use the new allocator.
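As a concrete (untested) sketch of that idea: if MPICH's managed-memory support makes the stream-ordered allocator safe to use, the job environment could look like the following. `JULIA_CUDA_MEMORY_POOL` is the CUDA.jl memory-pool selector, and the driver invocation is a placeholder:

```sh
# Untested sketch: enable MPICH's managed-memory handling so that
# CUDA.jl's default stream-ordered memory pool can stay enabled.
export MPICH_GPU_MANAGED_MEMORY_SUPPORT_ENABLED=1

# Instead of opting out of the pool allocator with
#   export JULIA_CUDA_MEMORY_POOL=none
# keep the default stream-ordered pool:
export JULIA_CUDA_MEMORY_POOL=cuda

# Placeholder driver invocation.
julia --project my_driver.jl
```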

From the same page:

> Binding MPI ranks to CPU cores can also be an important performance consideration for GPU-enabled codes, and can be done with the `--cpu-bind` option to `mpiexec`. For the above example using 2 nodes, 4 MPI ranks per node, and 1 GPU per MPI rank, binding each of the MPI ranks to one of the four separate NUMA domains within a node is likely to be optimal for performance. This could be done as follows:

```sh
mpiexec -n 8 -ppn 4 --cpu-bind verbose,list:0:16:32:48 ./set_gpu_rank ./executable_name
```
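For reference, here is a sketch of how this could sit in a complete Derecho PBS script, following the layout of the examples on the NCAR page linked above (2 nodes, 4 ranks per node, 1 GPU per rank); the project code, walltime, and executable name are placeholders:

```sh
#!/bin/bash
#PBS -A <project_code>
#PBS -q main
#PBS -l walltime=01:00:00
#PBS -l select=2:ncpus=64:mpiprocs=4:ngpus=4
#PBS -j oe

# Enable MPICH's handling of managed/unified memory (see above).
export MPICH_GPU_MANAGED_MEMORY_SUPPORT_ENABLED=1

# One rank per NUMA domain (cores 0, 16, 32, 48 on a 64-core node);
# set_gpu_rank (from the NCAR docs) binds each rank to its own GPU.
mpiexec -n 8 -ppn 4 --cpu-bind verbose,list:0:16:32:48 \
    ./set_gpu_rank ./executable_name
```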