alexanderbranca closed this issue 5 months ago.
As an aside, if it's fine to hijack the issues section for this: how does one set the command-line options for mpirun? I'd like to try with hyperthreading, but MPI seems to default to limiting the number of processes to the number of physical cores, not hardware threads...
The easiest way to add command-line options for mpirun at the moment (ExoPlaSim actually uses mpiexec, but the two are synonyms when using OpenMPI) is probably to change mymodel._exec directly. You can append runtime flags to the end; just make sure to leave a space at the end. Adding a pass-through (via an mpi_opts keyword argument) is a good idea, though, along with perhaps a dedicated keyword argument for enabling hyperthreading. On machines with execution managers like PBS/TORQUE or SLURM, the execution manager will typically tell OpenMPI how many workers it can use (including with hyperthreading), but it's a good idea to make this easier to access on e.g. laptops and smaller clusters that don't have execution managers. I'm going to create a separate issue for this, for easier development task management.
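A minimal sketch of the workaround described above, assuming mymodel._exec holds the launcher command as a plain string prefix (e.g. something like "mpiexec -np 8 ") that must end with a space before the executable name is appended. The helper with_mpi_flags is hypothetical, not part of ExoPlaSim; check the actual contents of mymodel._exec on your install before relying on this.

```python
from __future__ import annotations


def with_mpi_flags(exec_prefix: str, flags: list[str]) -> str:
    """Return exec_prefix with extra mpiexec/mpirun flags appended.

    Hypothetical helper: assumes the launcher command is stored as a plain
    string prefix (like mymodel._exec) that must end with a space.
    """
    # Make sure the existing prefix is separated from any new flags.
    prefix = exec_prefix if exec_prefix.endswith(" ") else exec_prefix + " "
    if not flags:
        return prefix
    # Join the flags and keep the trailing space the command expects.
    return prefix + " ".join(flags) + " "
```

Usage would then look like mymodel._exec = with_mpi_flags(mymodel._exec, ["--use-hwthread-cpus"]). In Open MPI 4.x, --use-hwthread-cpus tells mpirun to count hardware threads rather than physical cores as available slots; confirm with mpirun --help for your Open MPI version.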
You should generally only use a number of CPUs (workers/threads) suited to the number of latitudes in the model. At the moment, the maximum number of workers is equal to the number of latitudes, and any smaller number must divide evenly into the number of latitudes (i.e. each worker gets an equal share of the work). In the future I may add OpenMP parallelism to the radiation module, which could take advantage of more threads than there are latitudes, but at least half of those threads would sit idle except while the radiation step is running. That's fine for a personal laptop or another machine running without an execution manager, but in a cluster context, where a job gets exclusive access to a fixed number of workers, it would be unacceptably inefficient (unless those workers could be given other work during the rest of the model's runtime, which is vastly more complex).

So if you're using T21 resolution (the model default, equivalent to 32 latitudes and 64 longitudes), you can run with 1, 2, 4, 8, 16, or 32 threads. T42 has 64 latitudes and thus can also be run on 64 cores. T63 has 96 latitudes and can therefore also be used with non-power-of-2 configurations (3, 6, 12, 24, 48, or 96 workers), but it is slow enough that it's not really recommended for consumer-grade hardware or small CPU counts. Less of the model is also validated at T63 and higher resolutions, so proceed with caution if using ExoPlaSim for scientific applications at high resolutions.
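The divisibility rule above can be checked mechanically. This sketch lists every allowed worker count for the three resolutions mentioned in this thread; the latitude counts come from the reply above, while the names NLAT and valid_worker_counts are illustrative and not part of ExoPlaSim.

```python
from __future__ import annotations

# Latitude counts for the resolutions discussed in this thread.
NLAT = {"T21": 32, "T42": 64, "T63": 96}


def valid_worker_counts(nlat: int) -> list[int]:
    """Worker counts from 1 to nlat that split the latitudes evenly."""
    return [n for n in range(1, nlat + 1) if nlat % n == 0]


if __name__ == "__main__":
    for res, nlat in NLAT.items():
        print(res, nlat, valid_worker_counts(nlat))
```

For T21 this yields only powers of 2 (1 through 32), which is why any other ncpus value fails at the default resolution, while T63 also admits 3, 6, 12, 24, and 48.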
Oh yeah, thanks a lot; that actually fully explains why the model didn't work with powers of 2 past 32 either... Luckily, I happen to be running this on a relatively beefy (if somewhat older) 24-core system: even a year at T42 with 10 layers only seems to take about 10 minutes. (Admittedly, I did cheat a bit by setting the timestep to 30 minutes rather than the default 15.)
Running the model with ncpus set to anything but a power of 2 (2, 4, 8, 16, ...) crashes it:
System specifications:
- CPU: 2x Xeon E5-2696 v2 (24 cores / 48 threads total) @ 2.5 GHz
- RAM: 64 GB DDR3 ECC @ 1333 MHz
- Motherboard: Supermicro X9DRH-7TF
- Operating system: Ubuntu 22.04.4 (amd64)
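For a machine like the one above (24 physical cores, 48 hardware threads), a pre-flight check can pick the largest usable worker count before launching, instead of letting an unsupported ncpus value crash the run. This is a hedged sketch: ncpus is the keyword from the report, but largest_usable_workers and check_ncpus are hypothetical helpers, and the latitude counts (32 for T21, 64 for T42, 96 for T63) are taken from the maintainer's reply.

```python
def largest_usable_workers(available: int, nlat: int) -> int:
    """Largest worker count <= available that divides nlat evenly."""
    if available < 1 or nlat < 1:
        raise ValueError("available and nlat must be positive")
    # Count down from the smaller limit; n = 1 always divides nlat.
    return next(n for n in range(min(available, nlat), 0, -1) if nlat % n == 0)


def check_ncpus(ncpus: int, nlat: int) -> None:
    """Raise before launch if ncpus cannot split nlat latitudes evenly."""
    if ncpus < 1 or ncpus > nlat or nlat % ncpus != 0:
        raise ValueError(
            f"ncpus={ncpus} must divide nlat={nlat} evenly and be <= nlat"
        )


if __name__ == "__main__":
    # 24 physical cores vs. 48 hardware threads, at T21/T42/T63.
    for nlat in (32, 64, 96):
        print(nlat, largest_usable_workers(24, nlat), largest_usable_workers(48, nlat))
```

On this machine that means 16 workers at T21 or T42 using physical cores only, or 32 at T42 with hyperthreading; whether 32 hyperthreaded workers actually outrun 16 physical cores is something to benchmark rather than assume.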