alphaparrot / ExoPlaSim

Exoplanet Planet Simulator (PlaSim extended for different planet types, including tidally-locked, and for evolution on geological timescales: glaciers and the carbon cycle)
GNU General Public License v2.0

Model crashes with an invalid floating-point instruction when ncpus is not a power of 2 #17

Closed: alexanderbranca closed this issue 1 month ago

alexanderbranca commented 2 months ago

Running the model with ncpus set to anything but a power of 2 (2, 4, 8, 16, ...) crashes it:

Writing to /home/erroringons256/ExoPlaSim/mymodel_testrun/mymodel.cfg....

Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.

Backtrace for this error:
#0  0x79d44ac23960 in ???
#1  0x79d44ac22ac5 in ???
#2  0x79d44a64251f in ???
        at ./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0
#3  0x5e2f29eaa2ca in ???
#4  0x5e2f29eac57c in ???
#5  0x5e2f29ea1a1d in ???
#6  0x5e2f29e4633f in ???
#7  0x5e2f29dd068a in ???
#8  0x79d44a629d8f in __libc_start_call_main
        at ../sysdeps/nptl/libc_start_call_main.h:58
#9  0x79d44a629e3f in __libc_start_main_impl
        at ../csu/libc-start.c:392
#10  0x5e2f29dd06d4 in ???
#11  0xffffffffffffffff in ???
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
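
For reference, a minimal invocation along the lines of the ExoPlaSim quickstart reproduces this (workdir and modelname match the paths in the log above; ncpus=24 is not a power of 2):

```python
import exoplasim as exo

mymodel = exo.Model(workdir="mymodel_testrun", modelname="mymodel", ncpus=24)
mymodel.configure()
mymodel.exportcfg()   # prints the "Writing to .../mymodel.cfg...." line above
mymodel.run(years=1)  # crashes with SIGFPE
```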

System Specifications:

CPU: 2x Xeon E5-2696 v2 (24 cores / 48 threads total) @ 2.5 GHz
RAM: 64 GB DDR3 ECC @ 1333 MHz
Motherboard: Supermicro X9DRH-7TF
Operating System: Ubuntu 22.04.4 amd64

alexanderbranca commented 2 months ago

As an aside, if it's fine to hijack the issues section for this: how does one set command-line options for mpirun? I'd like to try running with hyperthreading, but MPI seems to default to limiting the worker count to the number of physical cores, not hardware threads...

alphaparrot commented 2 months ago

The easiest way to add command-line options for mpirun (ExoPlaSim actually invokes mpiexec, but the two are synonyms under OpenMPI) is currently to modify mymodel._exec directly. You can append runtime flags to the end of that string; just make sure it ends with a trailing space. Adding a pass-through (via a keyword argument like mpi_opts) is a good idea, though, along with perhaps a dedicated keyword argument for enabling hyperthreading. On machines with execution managers like PBS/TORQUE or SLURM, the manager will typically tell OpenMPI how many workers it may use (including with hyperthreading), but making this easier to access is worthwhile for e.g. laptops and smaller clusters that don't have execution managers. I'm going to open a separate issue for this to keep development tasks easier to track.
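
For example, something like this should work (a minimal sketch; --use-hwthread-cpus is the standard OpenMPI flag for scheduling onto hardware threads, but the exact contents of _exec are an internal detail and may change):

```python
import exoplasim as exo

mymodel = exo.Model(workdir="mymodel_testrun", modelname="mymodel", ncpus=16)

# _exec holds the launcher prefix (roughly "mpiexec -np 16 ").
# Append extra OpenMPI flags, keeping the trailing space so the flags
# stay separated from the model binary when the command is assembled.
mymodel._exec += "--use-hwthread-cpus "
```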

You should generally only use a number of CPUs or workers/threads suited to the number of latitudes in the model. The maximum number of threads/workers (at the moment) is equal to the number of latitudes, and any smaller count must divide cleanly into the number of latitudes, so that each worker gets an equal share of the work. In the future I may add OpenMP parallelism in the radiation module that could take advantage of more threads than there are latitudes, but at least half of those threads would sit idle except while the radiation step is running. That's fine for a personal laptop or other machine running without an execution manager, but in a cluster context, where a job gets exclusive access to a fixed number of workers, it would be unacceptably inefficient (unless those workers can be given other work during other parts of the model runtime, which is vastly more complex).

So if you're using T21 resolution (the model default: 32 latitudes and 64 longitudes), you can run with 1, 2, 4, 8, 16, or 32 threads. T42 has 64 latitudes and can therefore be run on up to 64 cores. T63 has 96 latitudes and so admits additional non-power-of-2 configurations, but it's slow enough that it's not really recommended for consumer-grade hardware or small CPU counts, and less of the model is validated at T63 and higher resolutions, so proceed with caution if using ExoPlaSim for scientific applications at high resolutions.
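
Put differently, the valid worker counts are just the divisors of the latitude count; a quick plain-Python check (nothing ExoPlaSim-specific here):

```python
def valid_ncpus(nlats):
    """Divisors of the latitude count; each worker gets an equal share of latitudes."""
    return [n for n in range(1, nlats + 1) if nlats % n == 0]

print(valid_ncpus(32))  # T21: [1, 2, 4, 8, 16, 32]
print(valid_ncpus(64))  # T42: [1, 2, 4, 8, 16, 32, 64]
print(valid_ncpus(96))  # T63: [1, 2, 3, 4, 6, 8, 12, 16, 24, 32, 48, 96]
```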

alexanderbranca commented 2 months ago

Oh yeah, thanks a lot, that actually fully explains why the model didn't work with powers of 2 past 32 either... Luckily I happen to be running this on a relatively beefy (if somewhat older) 24-core system; even a year at T42 with 10 layers only seems to take about 10 minutes. (Admittedly I did cheat a bit by setting the timestep to 30 minutes rather than the default 15.)