Closed joehellmers closed 1 month ago
I can't seem to reproduce this. There might be a race condition somewhere or a filesystem issue, but when I run with 16 OpenMP threads, I can output the plotfile without issue.
Can you tell me what machine you ran on, how many MPI tasks and how many OpenMP threads?
oh, I just noticed you are running without MPI, just OpenMP. I tried that and I also have no problem outputting.
Ah, the problem was using mpirun instead of just running the executable. After fixing that the simulation works, but I'm not seeing any improvement in wall time for 1, 2, 4, 8 and 12 OpenMP threads. It's not a big deal for me right now; I'm just trying to assess how Castro scales for an XSEDE allocation. MPI scales well.
It looks like we never tiled the MHD algorithm, since we were mostly interested in running it on GPUs. I'll look at adding the tiling tonight.
Is that easy to do? Perhaps I could do it.
It may be as simple as adding TilingIfNotGPU() to the single MFIter loop in Castro_mhd.cpp, but I am not 100% certain.
OK. I'll set up a fork and give it a shot.
That seems to have helped quite a bit, although on my system scaling stalls beyond 8 cores.
castro-sedovdev-1x1.omp.out:3727:Run time without initialization = 199.2570635
castro-sedovdev-1x2.omp.out:3727:Run time without initialization = 110.7894975
castro-sedovdev-1x4.omp.out:3727:Run time without initialization = 61.3167444
castro-sedovdev-1x8.omp.out:3727:Run time without initialization = 35.07138379
castro-sedovdev-1x12.omp.out:3727:Run time without initialization = 33.61117978
Should I create a pull request related to this issue, or should I make a new issue?
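To put numbers on the stall, here is a minimal sketch that turns the run times reported above into speedup and parallel efficiency relative to the 1-thread run. The names run_times and scaling_table are only illustrative; the times are copied from the log lines above.

```python
# Wall-clock times in seconds, copied from the
# "Run time without initialization" lines above,
# keyed by the number of OpenMP threads.
run_times = {
    1: 199.2570635,
    2: 110.7894975,
    4: 61.3167444,
    8: 35.07138379,
    12: 33.61117978,
}


def scaling_table(times):
    """Return (threads, speedup, efficiency) rows relative to the 1-thread run."""
    base = times[1]
    rows = []
    for threads in sorted(times):
        speedup = base / times[threads]
        # Efficiency is the speedup divided by the number of threads used.
        rows.append((threads, speedup, speedup / threads))
    return rows


for threads, speedup, efficiency in scaling_table(run_times):
    print(f"{threads:2d} threads: speedup {speedup:5.2f}x, efficiency {efficiency:6.1%}")
```

The speedup is roughly 5.7x at 8 threads and only about 5.9x at 12, so the last four cores add very little.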
The way tiling works is that it divides a box up into logical tiles and then distributes those tiles across the OMP threads. If you are running the default inputs.3d.mhd, then you have a single 32^3 box. The default tile size is 1024x8x8, so that would give you 16 tiles for OMP, which don't spread nicely over 12 cores. Not sure if your chip has 16 cores, but OMP might work better there.
you could try setting
fabarray.mfiter_tile_size = 1024 4 4
which would give you 64 tiles, but at some point it might simply be too small to scale effectively.
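The tile counts above can be checked with a short sketch. It assumes each box dimension is simply cut into chunks of at most the tile size, that every tile costs the same, and that tiles are handed out evenly to threads. This is a simplified model for the arithmetic, not AMReX's actual scheduling, and the function names are hypothetical.

```python
import math


def tile_count(box_shape, tile_size):
    """Number of logical tiles when each dimension is cut into chunks of at most tile_size."""
    return math.prod(math.ceil(n / t) for n, t in zip(box_shape, tile_size))


def ideal_balance(n_tiles, n_threads):
    """Fraction of thread time spent working, ASSUMING equal-cost tiles dealt out evenly.

    The busiest thread gets ceil(n_tiles / n_threads) tiles and sets the wall time.
    """
    busiest = math.ceil(n_tiles / n_threads)
    return n_tiles / (n_threads * busiest)


box = (32, 32, 32)  # the single 32^3 box from the default inputs.3d.mhd

for tile_size in [(1024, 8, 8), (1024, 4, 4)]:
    n_tiles = tile_count(box, tile_size)
    for n_threads in (8, 12, 16):
        balance = ideal_balance(n_tiles, n_threads)
        print(f"tile size {tile_size}: {n_tiles} tiles, "
              f"{n_threads} threads, ideal balance {balance:.0%}")
```

Under these assumptions, 16 tiles on 12 threads leaves some threads with two tiles and others with one, so at best about two thirds of the thread time is useful work, while 8 or 16 threads divide the tiles evenly. The model ignores per-tile overhead, which is one reason smaller tiles can still run slower in practice.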
I tried the setting but it actually made it worse (43 seconds instead of 33). The node I'm running on only has 12 non-hyperthreaded cores, so maybe there is some sort of contention with other processes running on that node (e.g. Slurm, NFS client, etc.). I think the pull request for the fix is #2039.
closing this since I think the initial issue was resolved.
Hello,
I'm getting an error in the backtrace when trying to run Sedov 3D MHD with OpenMP.
In Backtrace.0.0 I see this
The only thing I added to inputs.3d.mhd is
castro.max_subcycles = 16
My GNUmakefile is