DedalusProject / dedalus

A flexible framework for solving PDEs with modern spectral methods.
http://dedalus-project.org/
GNU General Public License v3.0

LBVP gets stuck at the build_solver stage #284

Open csskene opened 6 months ago

csskene commented 6 months ago

Hi, I've encountered an issue when trying to solve an LBVP to find the vector potential from a magnetic field. I've attached a script that reproduces it. With the resolution set small, the code gets stuck at the build_solver stage once the number of processes is increased to 4. I've also attached debug logs, which show that process 0 stalls at this step:

2024-02-29 13:35:17,943 transforms 0/4 DEBUG :: Building FFTW FFT plan for (dtype, gshape, axis) = (<class 'numpy.float64'>, (3, 12, 2, 6), 1)

and similarly for the other processes (1 to 3). This resolution is far too small for this problem, but a similar error occurs at higher resolutions on a cluster. Best, Calum

B_lbvp.txt dedalus_p0.log dedalus_p1.log dedalus_p2.log dedalus_p3.log

bpbrown commented 3 months ago

@csskene I've downloaded your script, and I can run it on 1, 2 or 3 cores but not 4 (as you described). This looks a lot like a race condition to me, where some cores are not participating in a global operation.

In particular, if you add these lines:

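from mpi4py import MPI  # skip if the script already imports MPI
rank = MPI.COMM_WORLD.rank
# Print the local grid-space and coefficient-space shapes of B on each rank
print(f"rank {rank:d}, g:{B['g'].shape}, c:{B['c'].shape}")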

and run on 4 cores, you'll see:

rank 3, g:(3, 12, 0, 6), c:(3, 0, 5, 6)
rank 0, g:(3, 12, 2, 6), c:(3, 2, 5, 6)
rank 2, g:(3, 12, 2, 6), c:(3, 2, 5, 6)
rank 1, g:(3, 12, 2, 6), c:(3, 2, 5, 6)

so rank 3 holds no data at all: its local shape is empty along theta in grid space and along m in coefficient space.
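As a rough illustration of how a rank can end up empty, here is a minimal sketch in plain Python. It assumes the distributed axis is split into contiguous blocks of size ceil(n / p), and that the global size along that axis is 6 (inferred from the local sizes 2 + 2 + 2 + 0 printed above). Both are assumptions made for illustration, not a statement of how Dedalus actually partitions data, and `block_sizes` is a hypothetical helper, not a Dedalus function.

```python
import math

def block_sizes(n_global, n_ranks):
    """Local sizes when n_global points are split into contiguous
    blocks of size ceil(n_global / n_ranks) (assumed scheme)."""
    block = math.ceil(n_global / n_ranks)
    sizes = []
    for rank in range(n_ranks):
        # Clamp both ends so trailing ranks can receive nothing
        start = min(rank * block, n_global)
        stop = min(start + block, n_global)
        sizes.append(stop - start)
    return sizes

# Global size 6 is inferred from the shapes above, not read from the script
for n_ranks in (1, 2, 3, 4):
    sizes = block_sizes(6, n_ranks)
    empty = [r for r, s in enumerate(sizes) if s == 0]
    print(f"{n_ranks} ranks: sizes={sizes}, empty ranks={empty}")
```

Under these assumptions, 1, 2 and 3 ranks all receive data, while with 4 ranks the block size stays at 2 and the last rank is left with nothing, which would match the pattern seen here.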

Let me look into this a bit more and get back to you.