Open drwells opened 1 year ago
Is there an accumulate version of the receive operation? This is what I forgot to handle in my initial version of #10.
`performLocalCopies()` should be the same as doing `packStream()`/`unpackStream()`, but it isn't - perhaps that's what is missing. I'll take a look.
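For reference, here is a minimal, library-agnostic sketch of the two ideas in play: an accumulate-style receive next to a copy-style receive, and the invariant that a direct local copy should match pack-then-unpack. The names (`LocalData`, `pack()`, `unpack()`, `unpackAndSum()`, `checkLocalCopyMatchesStreams()`) are hypothetical stand-ins, not the SAMRAI/IBTK API:

```cpp
// Hypothetical sketch, not the SAMRAI/IBTK API.
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

struct LocalData
{
    std::vector<double> values;

    // Serialize the entries selected by `overlap` into a byte stream.
    void pack(const std::vector<std::size_t>& overlap, std::vector<char>& stream) const
    {
        for (const std::size_t i : overlap)
        {
            const char* bytes = reinterpret_cast<const char*>(&values[i]);
            stream.insert(stream.end(), bytes, bytes + sizeof(double));
        }
    }

    // Copy-style receive: overwrite destination entries.
    void unpack(const std::vector<std::size_t>& overlap, const std::vector<char>& stream)
    {
        std::size_t offset = 0;
        for (const std::size_t i : overlap)
        {
            std::memcpy(&values[i], stream.data() + offset, sizeof(double));
            offset += sizeof(double);
        }
    }

    // Accumulate-style receive: sum incoming values into destination entries.
    void unpackAndSum(const std::vector<std::size_t>& overlap, const std::vector<char>& stream)
    {
        std::size_t offset = 0;
        for (const std::size_t i : overlap)
        {
            double incoming;
            std::memcpy(&incoming, stream.data() + offset, sizeof(double));
            values[i] += incoming;
            offset += sizeof(double);
        }
    }
};

// The invariant in question: a direct local copy should give exactly the same
// result as packing on the source and unpacking on the destination.
void checkLocalCopyMatchesStreams(const LocalData& src, LocalData dst,
                                  const std::vector<std::size_t>& overlap)
{
    LocalData via_stream = dst;
    std::vector<char> stream;
    src.pack(overlap, stream);
    via_stream.unpack(overlap, stream);

    // The "performLocalCopies" analogue: copy the overlapping entries directly.
    for (const std::size_t i : overlap) dst.values[i] = src.values[i];

    for (const std::size_t i : overlap)
        assert(dst.values[i] == via_stream.values[i]);
}
```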
The marker point tests (with this patch) get completely wrong values:
```
< X = 0.0147398, 0.081608 V = 0.542664, 0.220586
< X = 0.0147398, 0.193867 V = 0.542664, 0.180123
< X = 0.0147398, 0.306125 V = 0.542664, 0.139661
< X = 0.0147398, 0.418384 V = 0.542664, 0.0991993
< X = 0.0288938, 0.515868 V = 0.194435, -0.00646765
< X = 0.0147398, 0.642901 V = 0.542664, 0.0182751
< X = 0.0147398, 0.75516 V = 0.542664, -0.022187
< X = 0.0147398, 0.867418 V = 0.542664, -0.0626491
< X = 0.019355, 0.974035 V = 0.396029, -0.182672
---
> X = 0.0325043, 0.0784706 V = 0.138241, 0.210765
> X = 0.0325043, 0.185667 V = 0.138241, 0.157167
> X = 0.0325043, 0.292863 V = 0.138241, 0.103569
> X = 0.0325043, 0.400059 V = 0.138241, 0.0499707
> X = 0.0325043, 0.507255 V = 0.138241, -0.00362728
> X = 0.0325043, 0.614451 V = 0.138241, -0.0572253
> X = 0.0325043, 0.721647 V = 0.138241, -0.110823
> X = 0.0325043, 0.828842 V = 0.138241, -0.164421
> X = 0.0325043, 0.936038 V = 0.138241, -0.218019
```
The x-component of the velocity is constant so each column of points should stay aligned.
I'm out of time to investigate this today but I think the bug might be the other way around - either the local copy code is wrong or there is a bug in the marker point code.
If I run that test executable with this patch then the output is the same regardless of the number of processors. However:
```
[drwells@mitral IBTK]$ diff marker_points_02_2d.timestep.h5part.output marker_points_02_2d.timestep.h5part.mpirun\=2.output
79,86c79,86
< X = 0.956775, 0.0845583 V = 0.956775, 0.292279
< X = 0.956775, 0.20147 V = 0.956775, 0.350735
< X = 0.956775, 0.318382 V = 0.956775, 0.409191
< X = 0.956775, 0.435293 V = 0.956775, 0.467647
< X = 0.956775, 0.552205 V = 0.956775, 0.526102
< X = 0.956775, 0.669117 V = 0.956775, 0.584558
< X = 0.956775, 0.786028 V = 0.956775, 0.643014
< X = 0.956775, 0.90294 V = 0.956775, 0.70147
---
> X = 0.956775, 0.0845583 V = 0.949955, 0.292279
> X = 0.956775, 0.20147 V = 0.949955, 0.350735
> X = 0.956775, 0.318382 V = 0.949955, 0.409191
> X = 0.956775, 0.435293 V = 0.949955, 0.467647
> X = 0.956775, 0.552205 V = 0.949955, 0.526102
> X = 0.956775, 0.669117 V = 0.949955, 0.584558
> X = 0.956775, 0.786028 V = 0.949955, 0.643014
> X = 0.956775, 0.90294 V = 0.949955, 0.70147
89,97c89,97
< X = 0.0325043, 0.0784706 V = 0.138241, 0.210765
< X = 0.0325043, 0.185667 V = 0.138241, 0.157167
< X = 0.0325043, 0.292863 V = 0.138241, 0.103569
< X = 0.0325043, 0.400059 V = 0.138241, 0.0499707
< X = 0.0325043, 0.507255 V = 0.138241, -0.00362728
< X = 0.0325043, 0.614451 V = 0.138241, -0.0572253
< X = 0.0325043, 0.721647 V = 0.138241, -0.110823
< X = 0.0325043, 0.828842 V = 0.138241, -0.164421
< X = 0.0325043, 0.936038 V = 0.138241, -0.218019
---
> X = 0.0147398, 0.081608 V = 0.542664, 0.220586
> X = 0.0147398, 0.193867 V = 0.542664, 0.180123
> X = 0.0147398, 0.306125 V = 0.542664, 0.139661
> X = 0.0147398, 0.418384 V = 0.542664, 0.0991993
> X = 0.0147398, 0.530642 V = 0.542664, 0.0587372
> X = 0.0147398, 0.642901 V = 0.542664, 0.0182751
> X = 0.0147398, 0.75516 V = 0.542664, -0.022187
> X = 0.0147398, 0.867418 V = 0.542664, -0.0626491
> X = 0.019355, 0.974035 V = 0.396029, -0.182672
99,108c99,108
< X = 0.0420723, 0.0736804 V = 0.053882, 0.21316
< X = 0.0420729, 0.168974 V = 0.0538795, 0.165513
< X = 0.0420729, 0.264267 V = 0.0538795, 0.117866
< X = 0.0420729, 0.359561 V = 0.0538795, 0.0702196
< X = 0.0420729, 0.454854 V = 0.0538795, 0.0225729
< X = 0.0420729, 0.550147 V = 0.0538795, -0.0250737
< X = 0.0420729, 0.645441 V = 0.0538795, -0.0727204
< X = 0.0420729, 0.740734 V = 0.0538795, -0.120367
< X = 0.0420729, 0.836028 V = 0.0538795, -0.168014
< X = 0.04203, 0.93117 V = 0.0540485, -0.215585
---
> X = 0.0381908, 0.0751988 V = 0.0768031, 0.212401
> X = 0.0381723, 0.1749 V = 0.0769495, 0.16255
> X = 0.0381723, 0.274575 V = 0.0769495, 0.112713
> X = 0.0381723, 0.37425 V = 0.0769495, 0.0628752
> X = 0.0381723, 0.473924 V = 0.0769495, 0.0130378
> X = 0.0381723, 0.573599 V = 0.0769495, -0.0367996
> X = 0.0381723, 0.673274 V = 0.0769495, -0.086637
> X = 0.0381723, 0.772949 V = 0.0769495, -0.136474
> X = 0.0381723, 0.872623 V = 0.0769495, -0.186312
> X = 0.0380256, 0.957052 V = 0.0781211, -0.228526
```
So the results are way off between these two runs on master, which seems wrong. The `mpirun=2` case nearly agrees with this patch (one point is off) and at 4 MPI processes they agree. Using 1, 2, or 4 MPI processes produces the same output with this patch.
Those tests fail because we are inconsistently filling things across periodic boundaries. Using a solution which is actually continuous and periodic fixes the problem.
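As an aside, a tiny illustration of the distinction (the linear field below is just an example, not necessarily the field the failing tests actually use): a field can be smooth everywhere inside the domain and still disagree with itself across the periodic boundary.

```cpp
// Illustration only, not the actual test input: on a periodic domain [0, 1),
// x = 0 and x = 1 are the same point, so a field must take the same value at
// both images to be filled consistently across the periodic boundary.
#include <cmath>
#include <cstdio>

int main()
{
    const double pi = 3.141592653589793;

    // Smooth inside the domain but u1(0) != u1(1): the two periodic images of
    // the boundary point disagree, so the value a ghost cell (or a shared
    // side-centered DoF) receives depends on which image fills it.
    const auto u1 = [](double x) { return x; };

    // Genuinely periodic: both images agree, so any fill order is consistent.
    const auto u2 = [pi](double x) { return std::sin(2.0 * pi * x); };

    std::printf("u1(0) = %g, u1(1) = %g\n", u1(0.0), u1(1.0)); // 0 vs 1
    std::printf("u2(0) = %g, u2(1) = %g\n", u2(0.0), u2(1.0)); // 0 vs ~0
    return 0;
}
```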
The next set of failing tests is:
```
IBTK/nodal_interpolation_01_3d.cf.z.input
IBTK/nodal_interpolation_01_3d.cf.input
IBTK/nodal_interpolation_01_3d.cf.x.input
navier_stokes/navier_stokes_01_2d.input
```
It should be OK to have an actual solution discontinuity at the periodic boundary... the result should still be well defined.
With side-centered data, there is the question of "which solution wins" at periodic boundaries --- maybe that tie breaking is not being applied consistently?
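To make "tie breaking" concrete, here is a hypothetical sketch (not the actual SAMRAI/IBTK logic) of why an order-dependent receive can disagree between serial and parallel runs while a fixed rule cannot:

```cpp
// Hypothetical sketch: the side-centered DoF on a periodic boundary receives
// one candidate value from each periodic image. If the stored value is simply
// "whichever transaction ran last", two different schedules (e.g. serial vs.
// mpirun=2) can disagree; a fixed, order-independent rule cannot.
#include <cassert>
#include <vector>

// "Last transaction wins": depends on the order the schedule delivers data.
double lastWins(const std::vector<double>& candidates)
{
    return candidates.back();
}

// A fixed rule (here: smallest candidate wins). The specific rule is
// arbitrary; the point is that it gives the same answer on every rank.
double fixedRule(const std::vector<double>& candidates)
{
    double result = candidates.front();
    for (const double c : candidates)
        if (c < result) result = c;
    return result;
}

int main()
{
    // Candidates for the shared DoF, as delivered in two different orders.
    const std::vector<double> serial_order = {0.0, 1.0};
    const std::vector<double> parallel_order = {1.0, 0.0};

    assert(lastWins(serial_order) != lastWins(parallel_order));   // inconsistent
    assert(fixedRule(serial_order) == fixedRule(parallel_order)); // consistent
    return 0;
}
```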
I suspect this is a bug somewhere in SAMRAI, then, because we get consistent results (across different numbers of processors) when I make the solution continuous and inconsistent results when the solution is discontinuous. That's independent of this patch.
Are things more consistent if you have at least 2 patches in each direction?
I'll have to do some more experiments to know for sure, but I believe the answer is yes. If I fix all patches to be 4x4 then things are always consistent in parallel (i.e., this may be a literal corner case).
Requiring at least 2 patches in each periodic direction seems like it might be the easiest thing to do.
I bet that SAMRAI is not copying the periodically shifted copy of the patch onto itself.
Probably - that might explain why using more patches (which happens when I use more MPI processes) produces different results.
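If that is what is happening, a toy version of the bug might look like the sketch below. This is entirely hypothetical (the real schedule construction in SAMRAI is much more involved and I have not checked it against this): with a single patch spanning a periodic direction, the only sources for the ghost cells across that boundary are the patch's own periodic images, so a skip condition that only tests `src == dst` discards them all.

```cpp
// Hypothetical sketch of the suspected scheduling bug, not SAMRAI's actual
// code: when one patch spans a periodic direction, its own periodic image is a
// legitimate ghost-fill source. Skipping transactions merely because src == dst
// also drops that shifted self-copy; the skip should only apply when the
// periodic shift is zero.
#include <cstdio>
#include <vector>

struct Transaction
{
    int src_patch;
    int dst_patch;
    int shift; // periodic shift in units of the domain period (0 = unshifted)
};

std::vector<Transaction>
buildGhostFillTransactions(const std::vector<int>& patches, const bool buggy_skip)
{
    std::vector<Transaction> transactions;
    for (const int dst : patches)
        for (const int src : patches)
            for (const int shift : {-1, 0, 1}) // consider both periodic images
            {
                // Unshifted self-copies are always pointless. The buggy variant
                // additionally throws away the shifted self-copies.
                if (src == dst && (buggy_skip || shift == 0)) continue;
                transactions.push_back({src, dst, shift});
            }
    return transactions;
}

int main()
{
    const std::vector<int> single_patch = {0};
    std::printf("correct: %zu transactions, buggy: %zu transactions\n",
                buildGhostFillTransactions(single_patch, false).size(),  // 2
                buildGhostFillTransactions(single_patch, true).size());  // 0
    return 0;
}
```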
To be fair, we're working on banning AMR across periodic boundaries in deal.II because periodic BCs are a constant source of bugs.
The nodal interpolation tests don't use periodic boundaries. I'd bet there are more nodal bugs there but I'll have to do some more investigating to figure that out for sure.
This patch is revealing a bunch of interesting SAMRAI bugs - that's a second benefit to removing this subsystem.
This causes a couple of tests to fail for interesting reasons. Notably, the navier-stokes test is almost passing (the errors are near the tolerance of `numdiff`) but the first two are way off - not sure why. Perhaps we are copying things in a bad order now.