abarret / multiphase-stokes

Solver for a mixture of fluids, based on IBAMR

Red and black updates #29

Closed bindi-nagda closed 1 year ago

bindi-nagda commented 1 year ago

This fills in ghost cells between red and black updates. The code compiles and runs successfully.

Based on a few tests I did on a single processor, it seems that the performance hasn't degraded compared to the manual fix.

Running on more than one processor shows much worse convergence of the KSP solver. I haven't investigated this further yet.

abarret commented 1 year ago

Do you see the same degraded performance with one process but multiple patches on a single level?

bindi-nagda commented 1 year ago

When I increase the number of levels in the hierarchy (i.e. AMR grid with L-shaped refinement) and run with one process, then the performance also degrades. But this time the performance is not as bad as when running with more than one process (for either an AMR grid or uniform grid).

abarret commented 1 year ago

What if you run on one processor and one level but with multiple patches? Change the largest_patch_size in the input file to something that will create multiple patches.
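Something along these lines in the input file (assuming the standard SAMRAI `GriddingAlgorithm` input block; the block name and nesting in the example's actual input file may differ):

```
GriddingAlgorithm {
   // Forcing a small maximum patch size splits a 64^2 level into many patches.
   largest_patch_size {
      level_0 = 16, 16
   }
}
```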

bindi-nagda commented 1 year ago

> Change the largest_patch_size in the input file to something that will create multiple patches.

When I run using multiple patches on one level with one process, the number of iterations to converge to rtol remains the same or in some cases is fewer.

For example, for N = 64 using the preconditioner with 1 multigrid level, largest_patch_size = 16,16 takes 54 iterations to converge, whereas largest_patch_size = 512,512 takes >100 iterations.

abarret commented 1 year ago

That's odd. You get degraded performance with multiple processors on a single level, but with a single process single level with multiple patches, you get improved performance?

bindi-nagda commented 1 year ago

Yes, pretty odd. Although the improved performance with a single process, single level, and multiple patches happens only in some cases; for the majority of cases, the performance is the same.

Also, without preconditioning, the number of iterations to converge remains the same when running with multiple processes.

I think this issue with parallel processing when USE_PRECOND = TRUE was happening before this PR. I was running things on a single processor so I missed it.

abarret commented 1 year ago

You can get different results in parallel because adding floating point numbers in different orders can give different results. But I wouldn't expect the number of iterations to change by more than 2 or 3 at most.

With multiple patches on a single level, I wonder if there's a synchronization step that needs to occur. If you're using this Jacobi-like update of cells, you smooth independently on each patch. This is kind of like "patch ordering" instead of "red-black ordering". It's just a different smoother.

abarret commented 1 year ago

Everything else held constant (process count, patch numbers, level numbers), how does red-black ordering compare to the original Jacobi-like update?

bindi-nagda commented 1 year ago

> You can get different results in parallel because adding floating point numbers in different orders can give different results. But I wouldn't expect the number of iterations to change by more than 2 or 3 at most.

The number of iterations increases significantly, often never converging within the prescribed max iterations. In other words, the relative residual is decreasing very, very slowly.

> Everything else held constant (process count, patch numbers, level numbers), how does red-black ordering compare to the original Jacobi-like update?

EDIT: The red-black ordering is very similar in performance, i.e., only 1-2 iterations faster or slower than the original update with the "fix" in the Fortran code.

abarret commented 1 year ago

I know that the preconditioner won't work with an arbitrary number of processors or patches: the grid generation for the coarser levels of the multigrid solver needs to be smarter, and I don't immediately know how to do that. I can't think of anything with the Krylov solver, though. It should work fine in parallel and with an arbitrary number of patches.

> The red-black ordering is very similar in performance, i.e., only 1-2 iterations faster or slower than the original update with the "fix" in the Fortran code.

This is good. Let's go ahead and merge this, since the parallel issues seem to be related to something else. Can you create a new issue that describes the problem and your test, so we can keep track of it? In the interest of time, it's probably best to start checking the order of accuracy of the temporal solver.

bindi-nagda commented 1 year ago

> Can you create a new issue that describes the problem and your test, so we can keep track of it? In the interest of time, it's probably best to start checking the order of accuracy of the temporal solver.

Yes I can create a new issue.

I'll get started on checking the order of accuracy of the temporal solver. I should be printing out the errors every X number of time steps, then reducing delta_x (and consequently delta_t) by a factor of 2 for example, and then comparing errors again?

abarret commented 1 year ago

> I'll get started on checking the order of accuracy of the temporal solver. I should be printing out the errors every X number of time steps, then reducing delta_x (and consequently delta_t) by a factor of 2 for example, and then comparing errors again?

Run to a final time and compute errors then. The time_stepping executable already does this to find a steady state. You need to make a problem that has variations in time.

bindi-nagda commented 1 year ago

Okay. Right, yeah, I need to set up a manufactured problem with time dependence.