KVSlab / turtleFSI

Monolithic Fluid-Structure Interaction (FSI) solver
https://turtlefsi2.readthedocs.io/en/latest/
GNU General Public License v3.0

Pressure should not be part of save_deg >1 ? #61

Closed keiyamamo closed 1 year ago

keiyamamo commented 1 year ago

https://github.com/KVSlab/turtleFSI/blob/d149375c2d89f17e5662f352e4547ffd1a471750/turtleFSI/problems/__init__.py#L235-L247

Since we use p_deg=1 in most cases, it does not make sense for the pressure to be part of the save_deg > 1 functionality. We should check whether the original degree is actually higher than first order before calling the save_deg > 1 part. Otherwise, we are just interpolating the P1 solution from the coarse mesh to the fine one.
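The proposed check could look roughly like this (a sketch only; the function name and signature are hypothetical, not the actual turtleFSI API):

```python
# Sketch of the guard proposed above. The real logic lives in
# turtleFSI/problems/__init__.py; this only illustrates the condition.
def needs_refined_output(save_deg: int, original_deg: int) -> bool:
    # Only interpolate onto the refined visualization mesh when the field's
    # original polynomial degree is actually above first order. Otherwise a
    # P1 field (e.g. pressure with p_deg=1) would just be interpolated from
    # the coarse mesh to the fine one with no accuracy gain.
    return save_deg > 1 and original_deg > 1
```

With p_deg=1 this returns False for the pressure, so the P1 solution would be saved directly instead of being interpolated onto the refined mesh.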

zhangmuElias commented 1 year ago

Dear @keiyamamo

In fact, when I read the turtleFSI code, I was curious why v_deg and d_deg are of order 2 while only p_deg is of order 1. In your other post I saw you had tested some combinations. Do you mean the solver would diverge if a higher-order element were used for p? https://github.com/KVSlab/turtleFSI/pull/66#issue-1756354805

Best regards, ZHANG

keiyamamo commented 1 year ago

Hi @zhangmuElias

No, the solver should be able to handle higher-order elements for the pressure; I just tested d_deg=v_deg=p_deg=2 and it works fine. The reason for p_deg=1 is the well-known fact that v_deg=2, p_deg=1 tends to be more stable than v_deg=p_deg=1. Strictly speaking, this stable combination comes from the Stokes problem, so it does not directly carry over to turtleFSI, since we solve the full Navier-Stokes equations. However, because of the higher accuracy of v_deg=2, we prefer v_deg=2 with p_deg=1. I previously used d_deg=v_deg=3 and p_deg=2 on the cluster and it worked well, but it becomes computationally very(!) expensive. I believe the divergence I saw on my laptop actually came from a lack of memory.
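For reference, a minimal sketch of how these degrees are typically set in a turtleFSI problem file (the parameter names follow the default_variables dict pasted later in this thread; treat the function convention as an assumption):

```python
def set_problem_parameters(default_variables, **namespace):
    # P2/P1 pairing as discussed above: second-order deformation and
    # velocity, first-order pressure.
    default_variables.update(dict(
        d_deg=2,  # polynomial degree of the deformation
        v_deg=2,  # polynomial degree of the velocity
        p_deg=1,  # polynomial degree of the pressure
    ))
    return default_variables
```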

zhangmuElias commented 1 year ago

Thanks, @keiyamamo. Speaking of clusters: I tried running in parallel with mpirun -np 10 turtleFSI -p TF_fsi. I don't know if this is the right way to perform a parallel computation, because when I tested the TF_fsi case the computation time did not decrease noticeably, and it even became longer when I used 20 cores. Is this normal? In my experience with OpenFOAM, one needs to set up the decomposition of the computational domain. Do we need to do that in turtleFSI?

Best regards, ZHANG

keiyamamo commented 1 year ago

Hi @zhangmuElias

Your way of running a parallel simulation is correct, but 10 or 20 cores are far too many for a 2D problem such as TF_fsi. The MPI communication between processes becomes so large that it actually slows down the simulation. In some cases you might still need a cluster for memory reasons: turtleFSI uses a mixed finite element formulation, so the matrix can be quite big. It really depends on the problem and the parameters you use, such as the degrees of the elements.
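As an aside, the effect can be caricatured with a toy cost model (the numbers are invented for illustration, not measurements of turtleFSI):

```python
def step_time(n_cores, local_work=20.0, comm_cost=2.0):
    # Toy model of per-time-step wall time: parallelizable work that shrinks
    # with the core count, plus MPI communication overhead that grows with
    # it. For a small problem the overhead term dominates quickly.
    return local_work / n_cores + comm_cost * n_cores
```

With these made-up constants the optimum is only a handful of cores, and 20 cores is slower than running in serial, which is the behavior you observed.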

zhangmuElias commented 1 year ago

Dear @keiyamamo

OK, I see. In my case, I increased the number of mesh elements to 20,000 to model the thin-plate vibration. The FSI solver then takes 200 seconds per time step. Is there a way to accelerate the calculation? I think my computer has enough resources for such a 2D problem. When I monitored the CPU and memory usage during the calculation, it read:

[screenshots of CPU and memory usage]

Do you know how to make full use of my computer when running the FSI cases?

[screenshots]

Also, I ran into a strange situation: if I run two identical TF_fsi cases at the same time, the simulation time for each case doubles compared to running a single case. For example, running one TF_fsi case takes 10 s per time step, but running two simultaneously takes 20 s per time step for each. They seem to influence each other. Have you seen this?

keiyamamo commented 1 year ago

Does it take 200 seconds with or without recomputing the Jacobian? If it takes 200 seconds even without recomputing the Jacobian, the only option I can think of is using more than a single processor (though probably not 10 cores). If it takes 200 seconds when recomputing the Jacobian, you can try increasing recompute or recompute_tstep. The other thing you can change is the tolerance of the Newton solver. What are your current values for recompute, recompute_tstep, and atol/rtol?

zhangmuElias commented 1 year ago

In fact, I didn't change the default values, because I don't know whether the solver would diverge if I changed them.

    # Solver settings
    linear_solver="mumps",                     # use list_linear_solvers() to check alternatives
    solver="newtonsolver",                     # newtonsolver, there is currently no other choices.
    atol=1e-7,                                 # absolute error tolerance for the Newton iterations
    rtol=1e-7,                                 # relative error tolerance for the Newton iterations
    max_it=50,                                 # maximum number of Newton iterations
    lmbda=1.0,                                 # [0, 1.0] Cst relaxation factor for the Newton solution update
    recompute=5,                               # recompute the Jacobian after "recompute" Newton iterations
    recompute_tstep=1,                         # recompute the Jacobian after "recompute_tstep" time steps (advanced option: =1 is preferred)
    compiler_parameters=_compiler_parameters,  # Update the default values of the compiler arguments (FEniCS)

    save_step=1,                    # Save file frequency
    recompute=1,                  # Compute the Jacobian matrix every iteration

image

keiyamamo commented 1 year ago

Is this the Newton iteration from your own problem?

zhangmuElias commented 1 year ago

Yes, it is from my own case with the mesh I presented before.

keiyamamo commented 1 year ago

I see. Your Newton iteration does not look so "healthy" to me. It does converge, but the convergence is very slow even with Jacobian recomputation, which is not a good sign. Does it always take more than 17 iterations to converge? When you recompute the Jacobian, the linear solver has to perform a full LU factorization because the matrix A changes, and that is an expensive computation. So you want to reduce the number of Jacobian recomputations to speed up the solver, but that does not seem to work in your case... Does it diverge if you set recompute=5 or 10? Computing the Jacobian every iteration is very costly.
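The trade-off can be illustrated on a toy scalar problem (illustrative only, not turtleFSI's actual newtonsolver.py): reusing a factorized Jacobian for several iterations is cheap per iteration but converges more slowly, which is exactly the balance the recompute parameter controls.

```python
def newton(f, df, x0, atol=1e-10, max_it=50, recompute=5):
    # Quasi-Newton iteration that only re-evaluates the Jacobian (here just
    # a scalar derivative) every `recompute` iterations; in turtleFSI this
    # is where the expensive LU refactorization would happen.
    x, J = x0, df(x0)
    for it in range(max_it):
        r = f(x)
        if abs(r) < atol:
            return x, it
        if it > 0 and it % recompute == 0:
            J = df(x)  # "Compute Jacobian matrix"
        x -= r / J
    raise RuntimeError("The iteration did not converge.")
```

For f(x) = x**2 - 2 starting from x0 = 1.0, both recompute=1 and recompute=5 reach sqrt(2); the latter just needs a few more (cheaper) iterations.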

zhangmuElias commented 1 year ago

> Does it always take more than 17 iterations to converge?

Yes, it needs more than 17 iterations when the deformation becomes large.

> So you want to reduce the number of Jacobian recomputations to speed up the solver, but that does not seem to work in your case...

Why do you think reducing the number of Jacobian recomputations would not work for me? Is it because it might make it harder for the solver to converge?

> Does it diverge if you set recompute=5 or 10?

I haven't tried that yet, but I will test it.

In the following picture, I also noticed something strange: why did the solver give me results even though r (atol) did not reach the limit of 1e-7? Is this the "unhealthy" sign you mentioned? image

keiyamamo commented 1 year ago

> Why do you think reducing the number of Jacobian recomputations would not work for me? Is it because it might make it harder for the solver to converge?

It is because the residual does not decrease well during the Newton iterations.

> In the following picture, I also noticed something strange: why did the solver give me results even though r (atol) did not reach the limit of 1e-7? Is this the "unhealthy" sign you mentioned?

It is not strange. Your relative tolerance is below its threshold: r (rel) = 4.000e-08 (tol = 1.000e-07). The Newton solver stops when either of the tolerances drops below its threshold. What I mean by "unhealthy" is a poor decrease in the residual even when the Jacobian is recomputed. For example, the following Newton iteration shows nice convergence: the residual drops from e-03 to e-07 to e-09 without recomputing the Jacobian.

Newton iteration 0: r (atol) = 1.530e-03 (tol = 1.000e-07), r (rel) = 2.224e-03 (tol = 1.000e-06) 
Newton iteration 1: r (atol) = 5.546e-07 (tol = 1.000e-07), r (rel) = 1.574e-03 (tol = 1.000e-06) 
Newton iteration 2: r (atol) = 5.759e-09 (tol = 1.000e-07), r (rel) = 2.323e-08 (tol = 1.000e-06) 
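The stopping rule described above can be sketched as follows (a simplification; the exact definition of the relative residual in turtleFSI's solver is an assumption here):

```python
def newton_converged(residual, residual0, atol=1e-7, rtol=1e-6):
    # The iteration stops as soon as EITHER the absolute residual or the
    # residual relative to the initial one drops below its tolerance.
    return residual < atol or residual / residual0 < rtol
```

Applied to the log above: iteration 1 (r = 5.546e-07) satisfies neither tolerance, while iteration 2 (r = 5.759e-09) satisfies the absolute one, so the solver stops there.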
zhangmuElias commented 1 year ago

OK, I understand now. Maybe I can loosen the tolerance from 1e-7 to 1e-4, for example. Is there anything I should keep in mind when setting the tolerance?

keiyamamo commented 1 year ago

The choice of tolerance really depends on the problem and the parameters you use. For example, if the quantity of interest is huge (say $10^9$), you will most likely not reach a very small absolute tolerance. There is a nice reference on configuring the Newton solver. See here.

zhangmuElias commented 1 year ago

Thanks, the link you provided is helpful. It reminds me that sometimes, when I enlarge the domain and increase the mesh density, the case diverges at the first step. Is it because I did not set a good initial condition for the domain, or should the linear_solver be changed from mumps to another choice such as umfpack? In fact, even after changing the linear_solver, atol, or rtol, I still get the following error:

*** Error:   Unable to solve linear system using PETSc Krylov solver.
*** Reason:  Solution failed to converge in 0 iterations (PETSc reason DIVERGED_PC_FAILED, residual norm ||r|| = 0.000000e+00).
*** Where:   This error was encountered inside PETScKrylovSolver.cpp.
keiyamamo commented 1 year ago

When you say "at the first step", do you mean the first iteration of the first time step, or just the first time step? If it fails from the very first iteration, the problem might not be well-posed. It could also be the memory issue described at https://turtlefsi2.readthedocs.io/en/latest/known_issues.html, but I am not sure whether that is the case for your problem.

zhangmuElias commented 1 year ago

Yes, it fails from the very first iteration, but I only changed the computational domain to a bigger one. I added the following two lines of code to newtonsolver.py and got the error message below. Do you know what the reason might be?

from dolfin import assemble, derivative, TrialFunction, Matrix, norm, MPI, PETScOptions
PETScOptions.set("mat_mumps_icntl_4", 1)    # if negative or zero, MUMPS suppresses diagnostic printing, statistics, and warning messages
PETScOptions.set("mat_mumps_icntl_14", 400)  # allocate more working memory to MUMPS
    up_sol.solve(dvp_res.vector(), b)
RuntimeError: 

*** -------------------------------------------------------------------------
*** DOLFIN encountered an error. If you are not able to resolve this issue
*** using the information listed below, you can ask for help at
***
***     fenics-support@googlegroups.com
***
*** Remember to include the error message listed below and, if possible,
*** include a *minimal* running example to reproduce the error.
***
*** -------------------------------------------------------------------------
*** Error:   Unable to solve linear system using PETSc Krylov solver.
*** Reason:  Solution failed to converge in 0 iterations (PETSc reason DIVERGED_PC_FAILED, residual norm ||r|| = 0.000000e+00).
*** Where:   This error was encountered inside PETScKrylovSolver.cpp.
*** Process: 0
*** 
*** DOLFIN version: 2019.1.0
*** Git changeset:  3ea2183fbfe7277de9f16cbe1a9ffaab133ba1fa
*** -------------------------------------------------------------------------

 ** ERROR RETURN ** FROM DMUMPS INFO(1)=  -10
 ** INFO(2)=          262336
 On return from DMUMPS, INFOG(1)=             -10
 On return from DMUMPS, INFOG(2)=          262336
keiyamamo commented 1 year ago

According to the MUMPS user guide,

-10 Numerically singular matrix. INFO(2) holds the number of eliminated pivots.

You can find the user guide here
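What "numerically singular" means can be illustrated generically (this uses NumPy, not MUMPS, but any LU-based direct solver hits the same wall):

```python
import numpy as np

# A rank-deficient matrix: the second row is twice the first, so the linear
# system has no unique solution and LU factorization runs out of nonzero
# pivots -- the situation MUMPS reports as INFOG(1) = -10.
A = np.array([[1.0, 2.0],
              [2.0, 4.0]])
print(np.linalg.matrix_rank(A))  # rank 1, not 2
```

In an FSI assembly, such dependent rows can come from degenerate or duplicated mesh entities, for example from a problematic meshing setup.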

zhangmuElias commented 1 year ago

Dear @keiyamamo Thanks, I just found the reason why the solver gave that error: I had used the Transfinite function in Gmsh. After I deleted the following two lines from the Gmsh file, the solver no longer diverges.

//Transfinite Surface {2};
//Transfinite Surface {3};

Also, in my own working FSI case, I found that setting recompute=5 accelerates the solve for the first 100 time steps, but it diverges afterwards:

Solved for timestep 159, t = 1.5900 in 76.9 s
Compute Jacobian matrix
Newton iteration 0: r (atol) = 2.214e+05 (tol = 1.000e-07), r (rel) = 1.198e+01 (tol = 1.000e-07) 
Newton iteration 1: r (atol) = 8.949e+04 (tol = 1.000e-07), r (rel) = 1.499e+02 (tol = 1.000e-07) 
Compute Jacobian matrix
Newton iteration 2: r (atol) = 7.493e+04 (tol = 1.000e-07), r (rel) = 3.492e+03 (tol = 1.000e-07) 
Compute Jacobian matrix
Newton iteration 3: r (atol) = 2.746e+07 (tol = 1.000e-07), r (rel) = 2.072e+03 (tol = 1.000e-07) 
Compute Jacobian matrix
Newton iteration 4: r (atol) = 1.818e+06 (tol = 1.000e-07), r (rel) = 1.730e+03 (tol = 1.000e-07) 
Compute Jacobian matrix
Newton iteration 5: r (atol) = 6.866e+05 (tol = 1.000e-07), r (rel) = 1.478e+03 (tol = 1.000e-07) 
Newton iteration 6: r (atol) = 2.917e+06 (tol = 1.000e-07), r (rel) = 4.298e+04 (tol = 1.000e-07) 
Compute Jacobian matrix
Newton iteration 7: r (atol) = 2.673e+09 (tol = 1.000e-07), r (rel) = 3.250e+07 (tol = 1.000e-07) 
Compute Jacobian matrix
Newton iteration 8: r (atol) = 2.363e+16 (tol = 1.000e-07), r (rel) = 3.747e+14 (tol = 1.000e-07) 
Compute Jacobian matrix
    raise RuntimeError("Error: The simulation has diverged during the Newton solve.")
RuntimeError: Error: The simulation has diverged during the Newton solve.
keiyamamo commented 1 year ago

I’m not surprised that the solver diverged. The exact behavior of the Newton solver depends on the nature of the problem and other parameters such as the time step. So I would say you have to try different parameters and see which combination works best for your particular problem.

zhangmuElias commented 1 year ago

Sure, your comments are very pertinent. One last question: I ran into a strange situation where, if I run two identical TF_fsi cases at the same time, the simulation time for each case doubles compared to running a single case. For example, running one TF_fsi case takes 10 s per time step, but running two simultaneously takes 20 s per time step for each. They seem to influence each other. Have you seen such a thing before, or is it just my computer?

keiyamamo commented 1 year ago

I’m not a computer scientist, but it seems natural that they influence each other. Your computer has limited memory bandwidth and cache: only a certain amount of data fits in the fast cache, so there is some delay from slower data access.

zhangmuElias commented 1 year ago

Sure, but my memory is 120 GB, which should be enough for at least two TF_fsi cases. I will try to figure out why that happens. Thanks again for your answers!