ExtremeFLOW / neko

https://neko.cfd/

Norms and residuals #1273

Open MartinKarp opened 1 month ago

MartinKarp commented 1 month ago

The intent of this issue is to have somewhere to discuss what an appropriate norm and stopping criterion for our Krylov solvers would be. Upon request, I compile here a short summary of what systems we are solving and what is currently in the code.

Regarding the residuals for the stopping criterion in our linear solvers: we never solve the full system directly, but rather for the change between time steps, $\Delta x = x^i-x^{i-1}$, where $x$ is either the pressure or the velocity and $i$ corresponds to the value at time step $i$.

For some linear system $Ax=b$ (Poisson equation for the pressure and Helmholtz for the velocity), we transform the system to the form $$Ax^i = b^i \;\Rightarrow\; A\Delta x = b^i - A x^{i-1}$$ before it enters our Krylov solvers (usually GMRES for the pressure and CG for the velocity).

This means the magnitude of the starting rhs $b^i -A x^{i-1}$ scales with the time step $\Delta t$ and the magnitude of $x$.
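As a concrete illustration (a minimal NumPy sketch, not Neko code; the matrix and fields are made up), solving the increment system with a zero initial guess starts from exactly the residual $b^i - A x^{i-1}$, and its norm shrinks together with the change per time step:

```python
import numpy as np

# Minimal sketch, not Neko code: a 1D Laplacian stands in for A, random fields for x.
n = 64
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

x_prev = np.random.rand(n)                    # x^{i-1}
for dt in (1e-1, 1e-2, 1e-3):                 # smaller time step -> smaller change in x
    x_new = x_prev + dt * np.random.rand(n)   # stand-in for x^i
    b_new = A @ x_new                         # rhs of the full system A x^i = b^i

    # Increment system: A (dx) = b^i - A x^{i-1}.  With dx = 0 as the initial guess,
    # its residual is the full-system residual evaluated at x^{i-1}.
    rhs_inc = b_new - A @ x_prev
    print(dt, np.linalg.norm(rhs_inc))        # starting residual shrinks with dt
```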

This discussion is created to assess what norm we would like to minimize when solving our system, and what criterion we should have for $$||A\Delta x - b^i + A x^{i-1}|| = ||Ax^i - b^i|| < \mathrm{tol}.$$

The choice of the norm $||x||$ for this stopping criterion is important, as we would preferably like it to be independent of the number of grid points, the length of the time step, the volume of the domain, and the normalization of the flow.

Currently, the following norms might be considered:

  1. The L2 norm normalized with the volume of the domain $||x||_2/\sqrt{vol} = \sqrt{(x \cdot x)/vol}$ (this is what is used currently in the code).
  2. The standard L2 norm $||x||_2$.
  3. The integral (maybe the wrong word) over the domain divided by the volume $\sqrt{(x^T B x)/vol}$ where $B$ is the mass matrix.

What has been noted is that, except for the mass-weighted integral, the norm scales with the number of grid points, and that for all of them the time step length plays an important role in choosing an appropriate tolerance. @timofeymukha had some thoughts on this and asked me to start a discussion about it here.
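As a rough numerical illustration of that observation (plain NumPy, not Neko; a made-up smooth "residual" field and a lumped uniform mass matrix standing in for $B$), the three candidates behave as follows on successively finer grids:

```python
import numpy as np

# Illustration only: the same field sampled on finer and finer uniform 1D grids.
def candidate_norms(n, L=1.0):
    xs = np.linspace(0.0, L, n, endpoint=False)
    r = np.sin(2.0 * np.pi * xs)              # fixed field, independent of the grid
    B = np.full(n, L / n)                     # lumped mass matrix of a uniform grid
    vol = L
    opt1 = np.sqrt(np.dot(r, r) / vol)        # ||r||_2 / sqrt(vol)  (current choice)
    opt2 = np.linalg.norm(r)                  # ||r||_2
    opt3 = np.sqrt(np.dot(r, B * r) / vol)    # sqrt(r^T B r / vol)  (mass-weighted rms)
    return opt1, opt2, opt3

for n in (128, 1024, 8192):
    print(n, candidate_norms(n))              # opt1, opt2 grow like sqrt(n); opt3 stays ~0.71
```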

adperezm commented 1 month ago

Good to bring it up.

I would say that the 3rd option sounds right, at least for controlling the dependence on the number of grid points. My reasoning for liking that one is given below, but you do not need to read it :)

I did not realize that $\Delta t$ also influenced it. But of course it does :). For that one I do not have opinions or suggestions.

=============================== Why I like the 3rd ==============================================

If one calls the residual: $r = A\Delta x - b^i +A x^{i-1} $

One metric could be simply: the average of the residual over the full domain $\Omega$ (or volume V). To get an average, you need:

$\langle r \rangle_{V} = \frac{1}{V} \int_{V} r \, dV = \frac{1}{V} \mathbf{r}^{T} \mathbf{B}$ (abusing notation here)

Doing this then gives a clear intuition on the value reported, i.e., an average.

A stricter metric, which avoids caring about signs, is the root-mean-square "residual", which is the 3rd one you mention. So we take the average of the square and then the square root:

$\mathrm{rms}(r) = \sqrt{\frac{1}{V} \int_{V} r^{2} \, dV} = \sqrt{\frac{1}{V} \mathbf{r}^{T} \mathbf{B}\mathbf{r}}$

So once again, this is a quantity that gives a clear indication of what I should expect when I set the tol in the case file. Note that option number 1 looks like this one, but you do not get an average inside the square root.
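To spell that out under a simplifying assumption (a uniform mesh with $n$ degrees of freedom, so that the lumped mass matrix is roughly $B \approx \frac{V}{n} I$):

$$\frac{||r||_2}{\sqrt{V}} = \sqrt{\frac{\mathbf{r}^{T}\mathbf{r}}{V}} \approx \sqrt{\frac{n}{V}} \sqrt{\frac{\mathbf{r}^{T}\mathbf{B}\mathbf{r}}{V}} = \sqrt{\frac{n}{V}}\, \mathrm{rms}(r),$$

so option 1 carries an explicit $\sqrt{n}$ factor relative to the rms, which is exactly the grid dependence Martin noted.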

One that could perhaps also be good to have is $\vert\vert r \vert\vert_{\infty} = \max_j \vert r_j \vert$. For this one we do not need weights, but only because no summation is performed when computing it.

I would say that any measure that is unbounded is perhaps not so nice and would advocate for using the weights.

timofeymukha commented 1 month ago

This video can perhaps serve as some inspiration: https://youtu.be/v9OnNeYH4Ok?si=9Lx0Ye8Tdv8rqclV

adperezm commented 1 month ago

I guess the end of that video follows up nicely on what I wrote.

One should get either an average, an rms, or an infinity norm.

In my previous comment I suggested we calculate averages over the volume, given that we can nicely compute proper integrals in SEM.

The other option is of course not to use weights, but then to scale with the number of points instead of the volume. This way each point in the domain has the same weight (which might be desirable), but one is still performing a proper averaging operation.

The problem we have currently (option 1) is that we are mixing the methods in a way, so we are not really getting any proper form of averaging.
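To make the "mixing" explicit (same notation as above, with $n$ the number of points and $V$ the volume), the three normalizations under discussion are:

$$\underbrace{\sqrt{\tfrac{1}{V}\,\mathbf{r}^{T}\mathbf{B}\mathbf{r}}}_{\text{volume-weighted rms}}, \qquad \underbrace{\sqrt{\tfrac{1}{n}\,\mathbf{r}^{T}\mathbf{r}}}_{\text{point-count rms}}, \qquad \underbrace{\sqrt{\tfrac{1}{V}\,\mathbf{r}^{T}\mathbf{r}}}_{\text{current (option 1)}}.$$

The first two are proper averages (over the volume and over the points, respectively); the current one divides an unweighted point sum by a volume, so it is neither.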

I would still lean towards the 3rd option that Martin proposed; however, I see the value of abandoning the mass matrix and volume altogether and just using the number of points.

timofeymukha commented 1 month ago

A few comments:

MartinKarp commented 1 month ago

I guess the fact that we solve for $\Delta x$ does not make a difference, but the rhs $b^i$ will depend on the length of the time step, so it should matter, I think.

Regarding the scaling, I think it is OK for the non-dimensionalized case, I guess?

As for scaling by the number of points I think the integral over the volume somehow makes more sense.

timofeymukha commented 1 month ago

I think the timestep will affect the number of iterations needed to achieve the given tolerance, but the procedure to evaluate the residual does not depend on the timestep per se.

As you say, for a non-dimensional solver the scaling is not needed, but in our case we should have it. There could also be an option to select the scale manually, e.g. to set it to 1.

timofeymukha commented 1 month ago

FYI https://www.openfoam.com/documentation/guides/latest/doc/guide-solvers-residuals.html
Somewhat more detailed here: https://doc.cfd.direct/notes/cfd-general-principles/residual
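If I read those pages correctly (please double-check against the docs), the residual OpenFOAM reports is a scaled $\ell_1$ quantity of roughly the form

$$r = \frac{\sum_j |b_j - (Ax)_j|}{\sum_j |(Ax)_j - (A\bar{x})_j| + \sum_j |b_j - (A\bar{x})_j|},$$

where $\bar{x}$ is a uniform field holding the average value of $x$.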

pschlatt1 commented 1 month ago

do any of you understand the meaning of the "average" solution? To me that sounds a bit strange.

timofeymukha commented 1 month ago

@pschlatt1 I took it at face value, i.e., as the average over the dofs. But we could try to find out what happens in the code.

adperezm commented 1 month ago

> @pschlatt1 I took it at face value, i.e., as the average over the dofs. But we could try to find out what happens in the code.

Meanwhile I was sure that it was an average in time, since they apply the linear operator to it, i.e., $A\bar{x}$.

And I thought their initial guess was that time average, since in their plot they start from $r=1$.

So I guess, as Tim says, one should check the code. But what I would like someone to explain is: what is the benefit of doing it this way? It is not obvious to me from the links.

timofeymukha commented 1 month ago

I think it cannot be in time, since the time averages are not necessarily even computed in OpenFOAM unless you tell it you want them. They refer to $x = \bar x$ as a uniform system; I figured that means the same value everywhere. But we should look in the code to make sure. I don't fully understand how this expression works either, but I guess it is worth figuring out.

adperezm commented 1 month ago

> I think it cannot be in time, since the time averages are not necessarily even computed in OpenFOAM unless you tell it you want them. They refer to $x = \bar x$ as a uniform system; I figured that means the same value everywhere. But we should look in the code to make sure. I don't fully understand how this expression works either, but I guess it is worth figuring out.

Ahaa, okay, makes sense. Thanks, silly me.

pschlatt1 commented 1 month ago

indeed, it really seems to just be the mean value, which will give you some sort of base value for the normalisation. But I miss, for instance, the mass matrix. I think we could try something like that; at least a simple scaling of the velocity it would successfully remove. Is there perhaps a reference to some paper for that method?