davydden / large-strain-matrix-free

A repository with code for the paper "A matrix-free approach for finite-strain hyperelastic problems using geometric multigrid"
GNU Lesser General Public License v2.1

Broken scalar_referential caching option for Cook benchmark #98

Closed davidscn closed 3 years ago

davidscn commented 3 years ago

I replaced the collocation evaluation in the neo-Hookean operator in the following way in order to build the program with the most recent deal.II development version.

https://github.com/davydden/large-strain-matrix-free/blob/66f4ecd864b141542851a71e4a83b847eb345c1b/include/mf_nh_operator.h#L1238-L1242

        dealii::internal::FEEvaluationImplCollocation<
          dim,
          n_q_points_1d - 1,
          NumberType>::evaluate(dim,
                                EvaluationFlags::gradients,
                                data_reference->get_shape_info(),
                                cached_position,
                                nullptr,
                                ref_grads,
                                nullptr,
                                nullptr);

The method is used when the scalar_referential caching option is employed for the tangent operator in the MF_CG solver. According to the documentation of both functions, the substitute should not change anything on the computational side. However, when I run the Cook benchmark with the appropriate material formulation and scalar_referential caching, the computation fails: the linear solver converges, but the nonlinear solver does not. I obtain the following output before the computation stops completely:

Grid:
  Reference volume: 0.00144
Triangulation:
  Number of active cells: 1024
  Number of degrees of freedom: 2178

Timestep 1 @ 0.1s
__________________________________________________________________________________________________
    SOLVER STEP     |  LIN_IT    LIN_RES   COND_NUM   RES_NORM      RES_U      NU_NORM       NU_U
__________________________________________________________________________________________________
  0  CST  ASM  SLV  |      15  1.401e-05  1.623e+03  1.000e+00  1.000e+00    1.000e+00  1.000e+00  
  1  CST  ASM  SLV  |      21  5.181e-05  3.886e+02  3.371e+00  3.371e+00    1.395e-01  1.395e-01  
  2  CST  ASM  SLV  |      15  2.851e-06  9.299e+02  4.452e-01  4.452e-01    1.169e-01  1.169e-01  
  3  CST  ASM  SLV  |      17  1.804e-06  3.395e+02  2.099e-01  2.099e-01    2.585e-03  2.585e-03  
  4  CST  ASM  SLV  |      14  2.666e-07  8.056e+02  2.273e-02  2.273e-02    3.637e-03  3.637e-03  
  5  CST  ASM  SLV  |      13  5.035e-08  3.617e+02  7.062e-03  7.062e-03    4.710e-05  4.710e-05  
  6  CST  ASM  SLV  |      14  5.287e-09  7.416e+02  3.512e-04  3.512e-04    1.372e-05  1.372e-05  
  7  CST  ASM  SLV  |      13  1.225e-09  2.884e+02  1.235e-04  1.235e-04    5.383e-07  5.383e-07  
  8  CST  ASM  SLV  |      14  7.952e-11  6.387e+02  7.710e-06  7.710e-06    1.731e-07  1.731e-07  
  9  CST  ASM  SLV  |      13  2.895e-11  2.839e+02  2.757e-06  2.757e-06    1.059e-08  1.059e-08  
_______________________________________________________________________________________

Timestep 2 @ 2.000e-01s
__________________________________________________________________________________________________
    SOLVER STEP     |  LIN_IT    LIN_RES   COND_NUM   RES_NORM      RES_U      NU_NORM       NU_U
__________________________________________________________________________________________________
  0  CST  ASM  SLV  |      16  8.852e-06  1.622e+03  1.000e+00  1.000e+00    1.000e+00  1.000e+00  
  1  CST  ASM  SLV 

I don't understand this, since the tangent operator should (if anything) change the convergence behavior of the linear solver, not of the nonlinear solver. Most parts of the parameter file correspond to this test file; I only changed the material formulation and the caching method.

Any guess why this happens? All other caching options work perfectly fine.

davidscn commented 3 years ago

For the sake of completeness: in order to rule out my compatibility changes as a potential error source, I ran all available cases (from the tests) using the master branch of this repository and deal.II v9.2. All caching strategies work fine apart from the scalar_referential option. For the 'Holes' and the 'Cook' membrane test cases, I get a debug error message

An error occurred in line <2254> of file </home/dealii/large-strain-matrix-free/include/mf_elasticity.h> in function
    void Cook_Membrane::Solid<dim, degree, n_q_points_1d, NumberType>::solve_nonlinear_timestep() [with int dim = 2; int degree = 1; int n_q_points_1d = 2; NumberType = double]
The violated condition was: 
    std::abs(diff.local_element(i)) <= std::numeric_limits<double>::epsilon() * ulp * std::abs(dst_mf.local_element(i))
Additional information: 
    MF and MB are different on local element 0: 579464.868661 diff -16324.099923 at Newton iteration 1

Running the cases in release mode ends up in a diverging linear system as reported in the comment above.

davidscn commented 3 years ago

I think I found the issue. The following line

https://github.com/davydden/large-strain-matrix-free/blob/66f4ecd864b141542851a71e4a83b847eb345c1b/include/mf_nh_operator.h#L1303

needs to be replaced by

+              phi_reference.submit_gradient(queued * transpose(F_inv), q);

At least on my machine, it works with the fix. In some places, the iteration count is even smaller than in the pure scalar case.

I will leave the issue open here. Feel free to close it once you have fixed it, or if you disagree with the issue/fix in general.

jppelteret commented 3 years ago

Hi @DavidSCN! Sorry for taking some time to reply. Neither @davydden nor I work in academia any more, so we don't really keep tabs on our past projects. Nevertheless, thanks for letting us know about this issue, for tracking down the source of the problem, and for the suggested fix. I'm really glad that you managed to work something out in the meantime.

I guess I need to re-familiarise myself with what we were doing here to comment on the fix itself, but all of these implementations should in theory result in the same convergence history since they are simply different implementations of the same basic formulation + constitutive law. When I find the time to do so, I'll go through the working again to see why we ended up with the currently implemented line as opposed to yours.

If it means anything, IIRC one of the main results of our paper is that the method that cached everything (both stress and material tangents) ended up being competitive with the more streamlined approach. Weighing up the loss of performance with the vastly increased generality (allowing it to be applied as-is to other constitutive laws) makes it a worthwhile avenue to pursue if you're going to use this general MF approach in some broader FEM framework.

kronbichler commented 3 years ago

That code was written by me, so let me take the blame. We did some basic verification on the case, but not a very comprehensive one, so it is very much possible that a mistake slipped through, especially since I do not have much research experience in elasticity and could have mixed up some tensor indexing. Your fix looks interesting and it seems to be good. Before I approve, let me ask you how you found this alternative, as I would like to understand the mistake in the original reasoning, and because I am unsure whether we would also need to fix up the indices in https://github.com/davydden/large-strain-matrix-free/blob/66f4ecd864b141542851a71e4a83b847eb345c1b/include/mf_nh_operator.h#L1281-L1290 which should be the transpose operation of what is applied before submit_gradient().

davidscn commented 3 years ago

I guess I need to re-familiarise myself with what we were doing here to comment on the fix itself, but all of these implementations should in theory result in the same convergence history since they are simply different implementations of the same basic formulation + constitutive law.

I attached the output of diff -y tensor2.log scalar_referential.log so that you can get an impression of whether both implementations are 'similar enough' in terms of the convergence history: diff_tensor2_scalar_ref.log.

If it means anything, IIRC one of the main results of our paper is that the method that cached everything (both stress and material tangents) ended up being competitive with the more streamlined approach. Weighing up the loss of performance with the vastly increased generality (allowing it to be applied as-is to other constitutive laws) makes it a worthwhile avenue to pursue if you're going to use this general MF approach in some broader FEM framework.

Yes, I read the paper, and I think it's worth keeping the various options for the tangent operator, since the best caching option is problem dependent.

Your fix looks interesting and it seems to be good. Before I approve let me ask you how you found this alternative, as I would like to understand the mistake in the original reasoning, and because I am unsure whether we would also need to fix up the indices in

https://github.com/davydden/large-strain-matrix-free/blob/66f4ecd864b141542851a71e4a83b847eb345c1b/include/mf_nh_operator.h#L1281-L1290 which should be the transpose operation of what is applied before submit_gradient().

I probably cannot give you a sufficient answer, since I'm also not an expert in solid mechanics, and the manually implemented contractions are hard to read. I only verified the manually implemented version of get_gradient above, which yields the correct gradient. However, in your paper you write

For the derivatives of the current solution, an additional multiplication by F^{-T} for both the solution as well as the test function is necessary, with F^{-T} evaluated on the fly after line 10 in Algorithm 1

So, you manually implemented the push-forward operation for the gradient (also shown in the code snippet above, transpose(F_inv) * grad_Nx_v), but another push-forward is still required for the test function (as you also mention in the paper). This is at least how I thought about it and why I changed the line accordingly. In the end, I still don't see why this works out, since the test-function push-forward is only applied locally at this particular quadrature point, not to all related test functions of the corresponding element (i.e., in the loop over all DoFs that is usually applied in matrix-based implementations). In order to 'finally' fix the issue, someone with a better understanding of the theory should probably take a closer look.

kronbichler commented 3 years ago

Regarding the text in the paper, we should not see this statement as set in stone, because it was written by me (even though @jppelteret and I believed we had reached consensus on this topic back then). I think the original version in the code reflects the order by which we need to multiply with the transformation: I took the non-transpose version because we apply F^{-T} to the solution values grad_Nx_v. Given that F^{-T} is factored out from the test functions, and that we can think of the test functions as being taken with a dot product (summation), I believed that it should be the transpose, i.e., F^{-1}. However, I guess I should check the F matrix again and look up what @jppelteret wrote on the deal.II mailing list. J-P, any additional comments from your side that would help me identify the issue?

jppelteret commented 3 years ago

I think that I've worked out what's gone wrong between this and the other version of the scalar scheme (cell_mat->formulation == 1 && mf_caching == MFCaching::scalar). I think that we accidentally tripped over the indices when transforming between configurations, and this becomes a bit clearer when writing things in index notation.

So we had previously submitted the resulting action against the spatial gradient of the test function. This means that we had (dropping the test function notation and the DoF index for convenience)

grad(u) : A <==> du_{i}/dx_{j} A_{ij}

with u being the displacement, x the current coordinate and A the "result of action" tensor. The lower case indices indicate quantities in the spatial configuration. We can transform the spatial differential operation into its referential counterpart

grad(u) = Grad(u).F^{-1} <==> du_{i}/dX_{K} dX_{K}/dx_{j} = du_{i}/dX_{K} F^{-1}_{Kj}

with X being the coordinate in the reference configuration and F the deformation gradient. From this we get

du_{i}/dx_{j} A_{ij}
  = du_{i}/dX_{K} F^{-1}_{Kj} A_{ij}
  = du_{i}/dX_{K} [A_{ij} F^{-1}_{Kj}]
==>
grad(u) : A
  = \Grad(u) : [A.F^{-T}]
  = \Grad(u) : [F^{-1}.A^{T}]^{T}

So I believe that we made a mistake here, and @DavidSCN is right (and has the correct suggestion for the fix). We can't just shift the F^{-1} to the term on the other side of the double contraction. The contracted indices are then incorrect.

That code was written by me, so let me take the blame.

Err... no! Team effort -- we all missed this one. Sorry that I didn't pick it up before :-/

Your fix looks interesting and it seems to be good. Before I approve let me ask you how you found this alternative, as I would like to understand the mistake in the original reasoning, and because I am unsure whether we would also need to fix up the indices in...

I think that this is still OK.

kronbichler commented 3 years ago

I had to spend 15 minutes really thinking through this aspect, and I think I finally agree. The problem is, exactly as @DavidSCN suggested, that we mix up the indices with the high-level product F_inv * queued, and I regret that I didn't notice it when looking at how we treat the indices inside FEEvaluation::submit_gradient here: https://github.com/dealii/dealii/blob/dd33e6b556e146d1a70c8d617776f72cb9c3c52e/include/deal.II/matrix_free/fe_evaluation.h#L5978-L5985 . What happened in the compact notation is that we contracted over the rows of A (index i in the post by @jppelteret), while it should have been the columns, which you get by contracting with the transpose from the other side. That's why I usually prefer the index notation, even though it is less readable (and I should have been warned by the many 'MK' comments that I inserted for myself).

kronbichler commented 3 years ago

So @DavidSCN, feel free to open a pull request with that line changed. Just to be sure: is the issue with the quadratic convergence you posted on the mailing list also related to the bug here (i.e., is the linearization still problematic), or is it a different problem?

jppelteret commented 3 years ago

I'll leave @DavidSCN to confirm it, but I believe that the lack of quadratic convergence mentioned on the mailing list was due to a bug: some erroneous step in converting the Kirchhoff stress into the Piola stress (as used in the residual computation). Right?

davidscn commented 3 years ago

I'll leave @DavidSCN to confirm it, but I believe that the lack of quadratic convergence mentioned on the mailing list was due to a bug: some erroneous step in converting the Kirchhoff stress into the Piola stress (as used in the residual computation). Right?

Right. Although the residual assembly (as I learned from J-P's post) is not directly related to the formulation of the linearization, the underlying problem, namely integration with transformed test functions, is the same; only the application differs. That's also the reason why the referential linearization was of particular interest to me. I still need to check the quadratic convergence quantitatively for my mailing-list problem, but my gut feeling says it's working properly. I will open the PR. No blame for anyone, please; the indexing in this context is just nasty.