KratosMultiphysics / Kratos

Kratos Multiphysics (A.K.A Kratos) is a framework for building parallel multi-disciplinary simulation software. Modularity, extensibility and HPC are the main objectives. Kratos has BSD license and is written in C++ with extensive Python interface.
https://kratosmultiphysics.github.io/Kratos/
Other
1.01k stars 244 forks source link

Out of range error in Eigenvalue Solvers with big models #8064

Closed emiroglu closed 3 years ago

emiroglu commented 3 years ago

I have a structural model with about 3.6 million DOFs using volume, shell, truss and nodal elements. In addition, I am using master-slave constraints.

I have tried using both Eigen and FEAST solvers for solving for the eigenvalues, but it wasn't successful yet. The problems I am facing are usually related to a segmentation fault that stems from an index going out of range in the matrix class. I had a similar problem when I used "super_lu" and solved it by either using a computer with a bigger RAM or switching to "pardiso_lu".

Now, when I use the eigenvalue solver from the Eigen library the same problem occurs. Then, I tried compiling Kratos with Intel MKL and using FEAST for the eigenvalue solution. This time the program is terminated during the solution with a message "Killed" regardless of debug or release mode without further information on the error.

I have a couple of questions regarding the setup of FEAST solver:

I have also noticed very inconsistent behavior regarding the choice of the builder and solver. In the past, when Eigen library was used, some cases worked with elimination b&s and the others with block b&s.

Could somebody help me out to pin where the problem might be lying? Used solvers? Solver parameters? Possible bugs in the code?

I have tried quite some combinations but could not detect a distinct pattern yet. I will try to run a more systematic set of simulations and report the errors here.

Thanks!

@KratosMultiphysics/structural-mechanics

loumalouomega commented 3 years ago

Which could be the simplest case failing you can share, so we can debug?

philbucher commented 3 years ago

@qaumann could you help with the settings for the feast solver?

philbucher commented 3 years ago

@emiroglu after our last debugging session I added test for the eigen-things with and without constaints, using both the block and the elim B&S:

Ofc those problems are very small and might still miss some things

As @loumalouomega says having a smaller case with the same problem would help tremendously with debugging it, I wouldn't be surprised if we still had some bugs related to a such complex model setup

also I think the wiki page you link is outdated, but @qaumann knows better (I think this was for the old feast)

Also I am still a bit confused, aside from eigenvalue analysis you also tried to run a static one with pardiso? If the problem happens there already then I guess sth else is already not working

emiroglu commented 3 years ago

@philbucher thanks for the links. This is more or less how I am applying the constraints too but of course on a different geometry where a node becomes slave to many other nodes. In my case the elimination B&S does not seem to be working at all.

You could find both the GiD file and the case files of the below simulations here: GiD Case CaseFiles

I ran some combination of simulation parameters by changing the eigenvalue solver, B&S and OMP_NUM_THREADS. All cases are ran in full debug mode. The cases only ran when OMP_NUM_THREADS=1.

I also noticed that GiD15 is not able to visualize the results of the simulation. It gives a segmentation fault. Could this be due to the output process?

Eigenvalue Solver Builder & Solver OMP_NUM_THREADS Error
Eigen Block 1 -
Eigen Block X Seg Fault (No other hint)
Eigen Elimination 1 Check failed in file /usr/include/boost/numeric/ublas/storage.hpp at line 195: i < size_ terminate called after throwing an instance of 'boost::numeric::ublas::bad_index' what(): bad index Aborted (core dumped)
Eigen Elimination X Check failed in file /usr/include/boost/numeric/ublas/storage.hpp at line 195: i < size_ terminate called after throwing an instance of 'boost::numeric::ublas::bad_index' what(): bad index Aborted (core dumped)
FEAST Block 1 -
FEAST Block X Seg Fault (No other hint)
FEAST Elimination 1 Check failed in file /usr/include/boost/numeric/ublas/storage.hpp at line 195: i < size_ terminate called after throwing an instance of 'boost::numeric::ublas::bad_index' what(): bad index Aborted (core dumped)
FEAST Elimination X Check failed in file /usr/include/boost/numeric/ublas/storage.hpp at line 195: i < size_ terminate called after throwing an instance of 'boost::numeric::ublas::bad_index' what(): bad index Aborted (core dumped)

Unfortunately I am not able to reproduce the error I was getting with the big model. That error might be due to the insufficient RAM+Swap. I will try to run this simple case with a much finer mesh to see if I can trigger that too.

RiccardoRossi commented 3 years ago

i find this VERY worrysome. where is the error thrown? (I mean, from what part of the code?)

emiroglu commented 3 years ago

I am suspecting that the Elimination B&S is not working because I have a node that does not belong to any elements, and I am applying the MasterSlaveConstraint in ModifyInitialGeometry which is called after AddDofs in analysis_stage

Is there another way to apply the master-slave constraints before the DOFs are added? Should the DOFs added manually at these nodes?

@RiccardoRossi I will take a look at in which function Segmentation Fault is occuring.

oberbichler commented 3 years ago

@emiroglu does static analysis work in all these cases?

qaumann commented 3 years ago

@qaumann could you help with the settings for the feast solver?

I will update the wiki page with the new solver parameters. I did not know about this page, so I did not update it when I changed FEAST. Sorry about this.

qaumann commented 3 years ago

@emiroglu The FEAST wiki page now contains updated information about the parameters and how to set them to obtain a meaningful result.

emiroglu commented 3 years ago

@emiroglu does static analysis work in all these cases?

@oberbichler The uploaded model works with block b&s and both with super_lu and pardiso_lu and with OMP_NUM_THREADS=1 but not when elimination b&s is used or with multiple threads. The errors are the same as above.

i find this VERY worrysome. where is the error thrown? (I mean, from what part of the code?)

@RiccardoRossi Segmentation Fault is thrown after ResidualBasedBlockBuilderAndSolver: Build time: 3.2586, so I am assuming it is somewhere inside the Solve function.

bad_index error related to the Elimination B&S is thrown after ResidualBasedEliminationBuilderAndSolverWithConstraints: Constraint relation build time and multiplication: 2.91634 so here I am assuming it is after BuildWithConstraints.

@emiroglu The FEAST wiki page now contains updated information about the parameters and how to set them to obtain a meaningful result.

thanks @qaumann

oberbichler commented 3 years ago

OK. Then my first guess would be that the error is actually not within the EigenSystemSolver. If it turns out that the problem occurs only with EigenSystemSolver or FEAST we can take a look at the code.

emiroglu commented 3 years ago

OK. Then my first guess would be that the error is actually not within the EigenSystemSolver. If it turns out that the problem occurs only with EigenSystemSolver or FEAST we can take a look at the code.

Still, the simulation setups that work in the above table with the provided simpler model do not work with the original model I am running. Then, with EigenSystemSolver I get a similar error as when the Elimination B&S is used and with FEAST I simply get a Killed without any other hints.

armingeiser commented 3 years ago

I tried the example from above locally:

philbucher commented 3 years ago

AFAIK the ElimBS has some concurrency issues for a while already. I am pretty sure they were never fixed

armingeiser commented 3 years ago

AFAIK the ElimBS has some concurrency issues for a while already. I am pretty sure they were never fixed

I also have something in mind. I think @adityaghantasala mentioned it does not work with MPCs?

emiroglu commented 3 years ago

I could reproduce the error related to the FEAST eigenvalue solver with a finer model. Simulation files can be found here. The link is available for 14 days.

I used FEAST, Block B&S and OMP_NUM_THREADS=1 in FullDebug mode.

The simulation exits with Killed without any further hint. The terminal output can be found here.

Any opinions on this?

qaumann commented 3 years ago

@emiroglu I tried to compute your example with 1.3M dofs, but my RAM is not large enough, so the process is killed, after memory and swap (40GB total) are completely full. So I suspect this is the reason for the process being killed in my case (I could also not solve the problem with Matlab with 32GB RAM). Is this the same behavior you are experiencing, or is your process being killed by something else? I also noticed, that the input stiffness matrix is not symmetric, is this intended? (The non-symmetry most likely is a round-off error caused by the matrix market interface) If so, you need other settings for FEAST, e.g.

"e_mid_re" : 0.0,
"e_mid_im" : 0.0,
"e_r" : 1.0e4,
"echo_level": 1,
"max_iteration": 1000,
"number_of_eigenvalues": 2,
"search_highest_eigenvalues": false,
"search_lowest_eigenvalues": false,
"solver_type": "feast",
"sort_eigenvalues": false,
"sort_order": "sr",
"subspace_size": 4,
"symmetric": false,
"tolerance": 1e-6
emiroglu commented 3 years ago

@qaumann thank you for taking a look. I could never observe the exact memory usage at the time of termination since in FullDebug mode it takes some hours until that point, but I am also suspecting that.

I cannot really comment on the symmetry. It might as well be so due to master-slave constraints. Could @adityaghantasala maybe comment on the symmetry property of the stiffness matrix when master-slave constraints are used? Nevertheless, for the smaller model the computed eigenfrequencies with eigen_eigensystem solver and feast solver, assuming symmetric matrices, matched.

qaumann commented 3 years ago

@emiroglu Sorry about the comment on symmetry, I corrected it above. It should be quite symmetric, the deivations are of order 1e-14, I did not check that immediately. I managed to solve the problem with Matlab, there it required ~50 GB memory. As I do not have a Linux machine with that much memory available at the moment, I cannot dig deeper, I am afraid. But I would assume the "killed"-problem stems from too little memory and is not directly related to FEAST.

emiroglu commented 3 years ago

The reason for the terminated simulation in combination with eigen_eigensystem is the used underlying linear system solver for the subspace iterations. The regular compilation of Kratos defaults to sparse_lu from Eigen library, that causes excessive memory usage. When Kratos is compiled with IntelMKL, the linear system solver defaults to the pardiso_ldlt solver, which can handle such big cases. Here

This behavior is not controllable by the user and it is not obvious unless one takes a look into the code.

I think it would make sense to have an option for selecting the linear system solver in ProjectParameters.json, so that the user is also aware of this possibility.

The other problems in this issue seem to be related to other parts of the code, so I will close this one and open others with more related titles.

Thank you all for your time and help!

armingeiser commented 3 years ago

This behavior is not controllable by the user and it is not obvious unless one takes a look into the code.

It is controllable since you can select if MKL should be used or not?

Use of a direct solver over e.g. an iterative one is suggested here, since multiple RHS are solved.

philbucher commented 3 years ago

it is controllable, we use it in the CI (as MKL gave slightly different results for a case that caused the CI to crash)

oberbichler commented 3 years ago

I think the idea is to add a json parameter for selecting the linear solver in the same way as e.g. for a static problem.

I agree with @armingeiser that choosing the fastest available direct solver make sense for almost all cases.

Anyway it might be a nice-to-have feature for special cases. But as long as nobody would benefit from such a feature I would not spend time on this.

emiroglu commented 3 years ago

it is controllable, we use it in the CI (as MKL gave slightly different results for a case that caused the CI to crash)

Yes, there is the use_mkl_if_available parameter but the linear system solver cannot be chosen freely by the user. It is either pardiso_ldlt with mkl or sparse_lu without. We could discuss this in another request issue.

oberbichler commented 3 years ago

it is controllable, we use it in the CI (as MKL gave slightly different results for a case that caused the CI to crash)

crash or assertion failed?

e-dub commented 3 years ago

Maybe here another comment on the use of FEAST: @theshahzeb is currently carrying out unconstrained eigenfrequency studies (free-free) and has found that e_min = 0 does not find all six rigid body translations and rotations. Depending on the model, between two and five "numerically" zero modes are found. With a small negative number, nan is found for the missing rigid body degrees of freedom. As this behavior was (at least for us) unexpected, we thought we had an error with our model.

armingeiser commented 3 years ago

Another important thing to note since the problem is memory consumption for the direct solve:

MKL solvers have the possibility to run out of core (slower but part of the internal variables are written to disc) - This is not yet implemented in the LinearSolversApplication but could be added if needed.

https://eigen.tuxfamily.org/dox/classEigen_1_1PardisoLDLT.html

oberbichler commented 3 years ago

@e-dub AFAIK none of these algorithms is suitable for finding zero-eigenvectors. If I'm wrong I would be very interested in the correct setup. At the moment I am using SVD...

e-dub commented 3 years ago

@oberbichler Thanks for the heads up on that. I would like to compare SVD with what @Theshahzeb found. Could you share the ProjectParameters for the SVD?

oberbichler commented 3 years ago

I am actually not using Kratos for that.

e-dub commented 3 years ago

@oberbichler: @veiguf and I looked at i.a. free-free vibrations with Kratos-FEAST 2.5 years ago and verified the results with ANSYS, see here. Of course, quite a bit has changed in Kratos since then...

Do you have more information for us why FEAST is not suitable for zero eigenvalues? What software are you using for free-free vibraitons with SVD?

qaumann commented 3 years ago

@e-dub As far as I know, you can compute all eigenvalues using the FEAST routines, if you have set its limits accordingly. With a small range towards both sides around zero, FEAST should find the eigenmodes corresponding to your rigid body movement. If your system matrices are symmetric, you can now (with FEAST 4.0) also find the n lowest eigenvalues in your specified range, so this could be what you are looking for. The subspace iteration implemented in eigen_eigensystem is working differently, I think, but I have not much experience with it.

Theshahzeb commented 3 years ago

@e-dub As far as I know, you can compute all eigenvalues using the FEAST routines, if you have set its limits accordingly. With a small range towards both sides around zero, FEAST should find the eigenmodes corresponding to your rigid body movement. If your system matrices are symmetric, you can now (with FEAST 4.0) also find the n lowest eigenvalues in your specified range, so this could be what you are looking for.

"search_lowest_eigenvalues" and "number_of_eigenvalues" did not work for us. Solver keeps running till it finds all the eigenvalues in the selected range. Selecting e_min slightly below zero gives all the rigid body eigenmodes indeed.

oberbichler commented 3 years ago

Thanks. I will take a look.

@qaumann EigenSystemSolver needs to solve the system. I don't know if this step could be modified in order to work with singular matrices. Since in this case the range of the eigenvalues is known, FEAST should be fine.

qaumann commented 3 years ago

"search_lowest_eigenvalues" and "number_of_eigenvalues" did not work for us. Solver keeps running till it finds all the eigenvalues in the selected range. Selecting e_min slightly below zero gives all the rigid body eigenmodes indeed.

@Theshahzeb That should not happen. Could you provide your FEAST settings and perhaps its output with echo_level=1? Normally, it should find only number_of_eigenvalues in the given interval.

Theshahzeb commented 3 years ago

@Theshahzeb That should not happen. Could you provide your FEAST settings and perhaps its output with echo_level=1? Normally, it should find only number_of_eigenvalues in the given interval.

Thank you looking into this, these are the feast settings I am using, trying to find lowest values: "eigensolver_settings":{ "solver_type" : "feast", "symmetric" : true, "number_of_eigenvalues" : 16, "e_min" : -0.1, "e_max" : 1e09, "search_lowest_eigenvalues" : true, "search_highest_eigenvalues" : false, "sort_eigenvalues" : true, "sort_order" : "sr", "subspace_size" : 100, "max_iteration" : 1000, "tolerance" : 1e-6, "echo_level" : 1 }, This is the feast output feast_lowest_eigenvalues If i do "search_lowest_eigenvalues" : false, This is the output feast_echo1

qaumann commented 3 years ago

@Theshahzeb Thank you for providing me with the output. There does not seem to be an error in the configuration, but FEAST's stochastic estimate fails. If you can share your model with me, I can take a more thorough look why it fails.