Closed by emiroglu 3 years ago
Which could be the simplest case failing you can share, so we can debug?
@qaumann could you help with the settings for the feast solver?
@emiroglu after our last debugging session I added tests for the eigen-things with and without constraints, using both the block and the elimination B&S.
Of course those problems are very small and might still miss some things.
As @loumalouomega says, having a smaller case with the same problem would help tremendously with debugging. I wouldn't be surprised if we still had some bugs related to such a complex model setup.
Also, I think the wiki page you link is outdated, but @qaumann knows better (I think this was for the old FEAST).
Also, I am still a bit confused: aside from the eigenvalue analysis, you also tried to run a static one with Pardiso? If the problem happens there already, then I guess something else is not working.
@philbucher thanks for the links. This is more or less how I am applying the constraints too but of course on a different geometry where a node becomes slave to many other nodes. In my case the elimination B&S does not seem to be working at all.
You can find both the GiD file and the case files of the simulations below here: GiD Case CaseFiles
I ran several combinations of simulation parameters, varying the eigenvalue solver, the B&S, and `OMP_NUM_THREADS`. All cases were run in FullDebug mode. The cases only ran through when `OMP_NUM_THREADS=1`.
I also noticed that GiD 15 is not able to visualize the results of the simulation; it gives a segmentation fault. Could this be due to the output process?
| Eigenvalue Solver | Builder & Solver | `OMP_NUM_THREADS` | Error |
|---|---|---|---|
| Eigen | Block | 1 | - |
| Eigen | Block | >1 | Segmentation fault (no other hint) |
| Eigen | Elimination | 1 | `Check failed in file /usr/include/boost/numeric/ublas/storage.hpp at line 195: i < size_`, then `boost::numeric::ublas::bad_index` is thrown, `Aborted (core dumped)` |
| Eigen | Elimination | >1 | Same `bad_index` abort as above |
| FEAST | Block | 1 | - |
| FEAST | Block | >1 | Segmentation fault (no other hint) |
| FEAST | Elimination | 1 | Same `bad_index` abort as above |
| FEAST | Elimination | >1 | Same `bad_index` abort as above |
Unfortunately I am not able to reproduce the error I was getting with the big model. That error might be due to the insufficient RAM+Swap. I will try to run this simple case with a much finer mesh to see if I can trigger that too.
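For reference, the single-thread configurations in the table were obtained by pinning OpenMP to one thread before launching the analysis; a minimal sketch (the entry script name `MainKratos.py` is the usual Kratos convention and is an assumption here):

```shell
# Pin all OpenMP-parallel loops (builder & solver, etc.) to a single thread
export OMP_NUM_THREADS=1

# then launch the analysis as usual, e.g.:
# python MainKratos.py
```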
I find this VERY worrisome. Where is the error thrown? (I mean, from what part of the code?)
I suspect that the Elimination B&S is not working because I have a node that does not belong to any element, and I am applying the `MasterSlaveConstraint` in `ModifyInitialGeometry`, which is called after `AddDofs` in the `analysis_stage`.
Is there another way to apply the master-slave constraints before the DOFs are added? Or should the DOFs be added manually at these nodes?
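To illustrate why such a node can end up without DOFs, here is a toy Python sketch (not Kratos code; all names are invented) mimicking element-wise DOF collection, which skips nodes that appear only in constraints:

```python
# Toy illustration (not Kratos code): if DOFs are collected by looping over
# elements only, a node that belongs to no element never receives a DOF,
# even if a master-slave constraint references it.

def collect_dofs(elements):
    """Element-wise DOF collection: misses element-free (orphan) nodes."""
    dofs = {}
    for element in elements:
        for node in element:
            dofs.setdefault(node, len(dofs))
    return dofs

def collect_dofs_with_orphans(elements, all_nodes):
    """Same, but afterwards adds DOFs for nodes not touched by any element."""
    dofs = collect_dofs(elements)
    for node in all_nodes:
        dofs.setdefault(node, len(dofs))
    return dofs

elements = [(1, 2), (2, 3)]
all_nodes = [1, 2, 3, 4]  # node 4 is used only by a master-slave constraint

print(4 in collect_dofs(elements))                          # False
print(4 in collect_dofs_with_orphans(elements, all_nodes))  # True
```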
@RiccardoRossi I will take a look at which function the segmentation fault is occurring in.
@emiroglu does static analysis work in all these cases?
> @qaumann could you help with the settings for the feast solver?
I will update the wiki page with the new solver parameters. I did not know about this page, so I did not update it when I changed FEAST. Sorry about this.
@emiroglu The FEAST wiki page now contains updated information about the parameters and how to set them to obtain a meaningful result.
> @emiroglu does static analysis work in all these cases?
@oberbichler The uploaded model works with the block B&S, both with `super_lu` and `pardiso_lu`, and with `OMP_NUM_THREADS=1`, but not when the elimination B&S is used or with multiple threads. The errors are the same as above.
> I find this VERY worrisome. Where is the error thrown? (I mean, from what part of the code?)
@RiccardoRossi The `Segmentation Fault` is thrown after `ResidualBasedBlockBuilderAndSolver: Build time: 3.2586`, so I am assuming it is somewhere inside the `Solve` function. The `bad_index` error related to the Elimination B&S is thrown after `ResidualBasedEliminationBuilderAndSolverWithConstraints: Constraint relation build time and multiplication: 2.91634`, so here I am assuming it is after `BuildWithConstraints`.
> @emiroglu The FEAST wiki page now contains updated information about the parameters and how to set them to obtain a meaningful result.
thanks @qaumann
OK. Then my first guess would be that the error is actually not within the EigenSystemSolver. If it turns out that the problem occurs only with EigenSystemSolver or FEAST we can take a look at the code.
> OK. Then my first guess would be that the error is actually not within the EigenSystemSolver. If it turns out that the problem occurs only with EigenSystemSolver or FEAST we can take a look at the code.
Still, the simulation setups from the table above that work with the provided simpler model do not work with the original model I am running. There, with EigenSystemSolver I get a similar error as when the Elimination B&S is used, and with FEAST I simply get `Killed` without any other hints.
I tried the example from above locally: the segfault occurs in the `ApplyConstraints` function of `residualbased_block_builder_and_solver.h`, in the parallel loop. At least if I make the loop serial, it works for me with any number of threads.
EDIT: it seems that the first call of this function for the mass matrix is not problematic, but the second call for the stiffness matrix segfaults. No idea why.
AFAIK the ElimBS has some concurrency issues for a while already. I am pretty sure they were never fixed
I also have something in mind. I think @adityaghantasala mentioned it does not work with MPCs?
I could reproduce the error related to the FEAST eigenvalue solver with a finer model. Simulation files can be found here. The link is available for 14 days.
I used FEAST, Block B&S and OMP_NUM_THREADS=1 in FullDebug mode.
The simulation exits with `Killed` without any further hint. The terminal output can be found here.
Any opinions on this?
@emiroglu I tried to compute your example with 1.3M dofs, but my RAM is not large enough, so the process is killed, after memory and swap (40GB total) are completely full. So I suspect this is the reason for the process being killed in my case (I could also not solve the problem with Matlab with 32GB RAM). Is this the same behavior you are experiencing, or is your process being killed by something else?
I also noticed that the input stiffness matrix is not symmetric; is this intended? (The non-symmetry is most likely a round-off error caused by the matrix market interface.) If so, you need other settings for FEAST, e.g.

```json
{
    "e_mid_re" : 0.0,
    "e_mid_im" : 0.0,
    "e_r" : 1.0e4,
    "echo_level" : 1,
    "max_iteration" : 1000,
    "number_of_eigenvalues" : 2,
    "search_highest_eigenvalues" : false,
    "search_lowest_eigenvalues" : false,
    "solver_type" : "feast",
    "sort_eigenvalues" : false,
    "sort_order" : "sr",
    "subspace_size" : 4,
    "symmetric" : false,
    "tolerance" : 1e-6
}
```
@qaumann thank you for taking a look. I could never observe the exact memory usage at the time of termination, since in FullDebug mode it takes some hours to reach that point, but I am also suspecting that.
I cannot really comment on the symmetry; it might well be due to the master-slave constraints. Could @adityaghantasala maybe comment on the symmetry of the stiffness matrix when master-slave constraints are used? Nevertheless, for the smaller model the eigenfrequencies computed with the `eigen_eigensystem` solver and the `feast` solver, assuming symmetric matrices, matched.
@emiroglu Sorry about the comment on symmetry, I corrected it above. It should be quite symmetric, the deviations are of order 1e-14; I did not check that immediately. I managed to solve the problem with Matlab; there it required ~50 GB of memory. As I do not have a Linux machine with that much memory available at the moment, I cannot dig deeper, I am afraid. But I would assume the "killed" problem stems from too little memory and is not directly related to FEAST.
The reason for the terminated simulation in combination with `eigen_eigensystem` is the underlying linear system solver used for the subspace iterations. The regular compilation of Kratos defaults to `sparse_lu` from the Eigen library, which causes excessive memory usage. When Kratos is compiled with Intel MKL, the linear system solver defaults to `pardiso_ldlt`, which can handle such big cases.
This behavior is not controllable by the user and is not obvious unless one takes a look into the code. I think it would make sense to have an option for selecting the linear system solver in `ProjectParameters.json`, so that the user is also aware of this possibility.
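A possible shape for such a user-facing option (the `linear_solver_settings` key is purely hypothetical here and does not exist in the current eigensolver parameters):

```json
{
    "solver_type" : "eigen_eigensystem",
    "number_of_eigenvalues" : 5,
    "linear_solver_settings" : {
        "solver_type" : "pardiso_ldlt"
    }
}
```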
The other problems in this issue seem to be related to other parts of the code, so I will close this one and open others with more related titles.
Thank you all for your time and help!
> This behavior is not controllable by the user and it is not obvious unless one takes a look into the code.
It is controllable since you can select if MKL should be used or not?
Use of a direct solver over e.g. an iterative one is suggested here, since multiple RHS are solved.
it is controllable, we use it in the CI (as MKL gave slightly different results for a case that caused the CI to crash)
I think the idea is to add a JSON parameter for selecting the linear solver, in the same way as e.g. for a static problem.
I agree with @armingeiser that choosing the fastest available direct solver makes sense for almost all cases.
Anyway, it might be a nice-to-have feature for special cases. But as long as nobody would benefit from such a feature, I would not spend time on this.
> it is controllable, we use it in the CI (as MKL gave slightly different results for a case that caused the CI to crash)
Yes, there is the `use_mkl_if_available` parameter, but the linear system solver cannot be chosen freely by the user. It is either `pardiso_ldlt` with MKL or `sparse_lu` without. We could discuss this in a separate feature request issue.
> it is controllable, we use it in the CI (as MKL gave slightly different results for a case that caused the CI to crash)
crash or assertion failed?
Maybe another comment here on the use of FEAST: @theshahzeb is currently carrying out unconstrained (free-free) eigenfrequency studies and has found that `e_min = 0` does not find all six rigid-body translations and rotations. Depending on the model, between two and five numerically zero modes are found. With a small negative lower bound, `nan` is found for the missing rigid-body degrees of freedom. As this behavior was (at least for us) unexpected, we thought we had an error in our model.
Another important thing to note since the problem is memory consumption for the direct solve:
MKL solvers have the possibility to run out-of-core (slower, but part of the internal variables is written to disk). This is not yet implemented in the LinearSolversApplication but could be added if needed.
https://eigen.tuxfamily.org/dox/classEigen_1_1PardisoLDLT.html
@e-dub AFAIK none of these algorithms is suitable for finding zero-eigenvectors. If I'm wrong I would be very interested in the correct setup. At the moment I am using SVD...
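For reference, the SVD-based computation of zero-eigenvectors (the null space) mentioned here can be sketched with NumPy; the tolerance is an assumption:

```python
import numpy as np

def null_space(K, tol=1e-10):
    """Right singular vectors whose singular values are numerically zero."""
    _, s, vt = np.linalg.svd(K)
    return vt[s <= tol * s.max()].T  # columns span the null space of K

# Free-free chain of two unit springs: K is singular, with one
# rigid-body translation mode proportional to (1, 1, 1) in its null space
K = np.array([[ 1.0, -1.0,  0.0],
              [-1.0,  2.0, -1.0],
              [ 0.0, -1.0,  1.0]])

modes = null_space(K)
print(modes.shape[1])  # 1 zero mode found
```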
@oberbichler Thanks for the heads up on that. I would like to compare SVD with what @Theshahzeb found. Could you share the ProjectParameters for the SVD?
I am actually not using Kratos for that.
@oberbichler: @veiguf and I looked at, among other things, free-free vibrations with Kratos-FEAST 2.5 years ago and verified the results with ANSYS, see here. Of course, quite a bit has changed in Kratos since then...
Do you have more information for us on why FEAST is not suitable for zero eigenvalues? What software are you using for free-free vibrations with SVD?
@e-dub As far as I know, you can compute all eigenvalues using the FEAST routines, if you have set its limits accordingly. With a small range towards both sides around zero, FEAST should find the eigenmodes corresponding to your rigid body movement. If your system matrices are symmetric, you can now (with FEAST 4.0) also find the n lowest eigenvalues in your specified range, so this could be what you are looking for.
The subspace iteration implemented in `eigen_eigensystem` works differently, I think, but I do not have much experience with it.
> @e-dub As far as I know, you can compute all eigenvalues using the FEAST routines, if you have set its limits accordingly. With a small range towards both sides around zero, FEAST should find the eigenmodes corresponding to your rigid body movement. If your system matrices are symmetric, you can now (with FEAST 4.0) also find the n lowest eigenvalues in your specified range, so this could be what you are looking for.
"search_lowest_eigenvalues" and "number_of_eigenvalues" did not work for us. Solver keeps running till it finds all the eigenvalues in the selected range. Selecting e_min slightly below zero gives all the rigid body eigenmodes indeed.
Thanks. I will take a look.
@qaumann `EigenSystemSolver` needs to solve the system. I don't know if this step could be modified to work with singular matrices. Since in this case the range of the eigenvalues is known, FEAST should be fine.
"search_lowest_eigenvalues" and "number_of_eigenvalues" did not work for us. Solver keeps running till it finds all the eigenvalues in the selected range. Selecting e_min slightly below zero gives all the rigid body eigenmodes indeed.
@Theshahzeb That should not happen. Could you provide your FEAST settings and perhaps its output with `echo_level=1`? Normally, it should find only `number_of_eigenvalues` in the given interval.
> @Theshahzeb That should not happen. Could you provide your FEAST settings and perhaps its output with `echo_level=1`? Normally, it should find only `number_of_eigenvalues` in the given interval.
Thank you for looking into this. These are the FEAST settings I am using, trying to find the lowest eigenvalues:

```json
"eigensolver_settings" : {
    "solver_type" : "feast",
    "symmetric" : true,
    "number_of_eigenvalues" : 16,
    "e_min" : -0.1,
    "e_max" : 1e09,
    "search_lowest_eigenvalues" : true,
    "search_highest_eigenvalues" : false,
    "sort_eigenvalues" : true,
    "sort_order" : "sr",
    "subspace_size" : 100,
    "max_iteration" : 1000,
    "tolerance" : 1e-6,
    "echo_level" : 1
},
```
This is the FEAST output.
If I do `"search_lowest_eigenvalues" : false`, this is the output.
@Theshahzeb Thank you for providing me with the output. There does not seem to be an error in the configuration, but FEAST's stochastic estimate fails. If you can share your model with me, I can take a more thorough look at why it fails.
I have a structural model with about 3.6 million DOFs using volume, shell, truss and nodal elements. In addition, I am using master-slave constraints.
I have tried using both the Eigen and FEAST solvers to compute the eigenvalues, but have not been successful yet. The problems I am facing are usually related to a segmentation fault that stems from an index going out of range in the matrix class. I had a similar problem when I used `super_lu` and solved it by either using a computer with more RAM or switching to `pardiso_lu`.
Now, when I use the eigenvalue solver from the Eigen library, the same problem occurs. I then tried compiling Kratos with Intel MKL and using FEAST for the eigenvalue solution. This time the program is terminated during the solution with the message `Killed`, regardless of debug or release mode, without further information on the error.
I have a couple of questions regarding the setup of FEAST solver:
I have also noticed very inconsistent behavior regarding the choice of the builder and solver. In the past, when the Eigen library was used, some cases worked with the elimination B&S and others with the block B&S.
Could somebody help me pin down where the problem might lie? The solvers used? Solver parameters? Possible bugs in the code?
I have tried quite some combinations but could not detect a distinct pattern yet. I will try to run a more systematic set of simulations and report the errors here.
Thanks!
@KratosMultiphysics/structural-mechanics