KratosMultiphysics / Kratos

Kratos Multiphysics (a.k.a. Kratos) is a framework for building parallel multi-disciplinary simulation software. Modularity, extensibility and HPC are the main objectives. Kratos has a BSD license and is written in C++ with an extensive Python interface.
https://kratosmultiphysics.github.io/Kratos/

Running multiple cases using One MainKratos file (Doesn't work for not converged solutions) #10090

Open UmarSaeedMalik opened 2 years ago

UmarSaeedMalik commented 2 years ago

Hello,

I am trying to run multiple simulations from a single script, as a kind of parametric study. For example, the code below varies the friction coefficient from case to case.

I have changed my main file and it looks like this: [screenshot: Capture]

However, the loop only works when the preceding case finishes properly. If the solution doesn't converge for the preceding case, the code breaks and can't continue with the next case.

Even if the simulation stops for one case, I want the loop to carry on with the other cases.

Can somebody help me with this please? After a lot of investigation I was able to figure out that my code is stopping at the function void BuildRHS( typename TSchemeType::Pointer pScheme, ModelPart& rModelPart, TSystemVectorType& b) override in the ResidualBasedBlockBuilderAndSolver class.

loumalouomega commented 2 years ago

You can try wrapping the call in a try/except block

loumalouomega commented 2 years ago

https://www.w3schools.com/python/python_try_except.asp
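A minimal sketch of how the try/except suggestion could be applied to a loop over cases. The names `run_cases` and `demo_case` are hypothetical stand-ins and not part of Kratos; in a real script `run_case` would set up the analysis for one friction coefficient and call its `Run()` method. This only helps if the failure actually surfaces as a Python exception.

```python
from typing import Callable, Dict, Iterable


def run_cases(run_case: Callable[[float], None],
              friction_coefficients: Iterable[float]) -> Dict[float, str]:
    """Run every case and record whether it finished or raised an exception."""
    outcomes = {}
    for mu in friction_coefficients:
        try:
            run_case(mu)  # hypothetical: build the analysis for this case and run it
            outcomes[mu] = "finished"
        except Exception as error:
            # A failed case is recorded and the loop moves on to the next one.
            outcomes[mu] = f"failed: {error}"
    return outcomes


def demo_case(mu: float) -> None:
    """Hypothetical stand-in for one simulation; fails on purpose for mu == 0.2."""
    if mu == 0.2:
        raise RuntimeError("no convergence")


if __name__ == "__main__":
    print(run_cases(demo_case, [0.1, 0.2, 0.3]))
```

If the process ends without raising anything, as can happen with a crash in compiled code, the `except` branch is never reached and this approach cannot help.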

UmarSaeedMalik commented 2 years ago

Thank you for your suggestion, but it doesn't work.

The code is breaking at "is_converged = self._GetSolver().SolveSolutionStep()" in the analysis_stage. Everything I try to do after this step is not executed.

sunethwarna commented 2 years ago

Maybe you can try converting the above Python script into a combination of bash and Python scripts, so that every simulation is launched from bash. Or maybe you can use a Python script to start a new subprocess for each simulation. (https://www.adamsmith.haus/python/answers/how-to-run-bash-commands-in-python).
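The subprocess idea could look roughly like the sketch below, which also works on Windows. It assumes each case can be started as its own command; the stand-in commands used here just run tiny Python one-liners, and a MainKratos.py that accepts a case argument is an assumption, not something shown in this thread.

```python
import subprocess
import sys
from typing import Dict, List


def run_in_subprocesses(commands: Dict[str, List[str]]) -> Dict[str, int]:
    """Run each command in its own process, one after another, and collect exit codes."""
    exit_codes = {}
    for name, command in commands.items():
        # check=True is deliberately not used: a nonzero exit code is
        # recorded instead of raising CalledProcessError.
        completed = subprocess.run(command)
        exit_codes[name] = completed.returncode
    return exit_codes


if __name__ == "__main__":
    # Stand-in commands. A real launcher might use something like
    # [sys.executable, "MainKratos.py", "<case argument>"] (hypothetical CLI).
    commands = {
        "mu_0.1": [sys.executable, "-c", "print('case ok')"],
        "mu_0.2": [sys.executable, "-c", "import sys; sys.exit(3)"],
        "mu_0.3": [sys.executable, "-c", "print('case ok')"],
    }
    print(run_in_subprocesses(commands))
```

Because every case lives in a separate process, a case that dies abruptly only takes down its own process, and the launcher continues with the next one.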

matekelemen commented 2 years ago

If the solution doesn't converge for the preceding case, the code breaks and can't resume for the next case.

By breaking, do you mean it

  1. raises an Exception (from Python)
  2. throws an Exception (from C++)
  3. segfaults
  4. hangs/enters an infinite loop

?

If it's 1 or 2, you should be able to handle it with loumalouomega's try-except suggestion; something like this:

try:
    simulation.Run()
except Exception as exception:
    pass # or handle it as you wish

In case of 3, sunethwarna's suggestion should help, but please open an issue in which you describe where the segfault is happening. (Look into os.system; subprocess is usually preferable, but its checked variants, such as subprocess.run with check=True, raise an exception on a nonzero exit code, and you don't want that in this case.)

Suneth's answer should also help you with case 4, but make sure the simulations are called in a non-blocking manner (i.e.: python doesn't wait for the process to stop executing). That said, this is definitely the worst case and you should find out what's going wrong (and open an issue with more information if Kratos is to blame).
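To make the four outcomes above distinguishable from the launcher's side, a non-blocking variant could be sketched as follows. The helper names `describe_exit` and `launch_all` are made up for this illustration, and the time limit is an added assumption for dealing with hangs. `subprocess.Popen` returns immediately, `poll()` returns `None` while the child is still running, and on POSIX a negative return code `-N` means the child was terminated by signal `N` (which is how a segfault would typically show up).

```python
import subprocess
import sys
import time
from typing import Dict, List


def describe_exit(returncode: int) -> str:
    """Rough interpretation of a child's exit code."""
    if returncode == 0:
        return "finished"
    if returncode < 0:
        # POSIX only: -N means the child was terminated by signal N.
        return f"killed by signal {-returncode}"
    return f"exited with code {returncode}"


def launch_all(commands: Dict[str, List[str]], time_limit: float,
               poll_interval: float = 0.1) -> Dict[str, str]:
    """Start every command without waiting, then poll until all have ended."""
    start = time.monotonic()
    running = {name: subprocess.Popen(cmd) for name, cmd in commands.items()}
    results = {}
    while running:
        timed_out = time.monotonic() - start > time_limit
        for name in list(running):
            process = running[name]
            code = process.poll()  # None while the process is still running
            if code is not None:
                results[name] = describe_exit(code)
                del running[name]
            elif timed_out:
                process.kill()  # treat as a hang and stop it
                process.wait()
                results[name] = "killed after time limit"
                del running[name]
        if running:
            time.sleep(poll_interval)
    return results
```

On Windows there are no negative codes, so a crash appears as a nonzero exit code instead. Note that this sketch starts all cases at once, which may not be desirable for large simulations.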

UmarSaeedMalik commented 2 years ago

Thank you for your reply. I am working on Windows, but following your idea I am checking whether I can do the same with a .bat script.

UmarSaeedMalik commented 2 years ago

@matekelemen Thank you for your reply. I am working on contact, so it is very common not to find a solution. However, my simulation ends suddenly at an arbitrary nonlinear iteration, e.g. as shown below: [screenshot: Capture] Here you can see that it just stops after the 2nd iteration; there is no hang and no error or exception. I tried to find the cause and, as I mentioned before, I could trace it down to the function void BuildRHS( typename TSchemeType::Pointer pScheme, ModelPart& rModelPart, TSystemVectorType& b) override in the ResidualBasedBlockBuilderAndSolver class.

I am not really sure why this is happening, but the script does not complete. It breaks at "is_converged = self._GetSolver().SolveSolutionStep()" in the analysis_stage without any warning or error.

matekelemen commented 2 years ago

Ah OK, so there are no errors, but the simulation stops advancing after a step fails to converge. Can you please copy your ProjectParameters.json here? (You can put it between 3 backticks ``` so whitespace is preserved and it's more readable - more on formatting in markdown)

I might not be of much help because I haven't done contact mechanics yet, but I imagine the only good solution here would be to try increasing the max number of nonlinear iterations, decreasing time steps, using adaptive time steps, using a different solver, or using a better predictor. I don't think continuing the solution loop with unconverged steps would make any sense.
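As a generic illustration of the adaptive time stepping idea mentioned above (not Kratos's own implementation or configuration), a step-halving loop might look like this. `solve_step` is a hypothetical callable that returns `True` when the step from `t` to `t + dt` converged, and it is assumed to leave the state unchanged when it fails.

```python
from typing import Callable, List


def advance_with_step_halving(solve_step: Callable[[float, float], bool],
                              end_time: float,
                              initial_dt: float,
                              min_dt: float) -> List[float]:
    """Advance from t=0 to end_time, halving dt whenever a step fails to converge.

    Returns the accepted step sizes; raises RuntimeError if dt drops below min_dt.
    """
    t, dt, accepted = 0.0, initial_dt, []
    while t < end_time - 1e-12:
        dt = min(dt, end_time - t)  # do not step past the end
        if solve_step(t, dt):
            t += dt
            accepted.append(dt)
            dt = min(2.0 * dt, initial_dt)  # grow back after a success, capped
        else:
            dt *= 0.5  # retry the same step with a smaller increment
            if dt < min_dt:
                raise RuntimeError(f"step size fell below {min_dt} at t={t}")
    return accepted


def demo_solve_step(t: float, dt: float) -> bool:
    """Hypothetical solver: the second half of the run only converges for dt <= 0.25."""
    return t < 0.5 or dt <= 0.25


if __name__ == "__main__":
    print(advance_with_step_halving(demo_solve_step, 1.0, 0.5, 0.01))
```

The lower bound `min_dt` keeps the loop from shrinking the step forever when a case genuinely has no solution, which fits the point that continuing with unconverged steps makes little sense.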

mpentek commented 2 years ago

Definitely go for a combination of a bash script with the MainKratos. We do it on Linux clusters for large jobs without an issue, and we do have jobs that fail. Basically you intend to do job farming, which (in my experience) is typically done with smart use of bash scripts.

You should be able to define whether the jobs are executed in parallel or in serial and whether they wait for each other to finish. You could also put a time cap on each process to force-kill a possible infinite loop or hang.
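The comment above describes bash scripts; the same job farming idea can be sketched in Python, which may be convenient on Windows. The function names `run_job` and `farm_jobs` are hypothetical. The sketch relies on two standard library behaviours: `subprocess.run(..., timeout=...)` kills the child and raises `TimeoutExpired` when the time cap is exceeded, and `ThreadPoolExecutor(max_workers=n)` limits how many jobs run at the same time.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def run_job(command: List[str], time_limit: float) -> str:
    """Run one job; subprocess.run kills the child if the time limit expires."""
    try:
        completed = subprocess.run(command, timeout=time_limit)
    except subprocess.TimeoutExpired:
        return "timed out"
    if completed.returncode == 0:
        return "finished"
    return f"failed ({completed.returncode})"


def farm_jobs(commands: Dict[str, List[str]], time_limit: float,
              max_workers: int = 1) -> Dict[str, str]:
    """max_workers=1 runs jobs one after another; larger values run them side by side."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(run_job, command, time_limit)
                   for name, command in commands.items()}
        return {name: future.result() for name, future in futures.items()}


if __name__ == "__main__":
    # Stand-in commands; real jobs would launch the simulation script instead.
    jobs = {
        "ok": [sys.executable, "-c", "print('done')"],
        "slow": [sys.executable, "-c", "import time; time.sleep(60)"],
    }
    print(farm_jobs(jobs, time_limit=2.0, max_workers=2))
```

Threads are enough here because each worker only waits for an external process. The time cap applies to the directly launched process; any processes it spawns itself would need separate handling.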