geodynamics / aspect

A parallel, extensible finite element code to simulate convection in both 2D and 3D models.
https://aspect.geodynamics.org/

Error in visualization postprocessor #2991

Closed: ricitron closed this issue 5 years ago

ricitron commented 5 years ago

The most recent version of ASPECT crashes for me in the visualization postprocessor:

-----------------------------------------------------------------------------
-- This is ASPECT, the Advanced Solver for Problems in Earth's ConvecTion.
--     . version 2.2.0-pre (master, 2bb4a63b6)
--     . using deal.II 9.0.1
--     .       with 32 bit indices and vectorization level 1 (128 bits)
--     . using Trilinos 12.10.1
--     . using p4est 2.0.0
--     . running in OPTIMIZED mode
--     . running with 32 MPI processes
-----------------------------------------------------------------------------

-----------------------------------------------------------------------------
-- For information on how to cite ASPECT, see:
--   https://aspect.geodynamics.org/citing.html?ver=2.2.0-pre&sha=2bb4a63b6&src=code
-----------------------------------------------------------------------------
Number of active cells: 196,608 (on 9 levels)
Number of degrees of freedom: 4,924,421 (1,576,962+197,633+788,481+788,481+196,608+589,824+786,432)

*** Timestep 0:  t=0 years
   Solving temperature system... 0 iterations.
   Solving volume of fluid system... 0 iterations.
   Solving volume of fluid system... 0 iterations.
   Rebuilding Stokes preconditioner...
   Solving Stokes system... 72+0 iterations.

   Postprocessing:

...

----------------------------------------------------
Exception on MPI process <14> while running postprocessor <N6aspect11Postprocess13VisualizationILi2EEE>:

--------------------------------------------------------
An error occurred in line <6632> of file </tmp/unpack/deal.II-v9.0.1/source/base/data_out_base.cc> in function
    void dealii::DataOutInterface<dim, spacedim>::write_vtu_in_parallel(const char*, MPI_Comm) const [with int dim = 2; int spacedim = 2; MPI_Comm = ompi_communicator_t*]
The violated condition was:
    ierr == MPI_SUCCESS
Additional information:
deal.II encountered an error while calling an MPI function.
The description of the error provided by MPI is "MPI_ERR_OTHER: known error not in list".
The numerical value of the original error code is 16.
--------------------------------------------------------

Aborting!

...

[c8-81:17061] PMIX ERROR: NOT-FOUND in file server/pmix_server_ops.c at line 2166

The code compiled without errors. When I remove the visualization postprocessor, ASPECT runs fine for multiple time steps; it only crashes when the visualization postprocessor is enabled.

Any ideas?

bangerth commented 5 years ago

Out of curiosity, what happens if you run in debug mode instead of release mode?

ricitron commented 5 years ago

I get the same error with no additional output.

gassmoeller commented 5 years ago

Hi Robert, we need a bit more information to figure out what is going on. Does the problem also happen for smaller models? Do you group output into fewer files than processors, or does each processor write its own output? Which visualization plugins did you activate? All of our test output is fine, so either you discovered a bug with your particular model setup (can you upload a simple parameter file that reproduces the error?), or it is something about the combination of ASPECT and deal.II versions. We recently made internal changes to the visualization postprocessor (see #2925); maybe that caused this.

ricitron commented 5 years ago

Here is the input file:

input.txt

Maybe it is something about the combination of ASPECT and deal.II versions. I have now tried compiling and running aspect-2.2.0-pre on two clusters, both without success. In both cases ASPECT compiled successfully but crashed when running. On one cluster it crashed before completing any time step; on the second cluster it is the error I posted above: ASPECT runs but crashes in the visualization postprocessor. On both clusters I had previously compiled working versions of aspect-2.1. I've been trying to do a clean install of deal.II to see if the most recent version fixes the issue, but I keep running into errors when building deal.II with candi.

ricitron commented 5 years ago

On the other cluster, where it crashes before completing a single time step, I get the following error output:

slurm-4365174.txt

gassmoeller commented 5 years ago

I'm not sure what is going on, but the model you sent runs on my system (log file attached). I have slightly different ASPECT and deal.II versions, though. Can you try two variations of your model:

  1. Try adding the following line to your Visualization subsection: set Number of grouped files = 0. This switches off MPI-IO, which we have had trouble with before.
  2. Does the model run without the visualization postprocessor?
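For reference, here is a minimal sketch of how the two variations might look in an ASPECT parameter file. It assumes the usual Postprocess/Visualization subsection layout; the second postprocessor name (velocity statistics) is only a placeholder, since the postprocessor list from the attached input file is not reproduced in this thread.

```prm
# Variation 1: keep visualization output, but disable MPI-IO grouping.
subsection Postprocess
  # "velocity statistics" is a placeholder for whatever else the model uses
  set List of postprocessors = visualization, velocity statistics

  subsection Visualization
    # 0 = no grouping: each process writes its own output file
    # instead of writing grouped files through MPI-IO
    set Number of grouped files = 0
  end
end

# Variation 2 (use instead of the block above): remove visualization
# from the list entirely.
# subsection Postprocess
#   set List of postprocessors = velocity statistics
# end
```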
gassmoeller commented 5 years ago

log.txt

ricitron commented 5 years ago

The model does run without the visualization postprocessor.

With your suggestion of set Number of grouped files = 0, the model ran when the output format was set to vtu, but not when it was set to hdf5. With hdf5 I received a number of errors, but that is probably due to something wrong with the ASPECT build on my system.
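The combination reported to work above can be sketched as a parameter-file fragment, again assuming the standard Postprocess/Visualization subsection layout and showing only the two settings discussed in this thread:

```prm
subsection Postprocess
  subsection Visualization
    # vtu output worked with grouping disabled;
    # hdf5 failed on this particular build
    set Output format           = vtu
    set Number of grouped files = 0
  end
end
```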

ricitron commented 5 years ago

I reinstalled deal.II and ASPECT and the issue was resolved, so it appears to have been a mismatch between the deal.II and ASPECT builds. Thanks, Rene.

When recompiling the most recent version of ASPECT, I got the following error:

[ 98%] Building CXX object CMakeFiles/aspect.dir/unit_tests/parse_map_to_double_array.cc.o
[ 98%] Building CXX object CMakeFiles/aspect.dir/unit_tests/termination_criteria.cc.o
[100%] Building CXX object CMakeFiles/aspect.dir/unit_tests/utilities.cc.o
[100%] Linking CXX executable aspect
CMakeFiles/aspect.dir/source/material_model/viscoelastic_plastic.cc.o:viscoelastic_plastic.cc:function aspect::MaterialModel::ViscoelasticPlastic<2>::create_additional_named_outputs(aspect::MaterialModel::MaterialModelOutputs<2>&) const: error: undefined reference to 'aspect::MaterialModel::ElasticAdditionalOutputs<2>::ElasticAdditionalOutputs(unsigned int)'
CMakeFiles/aspect.dir/source/material_model/viscoelastic_plastic.cc.o:viscoelastic_plastic.cc:function aspect::MaterialModel::ViscoelasticPlastic<3>::create_additional_named_outputs(aspect::MaterialModel::MaterialModelOutputs<3>&) const: error: undefined reference to 'aspect::MaterialModel::ElasticAdditionalOutputs<3>::ElasticAdditionalOutputs(unsigned int)'
collect2: error: ld returned 1 exit status
CMakeFiles/aspect.dir/build.make:7187: recipe for target 'aspect' failed
make[2]: *** [aspect] Error 1
CMakeFiles/Makefile2:356: recipe for target 'CMakeFiles/aspect.dir/all' failed
make[1]: *** [CMakeFiles/aspect.dir/all] Error 2
Makefile:129: recipe for target 'all' failed
make: *** [all] Error 2

This error appeared when compiling in release mode (the debug build compiled without error). It appears to be related to the recently merged ViscoelasticPlastic material model: when I revert to a commit before this model was added, ASPECT compiles in release mode without error.

bangerth commented 5 years ago

Please update to the newest GitHub version; I think we merged the patch for this a few minutes ago.

ricitron commented 5 years ago

Thank you! Yes, everything is working now.

gassmoeller commented 5 years ago

Great, thanks for letting us know about the issues; that is always helpful.