flow123d / flow123d

Main repository of the Flow123d project.
http://flow123d.github.io/

Terminate all MPI processes on error #1048

Closed: jbrezmorf closed this issue 5 years ago

jbrezmorf commented 5 years ago
dflanderka commented 5 years ago

Example of error output:

Found 1 yaml file/s
Running in LOCAL mode
Running 2 cases (excluding all tags in set ['disabled'])
------------------------------------------------------------
Case 01 of 02
Running: 1 x 02_generic_input/03_field_descriptors
bin/flow123d -s tests/02_generic_input/03_field_descriptors.yaml -o tests/02_generic_input/test_results/03_field_descriptors.1
Done    | elapsed time 0:00:01:562, memory used  142.87MB
[ERROR] |   Command ended with 1! (pid=610)
            Full command:
            /flow123d/bin/flow123d -s /flow123d/tests/02_generic_input/03_field_descriptors.yaml -o /flow123d/tests/02_generic_input/test_results/03_field_descriptors.1
    ------------------------------------------------------------
    Output from file /flow123d/tests/02_generic_input/test_results/03_field_descriptors.1/job_output.log
    ############################################################
    ##
    ##                   [EI_MPI_Rank_TAG*] = -1
    ##
    ##                   ** Stacktrace **
    ##                     0  SchurComplement::create_inversion_matrix()
    ##                     1  SchurComplement::form_schur()
    ##                     2  SchurComplement::resolve()
    ##                     3  SchurComplement::compute_residual()
    ##                     4  DarcyMH::solve_nonlinear()
    ##                     5  DarcyMH::zero_time_step()
    ##                     6  HC_ExplicitSequential::run_simulation()
    ##                     7  Application::run()
    ##                     8  ApplicationBase::init(int, char**)
    ##                     9  main
    ##
    ##                   --------------------------------------------------------
    ## application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
    ## [unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=1
    ## :
    ## system msg for write_line failure : Bad file descriptor
    ############################################################
------------------------------------------------------------
Case 02 of 02
Running: 2 x 02_generic_input/03_field_descriptors
bin/mpiexec -np 2 bin/flow123d -s tests/02_generic_input/03_field_descriptors.yaml -o tests/02_generic_input/test_results/03_field_descriptors.2
Done    | elapsed time 0:00:01:566, memory used  356.16MB
[ERROR] |   Command ended with 1! (pid=628)
            Full command:
            /flow123d/bin/mpiexec -np 2 /flow123d/bin/flow123d -s /flow123d/tests/02_generic_input/03_field_descriptors.yaml -o /flow123d/tests/02_generic_input/test_results/03_field_descriptors.2
    ------------------------------------------------------------
    Output from file /flow123d/tests/02_generic_input/test_results/03_field_descriptors.2/job_output.log
    ############################################################
    ##                   Dynamic exception type: boost::exception_detail::clone_impl<ExcChkErr>
    ##                   [EI_ErrCode_TAG*] = 63
    ##
    ##                   ** Stacktrace **
    ##                     0  chkerr(unsigned int)
    ##                     1  chkerr_assert(unsigned int)
    ##                     2  Balance::add_flux_matrix_values(unsigned int, unsigned int, std::vector<int, std::allocator<int> > const&, std::vector<double, std::allocator<double> > const&)
    ##                     3  AssemblyMH<2>::add_fluxes_in_balance_matrix(LocalElementAccessorBase<3>)
    ##                     4  AssemblyMH<2>::assemble(LocalElementAccessorBase<3>)
    ##                     5  DarcyMH::assembly_mh_matrix(std::vector<std::shared_ptr<AssemblyBase>, std::allocator<std::shared_ptr<AssemblyBase> > >&)
    ##                     6  DarcyMH::assembly_linear_system()
    ##                     7  DarcyMH::solve_nonlinear()
    ##                     8  DarcyMH::zero_time_step()
    ##                     9  HC_ExplicitSequential::run_simulation()
    ##                    10  Application::run()
    ##                    11  ApplicationBase::init(int, char**)
    ##                    12  main
    ##
    ##                   --------------------------------------------------------
    ## application called MPI_Abort(MPI_COMM_WORLD, 1) - process 1
    ############################################################
------------------------------------------------------------
Summary:
    [ FAILED ]  | 1 x 02_generic_input/03_field_descriptors     [ 1.69 sec] | error while execution
    [ FAILED ]  | 2 x 02_generic_input/03_field_descriptors     [ 1.70 sec] | error while execution
    ------------------------------------------------------------
    [ FAILED ]  | passed=0, failed=2, skipped=0 in [ 3.39 sec]
------------------------------------------------------------
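
Both logs above end with `application called MPI_Abort(MPI_COMM_WORLD, 1)`, so a failure on one rank brings the whole job down, which is what this issue asks for. A minimal C++ sketch of that pattern follows. It is not Flow123d's actual code: `run_guarded` and `AbortHook` are hypothetical names, and the MPI call is passed in as a callback so the snippet compiles without an MPI installation.

```cpp
#include <exception>
#include <functional>
#include <iostream>
#include <stdexcept>

// Hypothetical hook type. In an MPI build it would wrap
// MPI_Abort(MPI_COMM_WORLD, code), which asks the MPI runtime to
// terminate every process in the communicator, not only the caller.
using AbortHook = std::function<void(int)>;

// Runs `body` and returns 0 on success. If `body` throws, the error is
// reported, the hook is called with a non-zero code, and that code is returned.
int run_guarded(const std::function<void()>& body,
                const AbortHook& abort_all_ranks) {
    const int failure_code = 1;  // same exit code as in the log above
    try {
        body();
        return 0;
    } catch (const std::exception& e) {
        std::cerr << "Fatal error: " << e.what() << std::endl;
    } catch (...) {
        std::cerr << "Fatal error: unknown exception" << std::endl;
    }
    abort_all_ranks(failure_code);
    // Reached only if the hook returns, e.g. a test stub instead of MPI_Abort.
    return failure_code;
}
```

In an MPI program the hook could be, for example, `[](int code) { MPI_Abort(MPI_COMM_WORLD, code); }`. Without such a call, a rank that exits on its own can leave the other ranks blocked in a collective operation.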
dflanderka commented 5 years ago

Merged in 95df358