BlueBrain / neurodamus

A BBP Simulation Control application for NEURON
https://neurodamus.readthedocs.io
Apache License 2.0

[BBPBGLIB-1139] config / model error logging fix, stage 2 #146

Closed atemerev closed 5 months ago

atemerev commented 7 months ago

Context

On simulation launch in commands.py, exceptions were handled the same way for configuration / model loading errors and for simulation errors. This required awkward synchronization to make sure that each exception was logged on a single MPI node only; otherwise the messages flood the output.

The idea of this PR is to separate exception handling for configuration parsing / model loading (these errors are the same on all nodes and are supposed to be logged only on the master node) from exception handling for simulation errors (these may occur only in some runs, can differ from node to node, and perhaps need to be logged on all nodes).

Scope

Separate exception handling at the model loading stage from exception handling at the simulation run stage. Call _mpi_abort only in the latter case. In the former case, log errors only on the node with MPI rank 0.
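A minimal sketch of the split described above, not the actual commands.py code. The function name `run_two_stage` and its parameters are hypothetical: `rank` stands in for the MPI rank, `load_model` and `run_simulation` for the two stages, and `abort` for whatever `_mpi_abort` does. They are passed in as plain callables so the sketch runs without MPI.

```python
import logging


def run_two_stage(rank, load_model, run_simulation, abort,
                  log=logging.getLogger("sketch")):
    """Run both stages and return a process exit code (0 on success).

    All names here are illustrative assumptions, not neurodamus API.
    """
    # Stage 1: configuration parsing / model loading.
    # Assumption from the PR text: every rank sees the same error,
    # so only rank 0 logs it and no abort is issued.
    try:
        model = load_model()
    except Exception as exc:
        if rank == 0:
            log.error("Configuration / model loading failed: %s", exc)
        return 1

    # Stage 2: simulation run.
    # Errors may differ between ranks, so each failing rank logs its own
    # error and then triggers the abort callback.
    try:
        run_simulation(model)
    except Exception as exc:
        log.error("Rank %d: simulation failed: %s", rank, exc)
        abort(1)
        return 1

    return 0
```

With this shape, a loading failure returns a non-zero code on every rank without calling `abort`, while a simulation failure on any single rank calls `abort` once from that rank.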

Testing

Again, I don't think it is feasible to write a unit test for this.

Review

bbpbuildbot commented 7 months ago

Logfiles from GitLab pipeline #201647 (:no_entry:) have been uploaded here!


WeinaJi commented 7 months ago

Hi @atemerev, there is an exception type, ConfigurationError, which should be raised for errors that occur while reading config files. This exception should be raised by all ranks and caught properly. Errors during model building, such as reading and creating cells and synapses, are more complex: they may happen on some ranks but not all. One example I can think of is loading the emodel hoc template in Cell_V6._instantiate_cell, where the EModel files come from scientists and some of them may contain errors. As different cells require different EModel templates, the EModel files loaded on each rank are not the same, which means we may get errors on only some of the ranks. In that case, we may not be able to log the error at rank 0.
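The concern above can be illustrated with a small simulation that uses no MPI at all. Everything in it is hypothetical (the function names, the template file names, the assignment of templates to ranks); it only shows that when a broken template is loaded on a rank other than 0, logging restricted to rank 0 records nothing.

```python
def load_templates(rank, templates_by_rank, broken):
    """Return an error message for this rank, or None if all templates load.

    `templates_by_rank` and `broken` are made-up inputs for illustration.
    """
    for name in templates_by_rank[rank]:
        if name in broken:
            return f"rank {rank}: failed to load template {name!r}"
    return None


def errors_logged(templates_by_rank, broken, rank0_only):
    """Collect the messages that would be logged under a given policy."""
    logged = []
    for rank in range(len(templates_by_rank)):
        err = load_templates(rank, templates_by_rank, broken)
        # With rank0_only=True, failures on other ranks are silently dropped.
        if err is not None and (rank == 0 or not rank0_only):
            logged.append(err)
    return logged


# Each simulated rank loads a different (hypothetical) template file.
templates = [["a.hoc"], ["b.hoc"], ["c.hoc"]]
broken = {"b.hoc"}

print(errors_logged(templates, broken, rank0_only=True))
print(errors_logged(templates, broken, rank0_only=False))
```

Under the rank-0-only policy the failure on rank 1 goes unreported, whereas per-rank logging surfaces it.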

WeinaJi commented 5 months ago

Following the final decision reached on BBPBGLIB-1139, we can close this PR.