Closed atemerev closed 5 months ago
Hi @atemerev, there is an exception type, ConfigurationError, which should be raised for errors encountered while reading config files. This exception should be raised on all ranks and caught properly. Errors during model building, such as reading and creating cells and synapses, are more complex: they may happen on some ranks but not all. One example I can think of is loading the emodel hoc template in Cell_V6._instantiate_cell, where the EModel files come from scientists and some of them may contain errors. As different cells require different EModel templates, the EModel files loaded on each rank are not the same, meaning that errors may occur on only some of the ranks. In this case, we may not be able to log the error at rank 0.
Once the final decision is made on BBPBGLIB-1139, we can close this PR.
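To illustrate the per-rank failure case described in this comment, here is a minimal sketch of one way every rank could learn that some rank failed while loading. It is not code from this PR: `count_failed_ranks` and `step` are hypothetical names, and `comm` is assumed to be an mpi4py-style communicator whose `allreduce` sums values by default.

```python
def count_failed_ranks(comm, step):
    """Run `step` locally and report how many ranks failed.

    `comm` is assumed to expose an mpi4py-style `allreduce` that sums
    its argument across ranks by default. `step` is a hypothetical
    callable standing in for the per-rank loading work.
    Returns (number_of_failed_ranks, local_exception_or_None).
    """
    error = None
    try:
        step()
    except Exception as exc:  # keep the local error instead of raising
        error = exc
    # Every rank contributes 1 if it failed, 0 otherwise; all ranks
    # receive the same total, so they can take the same branch afterwards.
    n_failed = comm.allreduce(1 if error is not None else 0)
    return n_failed, error
```

Because every rank gets the same count, a rank that loaded successfully still knows the run cannot continue, and the ranks that did fail can log their own error since rank 0 may not have seen it.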
Context
On simulation launch in commands.py, exceptions were handled in the same way for configuration / model loading errors and for simulation errors. This required quirky synchronization to make sure that exceptions were logged at only a single MPI node; otherwise they would flood the output.
The idea of this PR is to separate exception handling for configuration parsing / model loading (these errors are the same on all nodes and are supposed to be logged only at the master node) from exception handling for simulation errors (these can happen on only some nodes, can differ from node to node, and perhaps need to be logged at all nodes).
Scope
Separate exception handling at the model loading stage and the simulation run stage. Call _mpi_abort only in the latter case. In the former case, log errors only on the node with MPI rank 0.
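The split described above can be sketched roughly as follows. This is an illustration under assumptions, not the actual commands.py code: `launch`, `load`, `run`, `abort` and `report` are hypothetical names, `abort` stands in for `_mpi_abort`, and the `ConfigurationError` class here is a local stand-in for the real exception type. The sketch catches only `ConfigurationError` in the loading stage; whether other loading errors belong there is the open question in this thread.

```python
class ConfigurationError(Exception):
    """Local stand-in for the project's ConfigurationError type."""


def launch(rank, load, run, abort, report=print):
    """Run the loading stage, then the simulation stage.

    Hypothetical parameters: `rank` is this process's MPI rank, `load`
    and `run` are callables for the two stages, `abort` stands in for
    `_mpi_abort`, and `report` receives log messages.
    Returns an exit code (0 on success, 1 on failure).
    """
    try:
        load()
    except ConfigurationError as exc:
        # Assumed identical on every rank: log once, no abort needed.
        if rank == 0:
            report(f"Configuration error: {exc}")
        return 1

    try:
        run()
    except Exception as exc:
        # May occur on only some ranks: log locally, then abort so the
        # remaining ranks do not hang waiting for this one.
        report(f"Rank {rank}: simulation error: {exc}")
        abort()
        return 1
    return 0
```

Passing `abort` and `report` in as callables keeps the control flow testable without an MPI runtime.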
Testing
Again, I don't think it is feasible to write a unit test for this.
Review