dbbs-lab / bsb-core

The Brain Scaffold Builder
https://bsb.readthedocs.io
GNU General Public License v3.0

Resolve weird duplication of jobs #851

filimarc closed 2 months ago

filimarc commented 2 months ago

Describe the work done

We encounter a stochastic duplication of jobs that are enqueued in the MPI pool. The scheduler runs in a separate thread from the main pool execution. If that thread is not quick enough to fill the list of jobs to be enqueued by the main process, the same job can be enqueued by both at the same time. The problem is more likely to appear when the number of MPI ranks is low, because the pool is then faster to set up, so it should be very rare with a high number of ranks. To prevent the duplication, at least at submission, we propose a stricter rule for job enqueuing: only jobs with status PENDING can be enqueued, which forbids enqueuing a job that has already been submitted.
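The proposed rule can be illustrated with a minimal, standalone sketch. It is not the bsb-core implementation: `JobStatus`, `Job`, `try_enqueue` and `submit_from_two_threads` are hypothetical names chosen for this example, and the per-job lock is an assumption added here so that the status check and the status change happen as one step. The description above only states that non-PENDING jobs must be rejected.

```python
import threading
from enum import Enum, auto


class JobStatus(Enum):
    PENDING = auto()
    QUEUED = auto()


class Job:
    """Hypothetical job object, used only to illustrate the guard."""

    def __init__(self, name):
        self.name = name
        self.status = JobStatus.PENDING
        self._lock = threading.Lock()  # assumption: not described in the PR

    def try_enqueue(self, queue):
        # Check and update the status under the lock, so that two threads
        # cannot both observe PENDING and both submit the job.
        with self._lock:
            if self.status is not JobStatus.PENDING:
                return False  # already submitted elsewhere: refuse
            self.status = JobStatus.QUEUED
        queue.append(self)
        return True


def submit_from_two_threads(job):
    """Let two threads race to enqueue the same job; return the queue."""
    queue = []
    barrier = threading.Barrier(2)  # release both threads together

    def worker():
        barrier.wait()
        job.try_enqueue(queue)

    threads = [threading.Thread(target=worker) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return queue


if __name__ == "__main__":
    job = Job("example")
    queue = submit_from_two_threads(job)
    print(len(queue), job.status.name)
```

Whichever thread reaches the guard first moves the job out of PENDING, and the second attempt is rejected, so the job ends up in the queue only once.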

List which issues this resolves:

closes #820

Tasks