SIMEXP / psom

pipeline system for octave and matlab
http://psom.simexp-lab.org

Reset to zero #119

Closed by porban 7 years ago

porban commented 7 years ago

After a few hours of successful processing and thousands of jobs done, the pipeline is suddenly reset to 0, i.e. it shows no jobs done. This might coincide with the end of the wall time, but I'm not 100% sure about this.

porban commented 7 years ago

An example of logs can be found here on guillimin: '/gsf-624-aa/data/sz10site/mcicsirp/preproc_20170303/logs/'

poquirion commented 7 years ago

Might be linked to the manager being resubmitted.

poquirion commented 7 years ago

Should investigate that and also implement #110

Having a different walltime for manager and worker would also be good.

pbellec commented 7 years ago

Likely cause is wall time. The recovery mechanism in psom2 is currently broken.

pbellec commented 7 years ago

Redundant with #104.