adjtomo / seisflows

An automated workflow tool for full waveform inversion and adjoint tomography
http://seisflows.readthedocs.org
BSD 2-Clause "Simplified" License

system parameter ntask_max is not honored for certain subclasses #200

Open bch0w opened 8 months ago

bch0w commented 8 months ago

Certain System subclasses (e.g., Frontera, Wisteria) do not support array jobs. The workaround is to submit individual jobs to the system one by one. However, these modules have no mechanism for honoring the parameter ntask_max, so they submit all jobs to the job scheduler simultaneously.

This is not the intended behavior and may lead to resource competition or upset sysadmins. These systems need their own internal ntask_max routine that submits at most ntask_max jobs at a time, monitors the queue, and submits new jobs as previous jobs complete.
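A rough sketch of what such a routine could look like, in Python. This is not the SeisFlows implementation: the function name `submit_throttled` and the `submit` and `is_finished` callables are hypothetical stand-ins for whatever a given System subclass uses to submit a single job and to check its status in the queue.

```python
import time
from collections import deque


def submit_throttled(task_ids, ntask_max, submit, is_finished,
                     poll_interval=1.0):
    """Run all tasks while keeping at most `ntask_max` jobs in the queue.

    `submit(task_id)` is assumed to submit one job and return its job ID.
    `is_finished(job_id)` is assumed to return True once that job has left
    the queue. Both are hypothetical hooks, not part of any real API.
    Returns the task IDs in the order their jobs were seen to finish.
    """
    if ntask_max < 1:
        raise ValueError("ntask_max must be at least 1")

    pending = deque(task_ids)  # tasks not yet submitted
    running = {}               # job_id -> task_id for jobs in the queue
    completed = []

    while pending or running:
        # Fill any free slots, never exceeding ntask_max concurrent jobs
        while pending and len(running) < ntask_max:
            task_id = pending.popleft()
            running[submit(task_id)] = task_id

        # Check the queue once and retire jobs that have completed
        finished = [job_id for job_id in running if is_finished(job_id)]
        for job_id in finished:
            completed.append(running.pop(job_id))

        # Nothing completed this round: wait before polling again
        if not finished and running:
            time.sleep(poll_interval)

    return completed
```

The sketch does not distinguish failed jobs from successful ones, and a real version would likely batch the status checks into a single scheduler query per polling round rather than one call per job.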

I think all the requisite pieces are there; it just requires implementation and testing. The biggest hurdle will likely be the live checking of the job queue and the decision to submit new jobs, which can sometimes be a finicky operation.

bch0w commented 6 days ago

This is implemented for Wisteria using #227. It is still open for Frontera, although we might be able to use the same mechanism developed for Wisteria.