Closed rmvpaeme closed 2 years ago
It looks like this would be easiest with a modification to Snakemake itself. Snakemake just spawns the cluster status script every time it wants to know a job's status. If Snakemake were modified, status requests could be queued and sent to the status script in batches (plugins could be modified to accept multiple job ids at once). Without modifying Snakemake, we would need to persist information between runs and we would need some way to "wake up" later -- both difficult to support in a fast and robust way across different HPC systems.
See issue #81, which will hopefully provide an improved way of doing polling. The upcoming changes rely on changes upstream, i.e. in snakemake itself, as noted above. I'm keeping this issue open until that solution lands.
With the new caching sidecar, this can be closed.
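For readers landing here later, the sidecar approach boils down to answering per-job status queries from a cache that is refreshed by one batched `sacct` call per interval. A minimal sketch of the idea (the class name, the 60-second interval, and the state parsing are illustrative assumptions, not the actual sidecar code):

```python
import subprocess
import time


class CachedStatusChecker:
    """Answer per-job status queries from a cache that is refreshed
    by a single batched `sacct` call at most once per interval."""

    def __init__(self, interval=60.0):
        self.interval = interval        # minimum seconds between sacct calls
        self.job_ids = set()            # every job we have been asked about
        self._cache = {}                # job id -> last known state
        self._last_poll = float("-inf")

    def status(self, job_id):
        self.job_ids.add(job_id)
        if time.monotonic() - self._last_poll > self.interval:
            self._refresh()
        return self._cache.get(job_id, "UNKNOWN")

    def _refresh(self):
        # One sacct call covers every tracked job:
        # -j accepts a comma-separated list of job ids.
        out = subprocess.run(
            ["sacct", "-j", ",".join(sorted(self.job_ids)),
             "--format=JobID,State", "--noheader", "--parsable2"],
            capture_output=True, text=True, check=True,
        ).stdout
        self._cache = dict(self._parse(out))
        self._last_poll = time.monotonic()

    @staticmethod
    def _parse(output):
        for line in output.splitlines():
            job_id, _, state = line.partition("|")
            if job_id and "." not in job_id:  # skip steps like 1234.batch
                # "CANCELLED by <uid>" -> "CANCELLED"
                yield job_id, state.split()[0]
```

With this, 200 parallel jobs still cost only one `sacct` call per minute instead of one call per job per status check.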
I got notified by our HPC administrators that running sacct 10 times per second floods their logging system. Reducing `max-status-checks-per-second` from 10 to 1 seems to be insufficient.

1) If I set `max-status-checks-per-second` to 0.01 (one check every 100 seconds) and 200 jobs are running in parallel, does that mean that checking the status of all jobs takes 100 seconds × 200 = 20 000 seconds (≈ 330 minutes), which would dramatically slow down the snakemake pipeline?
2) Is it possible to run sacct for all jobs at once, every minute or so, and filter out the relevant information?