Snakemake-Profiles / slurm

Cookiecutter for snakemake slurm profile
MIT License

sacct running continuously #74

Closed rmvpaeme closed 2 years ago

rmvpaeme commented 3 years ago

I got notified by our HPC administrators that running sacct 10 times per second floods their logging system. Reducing the max-status-checks-per-second from 10 to 1 seems to be insufficient.
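For context, the polling and submission rates are set in the profile's generated `config.yaml`. A minimal sketch, assuming the profile's usual `slurm-status.py` status script (values illustrative, not recommendations):

```yaml
# config.yaml for the generated profile (illustrative values)
cluster-status: "slurm-status.py"   # script invoked once per job to query sacct
max-status-checks-per-second: 1     # throttle how often status scripts are spawned
max-jobs-per-second: 1              # throttle job submission rate
```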

1) If I set the `max-status-checks-per-second` parameter to 0.01 (once every 100 seconds) and 200 jobs are running in parallel, does that mean it takes 100 seconds * 200 jobs = 20,000 seconds (roughly 333 minutes) to check the status of all jobs, which would dramatically slow down the Snakemake pipeline?
2) Is it possible to run `sacct` for all jobs at once, every minute or so, and filter out the relevant information?
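The second question can be sketched as a batched status check: one `sacct` call with a comma-separated job list instead of one call per job. This is a minimal sketch, not the profile's actual implementation; the state-to-status mapping and the `sacct` flags (`-P` parsable, `-n` no header, `-X` allocations only) are assumptions based on standard Slurm behavior:

```python
import subprocess

# sacct states treated as terminal failures; anything that is neither
# COMPLETED nor in this set is reported as still running.
FAILED_STATES = {"FAILED", "CANCELLED", "TIMEOUT", "OUT_OF_MEMORY",
                 "NODE_FAIL", "PREEMPTED", "BOOT_FAIL"}


def parse_sacct(output):
    """Parse `sacct -P -n -X -o JobID,State` output into {jobid: status},

    where status is one of "running", "success", "failed" (the three
    values Snakemake's --cluster-status contract expects)."""
    statuses = {}
    for line in output.strip().splitlines():
        jobid, _, state = line.partition("|")
        # "CANCELLED by <uid>" has a suffix; keep only the first word.
        state = state.split()[0] if state else ""
        if state == "COMPLETED":
            statuses[jobid] = "success"
        elif state in FAILED_STATES:
            statuses[jobid] = "failed"
        else:
            statuses[jobid] = "running"
    return statuses


def batch_status(job_ids):
    """One sacct invocation for all jobs, instead of one per job."""
    out = subprocess.run(
        ["sacct", "-P", "-n", "-X", "-o", "JobID,State",
         "-j", ",".join(job_ids)],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_sacct(out)
```

As the thread notes, plugging this into Snakemake is the hard part: the status script is spawned fresh for each job, so batching requires either upstream support or persisting state between invocations.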

frankier commented 3 years ago

It looks like this would be easiest with a modification to Snakemake itself. Snakemake simply spawns the cluster-status script every time it wants to know a job's status. If Snakemake were modified, status requests could be queued and sent to the cluster-status script in batches (profiles could be updated to accept multiple job IDs at once). Without modifying Snakemake, we would need to persist information between runs and have some way to "wake up" later, both of which are difficult to support in a fast, robust way across different HPC systems.

percyfal commented 2 years ago

See issue #81, which will hopefully provide an improved way of polling. The upcoming changes rely on changes upstream, i.e. in Snakemake itself, as noted above. I'm keeping this issue open until we have a solution.

holtgrewe commented 2 years ago

With the new caching sidecar, this can be closed.