LLNL / scr

SCR caches checkpoint data in storage on the compute nodes of a Linux cluster to provide a fast, scalable checkpoint / restart capability for MPI codes.
http://computing.llnl.gov/projects/scalable-checkpoint-restart-for-mpi

Allocation Check #459

Closed gonsie closed 5 months ago

gonsie commented 3 years ago

Related to #320

This changes the behavior of make check by adding the --stop-on-failure option. It also adds a 'check allocation' test/script that runs before any of the parallel tests. The idea is that this script verifies that users have a proper allocation before the tests run, which avoids the very slow process of launching multiple 'srun' commands outside of an allocation. The check only stops the run when users invoke make check; with make test the parallel tests will still run... but at least it will print out an error message the users can stare at while their srun is queueing.

Right now the script only checks that it is running inside an allocation (that is, that a job ID environment variable is set).
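
For illustration, here is a minimal sketch of what such an allocation check could look like. It is not the script from this PR. The variable names are assumptions: SLURM_JOB_ID is set by Slurm inside an allocation and LSB_JOBID by LSF, but the PR does not say which variable(s) its script inspects. The function names find_job_id and main are hypothetical.

```python
import os
import sys

# Assumed job-ID variables (not taken from the PR): SLURM_JOB_ID is set by
# Slurm inside an allocation, LSB_JOBID by LSF.
JOB_ID_VARS = ("SLURM_JOB_ID", "LSB_JOBID")


def find_job_id(env):
    """Return (name, value) for the first non-empty job-ID variable, else None."""
    for name in JOB_ID_VARS:
        value = env.get(name, "")
        if value:
            return name, value
    return None


def main(env=None):
    """Return 0 inside an allocation, 1 (with an error message) otherwise."""
    env = os.environ if env is None else env
    found = find_job_id(env)
    if found is None:
        print(
            "ERROR: no job allocation detected (none of "
            + ", ".join(JOB_ID_VARS)
            + " is set). Request an allocation before running the parallel tests.",
            file=sys.stderr,
        )
        return 1
    name, value = found
    print(f"Allocation detected: {name}={value}")
    return 0
```

A test driver could call sys.exit(main()) so that a non-zero exit status marks the check as failed. When this check is the first test, stopping on failure then prevents the parallel tests from starting.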

Things to add:

gonsie commented 6 months ago

@gonsie review this.

gonsie commented 5 months ago

This PR is so outdated that I'm closing it rather than figure out how to bring it up to date. We may want to revisit this some day... but not today.